Re: [linux-next:master] [fs] cdc4ad36a8: kernel_BUG_at_include/linux/page-flags.h

2024-08-06 Thread Matthew Wilcox
On Tue, Aug 06, 2024 at 10:26:17PM +0800, kernel test robot wrote:
> kernel test robot noticed "kernel_BUG_at_include/linux/page-flags.h" on:
> 
> commit: cdc4ad36a871b7ac43fcc6b2891058d332ce60ce ("fs: Convert 
> aops->write_begin to take a folio")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [test failed on linux-next/master 1e391b34f6aa043c7afa40a2103163a0ef06d179]
> 
> in testcase: boot

This patch should fix it.  folio_test_has_hwpoisoned() may only be
called on a large folio (PG_has_hwpoisoned lives in the second page),
so calling it on the order-0 folios shmem_write_begin() now sees trips
the assertion in page-flags.h.

Christian, can you squash the fix in?


diff --git a/mm/shmem.c b/mm/shmem.c
index 7d28304aea0f..66ff87417090 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2904,7 +2904,8 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
if (ret)
return ret;
 
-   if (folio_test_has_hwpoisoned(folio)) {
+   if (folio_test_hwpoison(folio) ||
+   (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
folio_unlock(folio);
folio_put(folio);
return -EIO;
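
The distinction the new check encodes, as a sketch (this helper is
hypothetical -- the two tests are real, but no such function exists in
the tree):

	static bool folio_contains_hwpoison(struct folio *folio)
	{
		/* PG_hwpoison sits on the head page and is valid for a
		 * folio of any order. */
		if (folio_test_hwpoison(folio))
			return true;
		/* PG_has_hwpoisoned sits on the second page, so it may
		 * only be tested on a large folio; testing it on an
		 * order-0 folio is what hit the page-flags.h BUG. */
		return folio_test_large(folio) &&
		       folio_test_has_hwpoisoned(folio);
	}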


Re: Regression on linux-next (next-20240228)

2024-03-05 Thread Matthew Wilcox
On Tue, Mar 05, 2024 at 06:49:16AM +, Borah, Chaitanya Kumar wrote:
> Issue is still seen with the following changes
> 
> void put_pages_list(struct list_head *pages)
>  
> folio_batch_init(&fbatch);
> list_for_each_entry(folio, pages, lru) {
> -   if (!folio_put_testzero(folio))
> +   if (!folio_put_testzero(folio)) {
> +   list_del(&folio->lru);
> continue;
> +   }
> if (folio_test_large(folio)) {
> __folio_put_large(folio);
> +   list_del(&folio->lru);
> continue;
> }

Thanks for testing.  Sorry about this.  I think I figured out what
the problem actually is.  I switched from list_for_each_entry_safe()
to list_for_each_entry() since I was no longer deleting the entries
from the list.  Unfortunately, I was still freeing the entries as I
walked the list!  So it would dereference folio->lru.next after giving
folio back to the page allocator (which probably put it on the PCP list,
where it would point to another free folio?)
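
The difference between the two macros, as a sketch (free_folio() is a
stand-in here, not the actual code in put_pages_list()):

	/* list_for_each_entry() computes the next element from
	 * folio->lru.next *after* the body runs, so freeing folio in
	 * the body turns that read into a use-after-free. */
	list_for_each_entry(folio, pages, lru)
		free_folio(folio);

	/* list_for_each_entry_safe() caches the next pointer before
	 * the body runs, so freeing the current entry is fine. */
	list_for_each_entry_safe(folio, next, pages, lru)
		free_folio(folio);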

Anyway, this should do the job, without the change I asked you to test
above.  If this doesn't do the job by itself, you could try combining
the two changes, but I don't think that will be necessary.

diff --git a/mm/swap.c b/mm/swap.c
index a910af21ba68..1d4b7713605d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -139,10 +139,10 @@ EXPORT_SYMBOL(__folio_put);
 void put_pages_list(struct list_head *pages)
 {
struct folio_batch fbatch;
-   struct folio *folio;
+   struct folio *folio, *next;
 
folio_batch_init(&fbatch);
-   list_for_each_entry(folio, pages, lru) {
+   list_for_each_entry_safe(folio, next, pages, lru) {
if (!folio_put_testzero(folio))
continue;
if (folio_test_hugetlb(folio)) {


Re: Regression on linux-next (next-20240228)

2024-03-04 Thread Matthew Wilcox
On Mon, Mar 04, 2024 at 10:03:13AM +, Borah, Chaitanya Kumar wrote:
> > Could you try putting the two:
> > 
> > -   list_del(&folio->lru);
> > 
> > statements back in and see if that fixes it?
> 
> That seems to fix it.
> 
> if (!folio_put_testzero(folio))
> +   list_del(&folio->lru);
> continue;

Ummm ... did you put { and } around this?  Otherwise the indentation
is misleading and what you've actually done is:

if (!folio_put_testzero(folio))
list_del(&folio->lru);
continue;

which will simply leak memory.

> if (folio_test_large(folio)) {
> __folio_put_large(folio);
> +   list_del(&folio->lru);
> continue;
> }
> Regards
> 
> Chaitanya


Re: Regression on linux-next (next-20240228)

2024-03-03 Thread Matthew Wilcox
On Mon, Mar 04, 2024 at 04:49:47AM +, Borah, Chaitanya Kumar wrote:
> After bisecting the tree, the following patch [4] seems to be the first "bad"
> commit
> 
> `
> commit ac7130117e8860081be88149061b5abb654d5759
> Author: Matthew Wilcox (Oracle) <wi...@infradead.org>
> Date:   Tue Feb 27 17:42:41 2024 +
> 
>     mm: use free_unref_folios() in put_pages_list()
> 
>     Break up the list of folios into batches here so that the folios are more
>     likely to be cache hot when doing the rest of the processing.
> 
>     Link: https://lkml.kernel.org/r/20240227174254.710559-8-wi...@infradead.org
>     Signed-off-by: Matthew Wilcox (Oracle) <wi...@infradead.org>
> `
> 
> We could not revert the patch because of build errors, but resetting to
> the parent of the commit seems to fix the issue
> 
> Could you please check why the patch causes this regression and provide a fix 
> if necessary?

Could you try putting the two:

-   list_del(&folio->lru);

statements back in and see if that fixes it?


Re: [PATCH 2/2] xfs: disable large folio support in xfile_create

2024-01-11 Thread Matthew Wilcox
On Thu, Jan 11, 2024 at 02:00:53PM -0800, Andrew Morton wrote:
> On Wed, 10 Jan 2024 12:04:51 -0800 "Darrick J. Wong"  
> wrote:
> 
> > > > Fixing this will require a bit of an API change, and preferably
> > > > sorting out the hwpoison story for pages vs folios and where it is
> > > > placed in the shmem API.  For now use this one liner to disable
> > > > large folios.
> > > > 
> > > > Reported-by: Darrick J. Wong 
> > > > Signed-off-by: Christoph Hellwig 
> > > 
> > > Can someone who knows more about shmem.c than I do please review
> > > https://lore.kernel.org/linux-xfs/20240103084126.513354-4-...@lst.de/
> > > so that I can feel slightly more confident as hch and I sort through the
> > > xfile.c issues?
> > > 
> > > For this patch,
> > > Reviewed-by: Darrick J. Wong 
> > 
> > ...except that I'm still getting 2M THPs even with this enabled, so I
> > guess either we get to fix it now, or create our own private tmpfs mount
> > so that we can pass in huge=never, similar to what i915 does. :(
> 
> What is "this"?  Are you saying that $Subject doesn't work, or that the
> above-linked please-review patch doesn't work?

shmem pays no attention to the mapping_large_folio_support() flag,
so the proposed fix doesn't work.  It ought to, but it has its own way
of doing it that predates mapping_large_folio_support existing.


Re: disable large folios for shmem file used by xfs xfile

2024-01-10 Thread Matthew Wilcox
On Wed, Jan 10, 2024 at 05:28:22PM +0200, Joonas Lahtinen wrote:
> Quoting Joonas Lahtinen (2024-01-10 17:20:24)
> > However we specifically pass "huge=within_size" to vfs_kern_mount when
> > creating a private mount of tmpfs for the purpose of i915 created
> > allocations.
> > 
> > Older hardware also had some address hashing bugs where 2M aligned
> > memory caused a lot of collisions in TLB so we don't enable it always.
> > 
> > You can see drivers/gpu/drm/i915/gem/i915_gemfs.c function
> > i915_gemfs_init for details and references.
> > 
> > So in short, functionality wise we should be fine either default
> > for using 2M pages or not. If they become the default, we'd probably
> > want an option that would still be able to prevent them for performance
> > regression reasons on older hardware.
> 
> To maybe write out my concern better:
> 
> Is there plan to enable huge pages by default in shmem?

Not in the next kernel release, but eventually the plan is to allow
arbitrary order folios to be used in shmem.  So you could ask it to create
a 256kB folio for you, if that's the right size to manage memory in.

How shmem and its various users go about choosing the right size is not
quite clear to me yet.  Perhaps somebody else will do it before I get
to it; I have a lot of different sub-projects to work on at the moment,
and shmem isn't blocking any of them.  And I have a sneaking suspicion
that more work is needed in the swap code to deal with arbitrary order
folios, so that's another reason for me to delay looking at this ;-)


Re: disable large folios for shmem file used by xfs xfile

2024-01-10 Thread Matthew Wilcox
On Wed, Jan 10, 2024 at 10:21:07AM +0100, Christoph Hellwig wrote:
> Hi all,
> 
> Darrick reported that the fairly new XFS xfile code blows up when force
> enabling large folio for shmem.  This series fixes this quickly by disabling
> large folios for this particular shmem file for now until it can be fixed
> properly, which will be a lot more invasive.
> 
> I've added most of you to the CC list as I suspect most other users of
> shmem_file_setup and friends will have similar issues.

The graphics users _want_ to use large folios.  I'm pretty sure they've
been tested with this.  It's just XFS that didn't know about this
feature of shmem.


[Intel-gfx] [PATCH] i915: Limit the length of an sg list to the requested length

2023-09-19 Thread Matthew Wilcox (Oracle)
The folio conversion changed the behaviour of shmem_sg_alloc_table() to
put the entire length of the last folio into the sg list, even if the sg
list should have been shorter.  gen8_ggtt_insert_entries() relied on the
list being the right length and would overrun the end of the page tables.
Other functions may also have been affected.

Clamp the length of the last entry in the sg list to be the expected
length.

Signed-off-by: Matthew Wilcox (Oracle) 
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: sta...@vger.kernel.org # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/6287208.lov4wx5...@natalenko.name/
Reported-by: Oleksandr Natalenko 
Tested-by: Oleksandr Natalenko 
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+   unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
}
} while (1);
 
+   nr_pages = min_t(unsigned long,
+   folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
sg = sg_next(sg);
 
st->nents++;
-   sg_set_folio(sg, folio, folio_size(folio), 0);
+   sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
-   sg->length += folio_size(folio);
+   sg->length += nr_pages * PAGE_SIZE;
}
-   next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
-   i += folio_nr_pages(folio) - 1;
+   next_pfn = folio_pfn(folio) + nr_pages;
+   i += nr_pages - 1;
 
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x0010UL);
-- 
2.40.1



Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 08:11:47PM +0200, Oleksandr Natalenko wrote:
> I can confirm this one fixes the issue for me on T460s laptop. Thank you!

Yay!

> Should you submit it, please add:
> 
> Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")

Thanks for collecting all these; you're making my life really easy.
One minor point is that the standard format for Fixes: is 12 characters,
not 10.  eg,

Documentation/process/5.Posting.rst:Fixes: 1f2e3d4c5b6a ("The first line of the commit specified by the first 12 characters of its SHA-1 ID")

I have this in my .gitconfig:

[pretty]
fixes = Fixes: %h (\"%s\")

and in .git/config,

[core]
abbrev = 12
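
With both of those in place, something like

	git show -s --pretty=fixes <commit>

should print a correctly formatted Fixes: tag.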

I'm working on the commit message now.


Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 04:12:46PM -, Patchwork wrote:
> -:7: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line 
> (possible unwrapped commit description?)
> #7: 
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> > 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> > e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> > assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> > folio_batch") is the culprit here.

This is just a parsing fail.

> -:48: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
> #48: FILE: drivers/gpu/drm/i915/gem/i915_gem_shmem.c:155:
> + nr_pages = min_t(unsigned long,
> + folio_nr_pages(folio), page_count - i);

This is bullshit.  Aligning to open paren is an antipattern that leads
to significant unnecessary code churn.  I will not be part of such a
stupid system.


Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 04:43:41PM +0100, Matthew Wilcox wrote:
> Could I ask you to try this patch?  I'll follow up with another patch
> later because I think I made another assumption that may not be valid.

Ah, no, never mind.  I thought we could start in the middle of a folio,
but we always start constructing the sg list from index 0 of the file,
so we always start at the first page of a folio.  If this patch solves
your problem, then I think we're done.


Re: [Intel-gfx] [REGRESSION] [BISECTED] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-19 Thread Matthew Wilcox
On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and 
> 1e0877d58b1e, and reverting those fixed the i915 crash for me. The 
> e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I 
> assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a 
> folio_batch") is the culprit here.
> 
> Could you please check this?
> 
> Our conversation with Andrzej is available at drm-intel GitLab [1].
> 
> Thanks.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256

Wow, that is some great debugging.  Thanks for all the time & effort
you and others have invested.  Sorry for breaking your system.

You're almost right about the "prerequisites", but it's in the other
direction; 0b62af28f249 is a prerequisite for the later two cleanups,
so reverting all three is necessary to test 0b62af28f249.

It seems to me that you've isolated the problem to constructing overly
long sg lists.  I didn't realise that was going to be a problem, so
that's my fault.

Could I ask you to try this patch?  I'll follow up with another patch
later because I think I made another assumption that may not be valid.

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..73a4a4eb29e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
st->nents = 0;
for (i = 0; i < page_count; i++) {
struct folio *folio;
+   unsigned long nr_pages;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
}
} while (1);
 
+   nr_pages = min_t(unsigned long,
+   folio_nr_pages(folio), page_count - i);
if (!i ||
sg->length >= max_segment ||
folio_pfn(folio) != next_pfn) {
@@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
sg = sg_next(sg);
 
st->nents++;
-   sg_set_folio(sg, folio, folio_size(folio), 0);
+   sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
} else {
/* XXX: could overflow? */
-   sg->length += folio_size(folio);
+   sg->length += nr_pages * PAGE_SIZE;
}
-   next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
-   i += folio_nr_pages(folio) - 1;
+   next_pfn = folio_pfn(folio) + nr_pages;
+   i += nr_pages - 1;
 
/* Check that the i965g/gm workaround works. */
GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x0010UL);


Re: [Intel-gfx] [PATCH v6 1/4] drm: Use XArray instead of IDR for minors

2023-08-29 Thread Matthew Wilcox
On Tue, Aug 29, 2023 at 02:35:34PM -0400, James Zhu wrote:
> 
> On 2023-08-29 14:33, Matthew Wilcox wrote:
> > On Tue, Aug 29, 2023 at 01:34:22PM -0400, James Zhu wrote:
> > > > > > @@ -1067,7 +1055,7 @@ static void drm_core_exit(void)
> > > > > > unregister_chrdev(DRM_MAJOR, "drm");
> > > > > > debugfs_remove(drm_debugfs_root);
> > > > > > drm_sysfs_destroy();
> > > > > > -   idr_destroy(&drm_minors_idr);
> > > > > [JZ] Should we call xa_destroy instead here?
> > > > We could, if we expect the xarray to potentially not be empty.
> > > >   From what I understand - all minors should be released at this point.
> > > [JZ] In practice,  adding destroy here will be better.
> > Why do you say that?
> [JZ] In case, the future, INIT adds something, need DESTROY to free
> completely.

That isn't going to happen.


Re: [Intel-gfx] [PATCH v6 1/4] drm: Use XArray instead of IDR for minors

2023-08-29 Thread Matthew Wilcox
On Tue, Aug 29, 2023 at 01:34:22PM -0400, James Zhu wrote:
> > > > @@ -1067,7 +1055,7 @@ static void drm_core_exit(void)
> > > > unregister_chrdev(DRM_MAJOR, "drm");
> > > > debugfs_remove(drm_debugfs_root);
> > > > drm_sysfs_destroy();
> > > > -   idr_destroy(&drm_minors_idr);
> > > [JZ] Should we call xa_destroy instead here?
> > We could, if we expect the xarray to potentially not be empty.
> >  From what I understand - all minors should be released at this point.
> [JZ] In practice,  adding destroy here will be better.

Why do you say that?


Re: [Intel-gfx] [PATCH 03/13] scatterlist: Add sg_set_folio()

2023-07-30 Thread Matthew Wilcox
On Sun, Jul 30, 2023 at 09:57:06PM +0800, Zhu Yanjun wrote:
> 
> On 2023/7/30 19:18, Matthew Wilcox wrote:
> > On Sun, Jul 30, 2023 at 07:01:26PM +0800, Zhu Yanjun wrote:
> > > Does the following function have folio version?
> > > 
> > > "
> > > int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
> > >   struct page **pages, unsigned int n_pages, unsigned int offset,
> > >   unsigned long size, unsigned int max_segment,
> > >   unsigned int left_pages, gfp_t gfp_mask)
> > > "
> > No -- I haven't needed to convert anything that uses
> > sg_alloc_append_table_from_pages() yet.  It doesn't look like it should
> > be _too_ hard to add a folio version.
> 
> In many places, this function is used. So this function needs the folio
> version.

It's not used in very many places.  But the first one that I see it used
(drivers/infiniband/core/umem.c), you can't do a straightforward folio
conversion:

	pinned = pin_user_pages_fast(cur_base,
				     min_t(unsigned long, npages,
					   PAGE_SIZE / sizeof(struct page *)),
				     gup_flags, page_list);
	...
	ret = sg_alloc_append_table_from_pages(
			&umem->sgt_append, page_list, pinned, 0,
			pinned << PAGE_SHIFT, ib_dma_max_seg_size(device),
			npages, GFP_KERNEL);

That can't be converted to folios.  The GUP might start in the middle of
the folio, and we have no way to communicate that.

This particular usage really needs the phyr work that Jason is doing so
we can efficiently communicate physically contiguous ranges from GUP
to sg.
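
To make that concrete, a sketch of what a folio conversion would have to
recover (illustrative only; page_list is the array pinned above):

	struct folio *folio = page_folio(page_list[0]);
	/* GUP may hand back a tail page, so the start offset within the
	 * folio is implicit in the page pointer ... */
	size_t offset = folio_page_idx(folio, page_list[0]) * PAGE_SIZE;
	/* ... and a folio-based sg append API would have no parameter
	 * to carry 'offset' through. */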

> Another problem, after folio is used, I want to know the performance after
> folio is implemented.
> 
> How to make tests to get the performance?

You know what you're working on ... I wouldn't know how best to test
your code.


Re: [Intel-gfx] [PATCH 03/13] scatterlist: Add sg_set_folio()

2023-07-30 Thread Matthew Wilcox
On Sun, Jul 30, 2023 at 07:01:26PM +0800, Zhu Yanjun wrote:
> Does the following function have folio version?
> 
> "
> int sg_alloc_append_table_from_pages(struct sg_append_table *sgt_append,
>   struct page **pages, unsigned int n_pages, unsigned int offset,
>   unsigned long size, unsigned int max_segment,
>   unsigned int left_pages, gfp_t gfp_mask)
> "

No -- I haven't needed to convert anything that uses
sg_alloc_append_table_from_pages() yet.  It doesn't look like it should
be _too_ hard to add a folio version.
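
A folio version would presumably look something like this (hypothetical
sketch only; no such function exists in the tree):

	int sg_alloc_append_table_from_folios(struct sg_append_table *sgt_append,
			struct folio **folios, unsigned int n_folios,
			size_t offset, size_t size, unsigned int max_segment,
			unsigned int left_pages, gfp_t gfp_mask);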


[Intel-gfx] [PATCH 10/13] mm: Remove struct pagevec

2023-06-21 Thread Matthew Wilcox (Oracle)
All users are now converted to use the folio_batch so we can get rid of
this data structure.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagevec.h | 63 +++--
 mm/swap.c   | 18 ++--
 2 files changed, 13 insertions(+), 68 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 3a9d29dd28a3..87cc678adc85 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -3,65 +3,18 @@
  * include/linux/pagevec.h
  *
  * In many places it is efficient to batch an operation up against multiple
- * pages.  A pagevec is a multipage container which is used for that.
+ * folios.  A folio_batch is a container which is used for that.
  */
 
 #ifndef _LINUX_PAGEVEC_H
 #define _LINUX_PAGEVEC_H
 
-#include 
+#include 
 
-/* 15 pointers + header align the pagevec structure to a power of two */
+/* 15 pointers + header align the folio_batch structure to a power of two */
 #define PAGEVEC_SIZE   15
 
-struct page;
 struct folio;
-struct address_space;
-
-/* Layout must match folio_batch */
-struct pagevec {
-   unsigned char nr;
-   bool percpu_pvec_drained;
-   struct page *pages[PAGEVEC_SIZE];
-};
-
-void __pagevec_release(struct pagevec *pvec);
-
-static inline void pagevec_init(struct pagevec *pvec)
-{
-   pvec->nr = 0;
-   pvec->percpu_pvec_drained = false;
-}
-
-static inline void pagevec_reinit(struct pagevec *pvec)
-{
-   pvec->nr = 0;
-}
-
-static inline unsigned pagevec_count(struct pagevec *pvec)
-{
-   return pvec->nr;
-}
-
-static inline unsigned pagevec_space(struct pagevec *pvec)
-{
-   return PAGEVEC_SIZE - pvec->nr;
-}
-
-/*
- * Add a page to a pagevec.  Returns the number of slots still available.
- */
-static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
-{
-   pvec->pages[pvec->nr++] = page;
-   return pagevec_space(pvec);
-}
-
-static inline void pagevec_release(struct pagevec *pvec)
-{
-   if (pagevec_count(pvec))
-   __pagevec_release(pvec);
-}
 
 /**
  * struct folio_batch - A collection of folios.
@@ -78,11 +31,6 @@ struct folio_batch {
struct folio *folios[PAGEVEC_SIZE];
 };
 
-/* Layout must match pagevec */
-static_assert(sizeof(struct pagevec) == sizeof(struct folio_batch));
-static_assert(offsetof(struct pagevec, pages) ==
-   offsetof(struct folio_batch, folios));
-
 /**
  * folio_batch_init() - Initialise a batch of folios
  * @fbatch: The folio batch.
@@ -127,10 +75,7 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
 }
 
-static inline void __folio_batch_release(struct folio_batch *fbatch)
-{
-   __pagevec_release((struct pagevec *)fbatch);
-}
+void __folio_batch_release(struct folio_batch *pvec);
 
 static inline void folio_batch_release(struct folio_batch *fbatch)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 423199ee8478..10348c1cf9c5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1044,25 +1044,25 @@ void release_pages(release_pages_arg arg, int nr)
 EXPORT_SYMBOL(release_pages);
 
 /*
- * The pages which we're about to release may be in the deferred lru-addition
+ * The folios which we're about to release may be in the deferred lru-addition
  * queues.  That would prevent them from really being freed right now.  That's
- * OK from a correctness point of view but is inefficient - those pages may be
+ * OK from a correctness point of view but is inefficient - those folios may be
  * cache-warm and we want to give them back to the page allocator ASAP.
  *
- * So __pagevec_release() will drain those queues here.
+ * So __folio_batch_release() will drain those queues here.
  * folio_batch_move_lru() calls folios_put() directly to avoid
  * mutual recursion.
  */
-void __pagevec_release(struct pagevec *pvec)
+void __folio_batch_release(struct folio_batch *fbatch)
 {
-   if (!pvec->percpu_pvec_drained) {
+   if (!fbatch->percpu_pvec_drained) {
lru_add_drain();
-   pvec->percpu_pvec_drained = true;
+   fbatch->percpu_pvec_drained = true;
}
-   release_pages(pvec->pages, pagevec_count(pvec));
-   pagevec_reinit(pvec);
+   release_pages(fbatch->folios, folio_batch_count(fbatch));
+   folio_batch_reinit(fbatch);
 }
-EXPORT_SYMBOL(__pagevec_release);
+EXPORT_SYMBOL(__folio_batch_release);
 
 /**
  * folio_batch_remove_exceptionals() - Prune non-folios from a batch.
-- 
2.39.2



[Intel-gfx] [PATCH 13/13] mm: Remove unnecessary pagevec includes

2023-06-21 Thread Matthew Wilcox (Oracle)
These files no longer need pagevec.h, mostly due to function declarations
being moved out of it.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/fadvise.c| 1 -
 mm/memory_hotplug.c | 1 -
 mm/migrate.c| 1 -
 mm/readahead.c  | 1 -
 mm/swap_state.c | 1 -
 5 files changed, 5 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index f684ffd7f9c9..6c39d42f16dc 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 35db4108bb15..3f231cf1b410 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/mm/migrate.c b/mm/migrate.c
index ce35afdbc1e3..ee26f4a962ef 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..a9c999aa19af 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -120,7 +120,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4a5c7b748051..f8ea7015bad4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.39.2



[Intel-gfx] [PATCH 11/13] mm: Rename invalidate_mapping_pagevec to mapping_try_invalidate

2023-06-21 Thread Matthew Wilcox (Oracle)
We don't use pagevecs for the LRU cache any more, and we don't know
that the failed invalidations were due to the folio being in an
LRU cache.  So rename it to be more accurate.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/fadvise.c  | 16 +++-
 mm/internal.h |  4 ++--
 mm/truncate.c | 25 -
 3 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index fb7c5f43fd2a..f684ffd7f9c9 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -143,7 +143,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
}
 
if (end_index >= start_index) {
-   unsigned long nr_pagevec = 0;
+   unsigned long nr_failed = 0;
 
/*
 * It's common to FADV_DONTNEED right after
@@ -156,17 +156,15 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
 */
lru_add_drain();
 
-   invalidate_mapping_pagevec(mapping,
-   start_index, end_index,
-   &nr_pagevec);
+   mapping_try_invalidate(mapping, start_index, end_index,
+   &nr_failed);
 
/*
-* If fewer pages were invalidated than expected then
-* it is possible that some of the pages were on
-* a per-cpu pagevec for a remote CPU. Drain all
-* pagevecs and try again.
+* The failures may be due to the folio being
+* in the LRU cache of a remote CPU. Drain all
+* caches and try again.
 */
-   if (nr_pagevec) {
+   if (nr_failed) {
lru_add_drain_all();
invalidate_mapping_pages(mapping, start_index,
end_index);
diff --git a/mm/internal.h b/mm/internal.h
index 119a8241f9d9..2ff7587b4045 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -133,8 +133,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
loff_t end);
 long invalidate_inode_page(struct page *page);
-unsigned long invalidate_mapping_pagevec(struct address_space *mapping,
-   pgoff_t start, pgoff_t end, unsigned long *nr_pagevec);
+unsigned long mapping_try_invalidate(struct address_space *mapping,
+   pgoff_t start, pgoff_t end, unsigned long *nr_failed);
 
 /**
  * folio_evictable - Test whether a folio is evictable.
diff --git a/mm/truncate.c b/mm/truncate.c
index 86de31ed4d32..4a917570887f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -486,18 +486,17 @@ void truncate_inode_pages_final(struct address_space *mapping)
 EXPORT_SYMBOL(truncate_inode_pages_final);
 
 /**
- * invalidate_mapping_pagevec - Invalidate all the unlocked pages of one inode
- * @mapping: the address_space which holds the pages to invalidate
+ * mapping_try_invalidate - Invalidate all the evictable folios of one inode
+ * @mapping: the address_space which holds the folios to invalidate
  * @start: the offset 'from' which to invalidate
  * @end: the offset 'to' which to invalidate (inclusive)
- * @nr_pagevec: invalidate failed page number for caller
+ * @nr_failed: How many folio invalidations failed
  *
- * This helper is similar to invalidate_mapping_pages(), except that it accounts
- * for pages that are likely on a pagevec and counts them in @nr_pagevec, which
- * will be used by the caller.
+ * This function is similar to invalidate_mapping_pages(), except that it
+ * returns the number of folios which could not be evicted in @nr_failed.
  */
-unsigned long invalidate_mapping_pagevec(struct address_space *mapping,
-   pgoff_t start, pgoff_t end, unsigned long *nr_pagevec)
+unsigned long mapping_try_invalidate(struct address_space *mapping,
+   pgoff_t start, pgoff_t end, unsigned long *nr_failed)
 {
pgoff_t indices[PAGEVEC_SIZE];
struct folio_batch fbatch;
@@ -527,9 +526,9 @@ unsigned long invalidate_mapping_pagevec(struct address_space *mapping,
 */
if (!ret) {
deactivate_file_folio(folio);
-   /* It is likely on the pagevec of a remote CPU */
-   if (nr_pagevec)
-   (*nr_pagevec)++;
+   /* Likely in the lru cache of a remote CPU */
+   if (nr_failed)
+

[Intel-gfx] [PATCH 08/13] i915: Convert i915_gpu_error to use a folio_batch

2023-06-21 Thread Matthew Wilcox (Oracle)
Remove one of the last remaining users of pagevec.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 50 +--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index ec368e700235..0c38bfb60c9a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -187,64 +187,64 @@ i915_error_printer(struct drm_i915_error_state_buf *e)
 }
 
 /* single threaded page allocator with a reserved stash for emergencies */
-static void pool_fini(struct pagevec *pv)
+static void pool_fini(struct folio_batch *fbatch)
 {
-   pagevec_release(pv);
+   folio_batch_release(fbatch);
 }
 
-static int pool_refill(struct pagevec *pv, gfp_t gfp)
+static int pool_refill(struct folio_batch *fbatch, gfp_t gfp)
 {
-   while (pagevec_space(pv)) {
-   struct page *p;
+   while (folio_batch_space(fbatch)) {
+   struct folio *folio;
 
-   p = alloc_page(gfp);
-   if (!p)
+   folio = folio_alloc(gfp, 0);
+   if (!folio)
return -ENOMEM;
 
-   pagevec_add(pv, p);
+   folio_batch_add(fbatch, folio);
}
 
return 0;
 }
 
-static int pool_init(struct pagevec *pv, gfp_t gfp)
+static int pool_init(struct folio_batch *fbatch, gfp_t gfp)
 {
int err;
 
-   pagevec_init(pv);
+   folio_batch_init(fbatch);
 
-   err = pool_refill(pv, gfp);
+   err = pool_refill(fbatch, gfp);
if (err)
-   pool_fini(pv);
+   pool_fini(fbatch);
 
return err;
 }
 
-static void *pool_alloc(struct pagevec *pv, gfp_t gfp)
+static void *pool_alloc(struct folio_batch *fbatch, gfp_t gfp)
 {
-   struct page *p;
+   struct folio *folio;
 
-   p = alloc_page(gfp);
-   if (!p && pagevec_count(pv))
-   p = pv->pages[--pv->nr];
+   folio = folio_alloc(gfp, 0);
+   if (!folio && folio_batch_count(fbatch))
+   folio = fbatch->folios[--fbatch->nr];
 
-   return p ? page_address(p) : NULL;
+   return folio ? folio_address(folio) : NULL;
 }
 
-static void pool_free(struct pagevec *pv, void *addr)
+static void pool_free(struct folio_batch *fbatch, void *addr)
 {
-   struct page *p = virt_to_page(addr);
+   struct folio *folio = virt_to_folio(addr);
 
-   if (pagevec_space(pv))
-   pagevec_add(pv, p);
+   if (folio_batch_space(fbatch))
+   folio_batch_add(fbatch, folio);
else
-   __free_page(p);
+   folio_put(folio);
 }
 
 #ifdef CONFIG_DRM_I915_COMPRESS_ERROR
 
 struct i915_vma_compress {
-   struct pagevec pool;
+   struct folio_batch pool;
struct z_stream_s zstream;
void *tmp;
 };
@@ -381,7 +381,7 @@ static void err_compression_marker(struct drm_i915_error_state_buf *m)
 #else
 
 struct i915_vma_compress {
-   struct pagevec pool;
+   struct folio_batch pool;
 };
 
 static bool compress_init(struct i915_vma_compress *c)
-- 
2.39.2



[Intel-gfx] [PATCH 12/13] mm: Remove references to pagevec

2023-06-21 Thread Matthew Wilcox (Oracle)
Most of these should just refer to the LRU cache rather than the
data structure used to implement the LRU cache.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/huge_memory.c| 2 +-
 mm/khugepaged.c | 6 +++---
 mm/ksm.c| 6 +++---
 mm/memory.c | 6 +++---
 mm/migrate_device.c | 2 +-
 mm/swap.c   | 2 +-
 mm/truncate.c   | 2 +-
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e94fe292f30a..eb3678360b97 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1344,7 +1344,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
/*
 * See do_wp_page(): we can only reuse the folio exclusively if
 * there are no additional references. Note that we always drain
-* the LRU pagevecs immediately after adding a THP.
+* the LRU cache immediately after adding a THP.
 */
if (folio_ref_count(folio) >
1 + folio_test_swapcache(folio) * folio_nr_pages(folio))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5ef1e08b2a06..3beb4ad2ee5e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1051,7 +1051,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
if (pte)
pte_unmap(pte);
 
-   /* Drain LRU add pagevec to remove extra pin on the swapped in pages */
+   /* Drain LRU cache to remove extra pin on the swapped in pages */
if (swapped_in)
lru_add_drain();
 
@@ -1972,7 +1972,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
result = SCAN_FAIL;
goto xa_unlocked;
}
-   /* drain pagevecs to help isolate_lru_page() */
+   /* drain lru cache to help isolate_lru_page() */
lru_add_drain();
page = folio_file_page(folio, index);
} else if (trylock_page(page)) {
@@ -1988,7 +1988,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
page_cache_sync_readahead(mapping, &file->f_ra,
  file, index,
  end - index);
-   /* drain pagevecs to help isolate_lru_page() */
+   /* drain lru cache to help isolate_lru_page() */
lru_add_drain();
page = find_lock_page(mapping, index);
if (unlikely(page == NULL)) {
diff --git a/mm/ksm.c b/mm/ksm.c
index d995779dc1fe..ba266359da55 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -932,7 +932,7 @@ static int remove_stable_node(struct ksm_stable_node *stable_node)
 * The stable node did not yet appear stale to get_ksm_page(),
 * since that allows for an unmapped ksm page to be recognized
 * right up until it is freed; but the node is safe to remove.
-* This page might be in a pagevec waiting to be freed,
+* This page might be in an LRU cache waiting to be freed,
 * or it might be PageSwapCache (perhaps under writeback),
 * or it might have been removed from swapcache a moment ago.
 */
@@ -2303,8 +2303,8 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
trace_ksm_start_scan(ksm_scan.seqnr, ksm_rmap_items);
 
/*
-* A number of pages can hang around indefinitely on per-cpu
-* pagevecs, raised page count preventing write_protect_page
+* A number of pages can hang around indefinitely in per-cpu
+* LRU cache, raised page count preventing write_protect_page
 * from merging them.  Though it doesn't really matter much,
 * it is puzzling to see some stuck in pages_volatile until
 * other activity jostles them out, and they also prevented
diff --git a/mm/memory.c b/mm/memory.c
index 9f2723749f55..d034c52071f4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3404,8 +3404,8 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
goto copy;
if (!folio_test_lru(folio))
/*
-* Note: We cannot easily detect+handle references from
-* remote LRU pagevecs or references to LRU folios.
+* We cannot easily detect+handle references from
+* remote LRU caches or references to LRU folios.
 */
lru_add_drain();
if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio))
@@ -3883,7 +38

[Intel-gfx] [PATCH 09/13] net: Convert sunrpc from pagevec to folio_batch

2023-06-21 Thread Matthew Wilcox (Oracle)
Remove the last usage of pagevecs.  There is a slight change here; we
now free the folio_batch as soon as it fills up instead of freeing the
folio_batch when we try to add a page to a full batch.  This should have
no effect in practice.
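
The ordering difference, as a minimal sketch (names as in the diff below;
error handling elided):

	/* before: make room first, then add */
	if (!pagevec_space(&rqstp->rq_pvec))
		__pagevec_release(&rqstp->rq_pvec);
	pagevec_add(&rqstp->rq_pvec, page);

	/* after: add, then release once this add has filled the batch;
	 * folio_batch_add() returns the space remaining, so 0 means full */
	if (!folio_batch_add(&rqstp->rq_fbatch, folio))
		__folio_batch_release(&rqstp->rq_fbatch);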

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/sunrpc/svc.h |  2 +-
 net/sunrpc/svc.c   | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index c2807e301790..f8751118c122 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -222,7 +222,7 @@ struct svc_rqst {
struct page *   *rq_next_page; /* next reply page to use */
struct page *   *rq_page_end;  /* one past the last page */
 
-   struct pagevec  rq_pvec;
+   struct folio_batch  rq_fbatch;
struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */
struct bio_vec  rq_bvec[RPCSVC_MAXPAGES];
 
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e7c101290425..587811a002c9 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -640,7 +640,7 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node)
if (!rqstp)
return rqstp;
 
-   pagevec_init(&rqstp->rq_pvec);
+   folio_batch_init(&rqstp->rq_fbatch);
 
__set_bit(RQ_BUSY, &rqstp->rq_flags);
rqstp->rq_server = serv;
@@ -851,9 +851,9 @@ bool svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page)
}
 
if (*rqstp->rq_next_page) {
-   if (!pagevec_space(&rqstp->rq_pvec))
-   __pagevec_release(&rqstp->rq_pvec);
-   pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page);
+   if (!folio_batch_add(&rqstp->rq_fbatch,
+   page_folio(*rqstp->rq_next_page)))
+   __folio_batch_release(&rqstp->rq_fbatch);
}
 
get_page(page);
@@ -887,7 +887,7 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp)
 void
 svc_rqst_free(struct svc_rqst *rqstp)
 {
-   pagevec_release(&rqstp->rq_pvec);
+   folio_batch_release(&rqstp->rq_fbatch);
svc_release_buffer(rqstp);
if (rqstp->rq_scratch_page)
put_page(rqstp->rq_scratch_page);
-- 
2.39.2



[Intel-gfx] [PATCH 06/13] mm: Remove check_move_unevictable_pages()

2023-06-21 Thread Matthew Wilcox (Oracle)
All callers have now been converted to call check_move_unevictable_folios().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/swap.h |  1 -
 mm/vmscan.c  | 17 -
 2 files changed, 18 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ce7e82cf787f..456546443f1f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -439,7 +439,6 @@ static inline bool node_reclaim_enabled(void)
 }
 
 void check_move_unevictable_folios(struct folio_batch *fbatch);
-void check_move_unevictable_pages(struct pagevec *pvec);
 
 extern void __meminit kswapd_run(int nid);
 extern void __meminit kswapd_stop(int nid);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 45d17c7cc555..f8dd1db15897 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -8074,23 +8074,6 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
 }
 #endif
 
-void check_move_unevictable_pages(struct pagevec *pvec)
-{
-   struct folio_batch fbatch;
-   unsigned i;
-
-   folio_batch_init(&fbatch);
-   for (i = 0; i < pvec->nr; i++) {
-   struct page *page = pvec->pages[i];
-
-   if (PageTransTail(page))
-   continue;
-   folio_batch_add(&fbatch, page_folio(page));
-   }
-   check_move_unevictable_folios(&fbatch);
-}
-EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
-
 /**
  * check_move_unevictable_folios - Move evictable folios to appropriate zone
  * lru list
-- 
2.39.2



[Intel-gfx] [PATCH 04/13] i915: Convert shmem_sg_free_table() to use a folio_batch

2023-06-21 Thread Matthew Wilcox (Oracle)
Remove a few hidden compound_head() calls by converting the returned
page to a folio once and using the folio APIs.  We also only increment
the refcount on the folio once instead of once for each page.  Ideally,
we would have a for_each_sgt_folio macro, but until then this will do.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 55 +--
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 33d5d5178103..8f1633c3fb93 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -19,13 +19,13 @@
 #include "i915_trace.h"
 
 /*
- * Move pages to appropriate lru and release the pagevec, decrementing the
- * ref count of those pages.
+ * Move folios to appropriate lru and release the batch, decrementing the
+ * ref count of those folios.
  */
-static void check_release_pagevec(struct pagevec *pvec)
+static void check_release_folio_batch(struct folio_batch *fbatch)
 {
-   check_move_unevictable_pages(pvec);
-   __pagevec_release(pvec);
+   check_move_unevictable_folios(fbatch);
+   __folio_batch_release(fbatch);
cond_resched();
 }
 
@@ -33,24 +33,29 @@ void shmem_sg_free_table(struct sg_table *st, struct address_space *mapping,
 bool dirty, bool backup)
 {
struct sgt_iter sgt_iter;
-   struct pagevec pvec;
+   struct folio_batch fbatch;
+   struct folio *last = NULL;
struct page *page;
 
mapping_clear_unevictable(mapping);
 
-   pagevec_init(&pvec);
+   folio_batch_init(&fbatch);
for_each_sgt_page(page, sgt_iter, st) {
-   if (dirty)
-   set_page_dirty(page);
+   struct folio *folio = page_folio(page);
 
+   if (folio == last)
+   continue;
+   last = folio;
+   if (dirty)
+   folio_mark_dirty(folio);
if (backup)
-   mark_page_accessed(page);
+   folio_mark_accessed(folio);
 
-   if (!pagevec_add(&pvec, page))
-   check_release_pagevec(&pvec);
+   if (!folio_batch_add(&fbatch, folio))
+   check_release_folio_batch(&fbatch);
}
-   if (pagevec_count(&pvec))
-   check_release_pagevec(&pvec);
+   if (fbatch.nr)
+   check_release_folio_batch(&fbatch);
 
sg_free_table(st);
 }
@@ -63,8 +68,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
unsigned int page_count; /* restricted by sg_alloc_table */
unsigned long i;
struct scatterlist *sg;
-   struct page *page;
-   unsigned long last_pfn = 0; /* suppress gcc warning */
+   unsigned long next_pfn = 0; /* suppress gcc warning */
gfp_t noreclaim;
int ret;
 
@@ -95,6 +99,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
sg = st->sgl;
st->nents = 0;
for (i = 0; i < page_count; i++) {
+   struct folio *folio;
const unsigned int shrink[] = {
I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
0,
@@ -103,12 +108,12 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 
do {
cond_resched();
-   page = shmem_read_mapping_page_gfp(mapping, i, gfp);
-   if (!IS_ERR(page))
+   folio = shmem_read_folio_gfp(mapping, i, gfp);
+   if (!IS_ERR(folio))
break;
 
if (!*s) {
-   ret = PTR_ERR(page);
+   ret = PTR_ERR(folio);
goto err_sg;
}
 
@@ -147,19 +152,21 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
 
if (!i ||
sg->length >= max_segment ||
-   page_to_pfn(page) != last_pfn + 1) {
+   folio_pfn(folio) != next_pfn) {
if (i)
sg = sg_next(sg);
 
st->nents++;
-   sg_set_page(sg, page, PAGE_SIZE, 0);
+   sg_set_folio(sg, folio, folio_size(folio), 0);
} else {
-   sg->length += PAGE_SIZE;
+   /* XXX: could overflow? */
+   sg->length += folio_size(folio);
}
-   last_pfn = page_to_pfn(page);
+   next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
+   i += folio_nr_pag

[Intel-gfx] [PATCH 02/13] mm: Add __folio_batch_release()

2023-06-21 Thread Matthew Wilcox (Oracle)
This performs the same role as __pagevec_release(), ie skipping the
check for batch length of 0.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagevec.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index f582f7213ea5..42aad53e382e 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -127,9 +127,15 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return fbatch_space(fbatch);
 }
 
+static inline void __folio_batch_release(struct folio_batch *fbatch)
+{
+   __pagevec_release((struct pagevec *)fbatch);
+}
+
 static inline void folio_batch_release(struct folio_batch *fbatch)
 {
-   pagevec_release((struct pagevec *)fbatch);
+   if (folio_batch_count(fbatch))
+   __folio_batch_release(fbatch);
 }
 
 void folio_batch_remove_exceptionals(struct folio_batch *fbatch);
-- 
2.39.2



[Intel-gfx] [PATCH 00/13] Remove pagevecs

2023-06-21 Thread Matthew Wilcox (Oracle)
We're almost done with the pagevec -> folio_batch conversion.  Finish
the job.

Matthew Wilcox (Oracle) (13):
  afs: Convert pagevec to folio_batch in afs_extend_writeback()
  mm: Add __folio_batch_release()
  scatterlist: Add sg_set_folio()
  i915: Convert shmem_sg_free_table() to use a folio_batch
  drm: Convert drm_gem_put_pages() to use a folio_batch
  mm: Remove check_move_unevictable_pages()
  pagevec: Rename fbatch_space()
  i915: Convert i915_gpu_error to use a folio_batch
  net: Convert sunrpc from pagevec to folio_batch
  mm: Remove struct pagevec
  mm: Rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: Remove references to pagevec
  mm: Remove unnecessary pagevec includes

 drivers/gpu/drm/drm_gem.c | 68 +--
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 55 ++
 drivers/gpu/drm/i915/i915_gpu_error.c | 50 -
 fs/afs/write.c| 16 +++---
 include/linux/pagevec.h   | 67 +++---
 include/linux/scatterlist.h   | 24 
 include/linux/sunrpc/svc.h|  2 +-
 include/linux/swap.h  |  1 -
 mm/fadvise.c  | 17 +++---
 mm/huge_memory.c  |  2 +-
 mm/internal.h |  4 +-
 mm/khugepaged.c   |  6 +-
 mm/ksm.c  |  6 +-
 mm/memory.c   |  6 +-
 mm/memory_hotplug.c   |  1 -
 mm/migrate.c  |  1 -
 mm/migrate_device.c   |  2 +-
 mm/readahead.c|  1 -
 mm/swap.c | 20 +++
 mm/swap_state.c   |  1 -
 mm/truncate.c | 27 +
 mm/vmscan.c   | 17 --
 net/sunrpc/svc.c  | 10 ++--
 23 files changed, 185 insertions(+), 219 deletions(-)

-- 
2.39.2



[Intel-gfx] [PATCH 05/13] drm: Convert drm_gem_put_pages() to use a folio_batch

2023-06-21 Thread Matthew Wilcox (Oracle)
Remove a few hidden compound_head() calls by converting the returned
page to a folio once and using the folio APIs.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/gpu/drm/drm_gem.c | 68 ++-
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1a5a2cd0d4ec..78dcae201cc6 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -496,13 +496,13 @@ int drm_gem_create_mmap_offset(struct drm_gem_object *obj)
 EXPORT_SYMBOL(drm_gem_create_mmap_offset);
 
 /*
- * Move pages to appropriate lru and release the pagevec, decrementing the
- * ref count of those pages.
+ * Move folios to appropriate lru and release the folios, decrementing the
+ * ref count of those folios.
  */
-static void drm_gem_check_release_pagevec(struct pagevec *pvec)
+static void drm_gem_check_release_batch(struct folio_batch *fbatch)
 {
-   check_move_unevictable_pages(pvec);
-   __pagevec_release(pvec);
+   check_move_unevictable_folios(fbatch);
+   __folio_batch_release(fbatch);
cond_resched();
 }
 
@@ -534,10 +534,10 @@ static void drm_gem_check_release_pagevec(struct pagevec *pvec)
 struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 {
struct address_space *mapping;
-   struct page *p, **pages;
-   struct pagevec pvec;
-   int i, npages;
-
+   struct page **pages;
+   struct folio *folio;
+   struct folio_batch fbatch;
+   int i, j, npages;
 
if (WARN_ON(!obj->filp))
return ERR_PTR(-EINVAL);
@@ -559,11 +559,14 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
mapping_set_unevictable(mapping);
 
-   for (i = 0; i < npages; i++) {
-   p = shmem_read_mapping_page(mapping, i);
-   if (IS_ERR(p))
+   i = 0;
+   while (i < npages) {
+   folio = shmem_read_folio_gfp(mapping, i,
+   mapping_gfp_mask(mapping));
+   if (IS_ERR(folio))
goto fail;
-   pages[i] = p;
+   for (j = 0; j < folio_nr_pages(folio); j++, i++)
+   pages[i] = folio_file_page(folio, i);
 
/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
 * correct region during swapin. Note that this requires
@@ -571,23 +574,26 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 * so shmem can relocate pages during swapin if required.
 */
BUG_ON(mapping_gfp_constraint(mapping, __GFP_DMA32) &&
-   (page_to_pfn(p) >= 0x0010UL));
+   (folio_pfn(folio) >= 0x0010UL));
}
 
return pages;
 
 fail:
mapping_clear_unevictable(mapping);
-   pagevec_init(&pvec);
-   while (i--) {
-   if (!pagevec_add(&pvec, pages[i]))
-   drm_gem_check_release_pagevec(&pvec);
+   folio_batch_init(&fbatch);
+   j = 0;
+   while (j < i) {
+   struct folio *f = page_folio(pages[j]);
+   if (!folio_batch_add(&fbatch, f))
+   drm_gem_check_release_batch(&fbatch);
+   j += folio_nr_pages(f);
}
-   if (pagevec_count(&pvec))
-   drm_gem_check_release_pagevec(&pvec);
+   if (fbatch.nr)
+   drm_gem_check_release_batch(&fbatch);
 
kvfree(pages);
-   return ERR_CAST(p);
+   return ERR_CAST(folio);
 }
 EXPORT_SYMBOL(drm_gem_get_pages);
 
@@ -603,7 +609,7 @@ void drm_gem_put_pages(struct drm_gem_object *obj, struct page **pages,
 {
int i, npages;
struct address_space *mapping;
-   struct pagevec pvec;
+   struct folio_batch fbatch;
 
mapping = file_inode(obj->filp)->i_mapping;
mapping_clear_unevictable(mapping);
@@ -616,23 +622,27 @@ void drm_gem_put_pages(struct drm_gem_object *obj, struct page **pages,
 
npages = obj->size >> PAGE_SHIFT;
 
-   pagevec_init(&pvec);
+   folio_batch_init(&fbatch);
for (i = 0; i < npages; i++) {
+   struct folio *folio;
+
if (!pages[i])
continue;
+   folio = page_folio(pages[i]);
 
if (dirty)
-   set_page_dirty(pages[i]);
+   folio_mark_dirty(folio);
 
if (accessed)
-   mark_page_accessed(pages[i]);
+   folio_mark_accessed(folio);
 
/* Undo the reference we took when populating the table */
-   if (!pagevec_add(&pvec, pages[i]))
-   drm_gem_check_release_pagevec(&pvec);
+   if (!folio_batch_add(&fbatch, folio))
+   drm_gem_check_release_ba

[Intel-gfx] [PATCH 01/13] afs: Convert pagevec to folio_batch in afs_extend_writeback()

2023-06-21 Thread Matthew Wilcox (Oracle)
Removes a folio->page->folio conversion for each folio that's involved.
More importantly, removes one of the last few uses of a pagevec.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/afs/write.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 18ccb613dff8..6e68c70d0b22 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -465,7 +465,7 @@ static void afs_extend_writeback(struct address_space *mapping,
 bool caching,
 unsigned int *_len)
 {
-   struct pagevec pvec;
+   struct folio_batch fbatch;
struct folio *folio;
unsigned long priv;
unsigned int psize, filler = 0;
@@ -476,7 +476,7 @@ static void afs_extend_writeback(struct address_space *mapping,
unsigned int i;
 
XA_STATE(xas, &mapping->i_pages, index);
-   pagevec_init(&pvec);
+   folio_batch_init(&fbatch);
 
do {
/* Firstly, we gather up a batch of contiguous dirty pages
@@ -535,7 +535,7 @@ static void afs_extend_writeback(struct address_space *mapping,
stop = false;
 
index += folio_nr_pages(folio);
-   if (!pagevec_add(&pvec, &folio->page))
+   if (!folio_batch_add(&fbatch, folio))
break;
if (stop)
break;
@@ -545,14 +545,14 @@ static void afs_extend_writeback(struct address_space *mapping,
xas_pause(&xas);
rcu_read_unlock();
 
-   /* Now, if we obtained any pages, we can shift them to being
+   /* Now, if we obtained any folios, we can shift them to being
 * writable and mark them for caching.
 */
-   if (!pagevec_count(&pvec))
+   if (!folio_batch_count(&fbatch))
break;
 
-   for (i = 0; i < pagevec_count(&pvec); i++) {
-   folio = page_folio(pvec.pages[i]);
+   for (i = 0; i < folio_batch_count(&fbatch); i++) {
+   folio = fbatch.folios[i];
trace_afs_folio_dirty(vnode, tracepoint_string("store+"), folio);
 
if (!folio_clear_dirty_for_io(folio))
@@ -565,7 +565,7 @@ static void afs_extend_writeback(struct address_space *mapping,
folio_unlock(folio);
}
 
-   pagevec_release(&pvec);
+   folio_batch_release(&fbatch);
cond_resched();
} while (!stop);
 
-- 
2.39.2



[Intel-gfx] [PATCH 07/13] pagevec: Rename fbatch_space()

2023-06-21 Thread Matthew Wilcox (Oracle)
This should always have been called folio_batch_space().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagevec.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 42aad53e382e..3a9d29dd28a3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -105,7 +105,7 @@ static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
return fbatch->nr;
 }
 
-static inline unsigned int fbatch_space(struct folio_batch *fbatch)
+static inline unsigned int folio_batch_space(struct folio_batch *fbatch)
 {
return PAGEVEC_SIZE - fbatch->nr;
 }
@@ -124,7 +124,7 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
struct folio *folio)
 {
fbatch->folios[fbatch->nr++] = folio;
-   return fbatch_space(fbatch);
+   return folio_batch_space(fbatch);
 }
 
 static inline void __folio_batch_release(struct folio_batch *fbatch)
-- 
2.39.2



[Intel-gfx] [PATCH 03/13] scatterlist: Add sg_set_folio()

2023-06-21 Thread Matthew Wilcox (Oracle)
This wrapper for sg_set_page() lets drivers add folios to a scatterlist
more easily.  We could, perhaps, do better by using a different page
in the folio if offset is larger than UINT_MAX, but let's hope we get
a better data structure than this before we need to care about such
large folios.
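
For illustration, a minimal usage sketch (assuming a folio you already
hold a reference on, and a single-entry table):

	struct scatterlist sg;

	sg_init_table(&sg, 1);
	/* one entry covering the whole folio, starting at offset 0 */
	sg_set_folio(&sg, folio, folio_size(folio), 0);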

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/scatterlist.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index ec46d8e8e49d..77df3d7b18a6 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -141,6 +141,30 @@ static inline void sg_set_page(struct scatterlist *sg, struct page *page,
sg->length = len;
 }
 
+/**
+ * sg_set_folio - Set sg entry to point at given folio
+ * @sg: SG entry
+ * @folio:  The folio
+ * @len:Length of data
+ * @offset: Offset into folio
+ *
+ * Description:
+ *   Use this function to set an sg entry pointing at a folio, never assign
+ *   the folio directly. We encode sg table information in the lower bits
+ *   of the folio pointer. See sg_page() for looking up the page belonging
+ *   to an sg entry.
+ *
+ **/
+static inline void sg_set_folio(struct scatterlist *sg, struct folio *folio,
+  size_t len, size_t offset)
+{
+   WARN_ON_ONCE(len > UINT_MAX);
+   WARN_ON_ONCE(offset > UINT_MAX);
+   sg_assign_page(sg, &folio->page);
+   sg->offset = offset;
+   sg->length = len;
+}
+
 static inline struct page *sg_page(struct scatterlist *sg)
 {
 #ifdef CONFIG_DEBUG_SG
-- 
2.39.2



Re: [Intel-gfx] [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-02-02 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 08:49:50AM -0800, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 1:10 AM Peter Zijlstra  wrote:
> > > + /*
> > > +  * Flags, see mm.h.
> > > +  * WARNING! Do not modify directly.
> > > +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> > > +  */
> > > + unsigned long vm_flags;
> >
> > We have __private and ACCESS_PRIVATE() to help with enforcing this.
> 
> Thanks for pointing this out, Peter! I guess for that I'll need to
> convert all read accesses and provide get_vm_flags() too? That will
> cause some additional churt (a quick search shows 801 hits over 248
> files) but maybe it's worth it? I think Michal suggested that too in
> another patch. Should I do that while we are at it?

Here's a trick I saw somewhere in the VFS:

union {
const vm_flags_t vm_flags;
vm_flags_t __private __vm_flags;
};

Now it can be read by anybody but written only by those using
ACCESS_PRIVATE.
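
Writers then go through ACCESS_PRIVATE(); roughly (a sketch of how the
helpers in this patch could use it, untested):

	static inline void init_vm_flags(struct vm_area_struct *vma,
					 vm_flags_t flags)
	{
		ACCESS_PRIVATE(vma, __vm_flags) = flags;
	}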


Re: [Intel-gfx] [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-02-02 Thread Matthew Wilcox
On Thu, Jan 26, 2023 at 04:50:59PM +0200, Mike Rapoport wrote:
> On Thu, Jan 26, 2023 at 11:17:09AM +0200, Mike Rapoport wrote:
> > On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> > > +/* Use when VMA is not part of the VMA tree and needs no locking */
> > > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > > +  unsigned long flags)
> > 
> > I'd suggest to make it vm_flags_init() etc.
> 
> Thinking more about it, it will be even clearer to name these vma_flags_xyz()

Perhaps vma_VERB_flags()?

vma_init_flags()
vma_reset_flags()
vma_set_flags()
vma_clear_flags()
vma_mod_flags()



Re: [Intel-gfx] [PATCH v2 1/6] mm: introduce vma->vm_flags modifier functions

2023-02-02 Thread Matthew Wilcox
On Wed, Jan 25, 2023 at 12:38:46AM -0800, Suren Baghdasaryan wrote:
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +  unsigned long flags)
> +{
> + vma->vm_flags = flags;

vm_flags are supposed to have type vm_flags_t.  That's not been
fully realised yet, but perhaps we could avoid making it worse?

>   pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> +  * Flags, see mm.h.
> +  * WARNING! Do not modify directly.
> +  * Use {init|reset|set|clear|mod}_vm_flags() functions instead.
> +  */
> + unsigned long vm_flags;

Including changing this line to vm_flags_t


Re: [Intel-gfx] [PATCH v5 1/3] drm: Use XArray instead of IDR for minors

2022-11-07 Thread Matthew Wilcox
On Sun, Nov 06, 2022 at 04:51:39PM +0200, Oded Gabbay wrote:
> I tried executing the following code in a dummy driver I wrote:

You don't need to write a dummy driver; you can write test-cases
entirely in userspace.  Just add them to lib/test_xarray.c and then
make -C tools/testing/radix-tree/

> static DEFINE_XARRAY_ALLOC(xa_dummy);
> void check_xa(void *pdev)
> {
>   void *entry;
>   int ret, index;
> 
>   ret = xa_alloc(&xa_dummy, &index, NULL, XA_LIMIT(0, 63), GFP_NOWAIT);
>   if (ret < 0)
>   return ret;
> 
>   entry = xa_cmpxchg(&xa_dummy, index, NULL, pdev, GFP_KERNEL);
>   if (xa_is_err(entry))
>return;
> 
>   xa_lock(&xa_dummy);
>   xa_dev = xa_load(&xa_dummy, index);
>   xa_unlock(&xa_dummy);
> }
> 
> And to my surprise xa_dev is always NULL, when it should be pdev.
> I believe that because we first allocate the entry with NULL, it is
> considered a "zero" entry in the XA.
> And when we replace it, this attribute doesn't change so when we do
> xa_load, the xa code thinks the entry is a "zero" entry and returns
> NULL.

There's no "attribute" to mark a zero entry.  It's just a zero entry.
The problem is that you're cmpxchg'ing against NULL, and it's not NULL,
it's the ZERO entry.  This is even documented in the test code:

/* cmpxchg sees a reserved entry as ZERO */
XA_BUG_ON(xa, xa_reserve(xa, 12345678, GFP_KERNEL) != 0);
XA_BUG_ON(xa, xa_cmpxchg(xa, 12345678, XA_ZERO_ENTRY,
xa_mk_value(12345678), GFP_NOWAIT) != NULL);

I'm not quite sure why you're using xa_cmpxchg() here anyway; if you
allocated it, it's yours and you can just xa_store() to it.
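
Put together, the corrected pattern looks like this (sketch; the
function name is invented):

static DEFINE_XARRAY_ALLOC(xa_dummy);

int install_dev(void *pdev)
{
        u32 index;
        int ret;

        ret = xa_alloc(&xa_dummy, &index, NULL, XA_LIMIT(0, 63), GFP_NOWAIT);
        if (ret < 0)
                return ret;

        /* The index is ours now; an unconditional store replaces the
         * ZERO entry with the real pointer. */
        xa_store(&xa_dummy, index, pdev, GFP_KERNEL);
        return index;
}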



Re: [Intel-gfx] [PATCH] drm/i915/userptr: restore probe_range behaviour

2022-10-25 Thread Matthew Wilcox
On Tue, Oct 25, 2022 at 04:40:23PM +0100, Tvrtko Ursulin wrote:
> 
> On 24/10/2022 18:21, Matthew Auld wrote:
> > The conversion looks harmless, however the addr value is updated inside
> > the loop with the previous vm_end, which then incorrectly leads to
> > for_each_vma_range() iterating over stuff outside the range we care
> > about. Fix this by storing the end value separately.
> > 
> > Testcase: igt@gem_userptr_blits@probe
> > Fixes: f683b9d61319 ("i915: use the VMA iterator")
> > Reported-by: kernel test robot 
> > Signed-off-by: Matthew Auld 
> > Cc: Tvrtko Ursulin 
> > Cc: Matthew Wilcox (Oracle) 
> > Cc: Vlastimil Babka 
> > Cc: Yu Zhao 
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index b7e24476a0fd..dadb3e3fa9c8 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -427,9 +427,10 @@ probe_range(struct mm_struct *mm, unsigned long addr, 
> > unsigned long len)
> >   {
> > VMA_ITERATOR(vmi, mm, addr);
> > struct vm_area_struct *vma;
> > +   unsigned long end = addr + len;
> > mmap_read_lock(mm);
> > -   for_each_vma_range(vmi, vma, addr + len) {
> > +   for_each_vma_range(vmi, vma, end) {
> > /* Check for holes, note that we also update the addr below */
> > if (vma->vm_start > addr)
> > break;
> 
> I am unsure of the for_each_vma_range() behaviour regarding holes. If it
> just skips overs them and continues to next VMA in the range then patch
> looks good to me. Could someone confirm?

It's "For each VMA in this range".  It doesn't iterate over non-VMAs
within that range ;-)  Nor does a gap between VMAs stop the iteration.


Re: [Intel-gfx] [PATCH v4 1/3] drm: Use XArray instead of IDR for minors

2022-09-06 Thread Matthew Wilcox
On Tue, Sep 06, 2022 at 10:16:27PM +0200, Michał Winiarski wrote:
> IDR is deprecated, and since XArray manages its own state with internal
> locking, it simplifies the locking on DRM side.
> Additionally, don't use the IRQ-safe variant, since operating on drm
> minor is not done in IRQ context.
> 
> Signed-off-by: Michał Winiarski 
> Suggested-by: Matthew Wilcox 

I have a few questions, but I like where you're going.

> @@ -98,21 +98,18 @@ static struct drm_minor **drm_minor_get_slot(struct 
> drm_device *dev,
>  static void drm_minor_alloc_release(struct drm_device *dev, void *data)
>  {
>   struct drm_minor *minor = data;
> - unsigned long flags;
>  
>   WARN_ON(dev != minor->dev);
>  
>   put_device(minor->kdev);
>  
> - spin_lock_irqsave(&drm_minor_lock, flags);
> - idr_remove(&drm_minors_idr, minor->index);
> - spin_unlock_irqrestore(&drm_minor_lock, flags);
> + xa_release(&drm_minors_xa, minor->index);

Has it definitely been unused at this point?  I would think that
xa_erase() (an unconditional store) would be the correct function to
call.
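
Spelled out, the difference is (sketch):

        /* Drops the index only if it still holds the reserved (ZERO)
         * entry from xa_alloc(); a no-op once a pointer was stored: */
        xa_release(&drm_minors_xa, minor->index);

        /* Removes whatever is stored there, unconditionally: */
        xa_erase(&drm_minors_xa, minor->index);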

> @@ -122,20 +119,12 @@ static int drm_minor_alloc(struct drm_device *dev, 
> unsigned int type)
>   minor->type = type;
>   minor->dev = dev;
>  
> - idr_preload(GFP_KERNEL);
> - spin_lock_irqsave(&drm_minor_lock, flags);
> - r = idr_alloc(&drm_minors_idr,
> -   NULL,
> -   64 * type,
> -   64 * (type + 1),
> -   GFP_NOWAIT);
> - spin_unlock_irqrestore(&drm_minor_lock, flags);
> - idr_preload_end();
> -
> + r = xa_alloc(&drm_minors_xa, &id, NULL,
> +  XA_LIMIT(64 * type, 64 * (type + 1) - 1), GFP_KERNEL);
>   if (r < 0)
>   return r;
>  
> - minor->index = r;
> + minor->index = id;

Wouldn't it be better to call:

r = xa_alloc(&drm_minors_xa, &minor->index, NULL,
XA_LIMIT(64 * type, 64 * (type + 1) - 1), GFP_KERNEL);

I might also prefer a little syntactic sugar like:

#define DRM_MINOR_LIMIT(type)   XA_LIMIT(64 * (type), 64 * (type) + 63)

but that's definitely a matter of taste.
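
Combining the two suggestions, the allocation might read (sketch):

#define DRM_MINOR_LIMIT(type)   XA_LIMIT(64 * (type), 64 * (type) + 63)

        r = xa_alloc(&drm_minors_xa, &minor->index, NULL,
                     DRM_MINOR_LIMIT(type), GFP_KERNEL);
        if (r < 0)
                return r;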

> @@ -172,9 +161,12 @@ static int drm_minor_register(struct drm_device *dev, 
> unsigned int type)
>   goto err_debugfs;
>  
>   /* replace NULL with @minor so lookups will succeed from now on */
> - spin_lock_irqsave(&drm_minor_lock, flags);
> - idr_replace(&drm_minors_idr, minor, minor->index);
> - spin_unlock_irqrestore(&drm_minor_lock, flags);
> + entry = xa_store(&drm_minors_xa, minor->index, &minor, GFP_KERNEL);
> + if (xa_is_err(entry)) {
> + ret = xa_err(entry);
> + goto err_debugfs;
> + }
> + WARN_ON(entry);

Might be better as an xa_cmpxchg()?

> @@ -187,16 +179,13 @@ static int drm_minor_register(struct drm_device *dev, 
> unsigned int type)
>  static void drm_minor_unregister(struct drm_device *dev, unsigned int type)
>  {
>   struct drm_minor *minor;
> - unsigned long flags;
>  
>   minor = *drm_minor_get_slot(dev, type);
>   if (!minor || !device_is_registered(minor->kdev))
>   return;
>  
>   /* replace @minor with NULL so lookups will fail from now on */
> - spin_lock_irqsave(&drm_minor_lock, flags);
> - idr_replace(&drm_minors_idr, NULL, minor->index);
> - spin_unlock_irqrestore(&drm_minor_lock, flags);
> + xa_erase(&drm_minors_xa, minor->index);

This isn't an exact replacement, but I'm not sure whether that makes a
difference.  xa_erase() allows allocation of this ID again while
idr_replace() means that lookups return NULL, but the ID remains in
use.  The equivalent of idr_replace() is:
xa_store(&drm_minors_xa, minor->index, NULL, GFP_KERNEL);

> @@ -215,13 +204,10 @@ static void drm_minor_unregister(struct drm_device 
> *dev, unsigned int type)
>  struct drm_minor *drm_minor_acquire(unsigned int minor_id)
>  {
>   struct drm_minor *minor;
> - unsigned long flags;
>  
> - spin_lock_irqsave(&drm_minor_lock, flags);
> - minor = idr_find(&drm_minors_idr, minor_id);
> + minor = xa_load(&drm_minors_xa, minor_id);
>   if (minor)
>   drm_dev_get(minor->dev);
> - spin_unlock_irqrestore(&drm_minor_lock, flags);

This is also not an exact equivalent as the dev_drm_get() is now outside
the lock.  Does that matter in this case?  I don't know the code well
enough to say.  If you want it to be equivalent, then:

xa_lock(&drm_minors_xa);
minor = xa_load(&drm_minors_xa, minor_id);
if (minor)
        drm_dev_get(minor->dev);
xa_unlock(&drm_minors_xa);

Re: [Intel-gfx] i915: crash with 5.19-rc2

2022-08-10 Thread Matthew Wilcox
On Wed, Aug 10, 2022 at 08:55:32AM -0700, Hugh Dickins wrote:
> This is not a bug in zram or i915, but what Matthew fixes in
> https://lore.kernel.org/lkml/20220730042518.1264767-1-wi...@infradead.org/

Thanks for tracking that down, Hugh.  Nice to know it's a crash rather
than a data corruption.  The fix is in Andrew's tree, so I think it's
already destined for upstream soon.

Andrew, I have two fixes that I don't see in your tree:

https://lore.kernel.org/linux-mm/20220808193430.3378317-2-wi...@infradead.org/T/#u
https://lore.kernel.org/linux-mm/20220808193430.3378317-4-wi...@infradead.org/T/#u

The first is of minor importance, the second I believe Hugh has hit in
his testing.



Re: [Intel-gfx] [PATCH RFC v2] mm: Add f_ops->populate()

2022-03-07 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 03:41:54PM -0800, Dave Hansen wrote:
> In short: page faults stink.  The core kernel has lots of ways of
> avoiding page faults like madvise(MADV_WILLNEED) or mmap(MAP_POPULATE).
>  But, those only work on normal RAM that the core mm manages.
> 
> SGX is weird.  SGX memory is managed outside the core mm.  It doesn't
> have a 'struct page' and get_user_pages() doesn't work on it.  Its VMAs
> are marked with VM_IO.  So, none of the existing methods for avoiding
> page faults work on SGX memory.
> 
> This essentially helps extend existing "normal RAM" kernel ABIs to work
> for avoiding faults for SGX too.  SGX users want to enjoy all of the
> benefits of a delayed allocation policy (better resource use,
> overcommit, NUMA affinity) but without the cost of millions of faults.

We have a mechanism for dynamically reducing the number of page faults
already; it's just buried in the page cache code.  You have vma->vm_file,
which contains a file_ra_state.  You can use this to track where
recent faults have been and grow the size of the region you fault in
per page fault.  You don't have to (indeed probably don't want to) use
the same algorithm as the page cache, but the _principle_ is the same --
were recent speculative faults actually used; should we grow the number
of pages actually faulted in, or is this a random sparse workload where
we want to allocate individual pages.

Don't rely on the user to ask.  They don't know.
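
Purely as an illustration (invented names, not SGX or page cache code),
the heuristic amounts to something like:

struct fault_window {
        unsigned long next;     /* predicted index of the next fault */
        unsigned int size;      /* pages to fault in on the next miss */
};

static unsigned int fault_window_update(struct fault_window *w,
                                        unsigned long index)
{
        if (index == w->next) {         /* speculation paid off: grow */
                if (w->size < 32)
                        w->size *= 2;
        } else {
                w->size = 1;            /* sparse access: one page again */
        }
        w->next = index + w->size;
        return w->size;
}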


Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()

2022-03-06 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?

You haven't even attempted to explain what the problem is you're trying
to solve.  You've shown up with some terrible code and said "Hey, is
this a good idea".  No, no, it's not.


Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory

2022-03-06 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.

As I said, NAK.


Re: [Intel-gfx] [PATCH RFC] mm: Add f_ops->populate()

2022-03-05 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 06:25:52AM +0200, Jarkko Sakkinen wrote:
> > Are you deliberately avoiding the question?  I'm not asking about
> > implementation.  I'm asking about the semantics of MAP_POPULATE with
> > your driver.
> 
> No. I just noticed a bug in the guard from your comment that I wanted
> point out.
> 
> With the next version I post the corresponding change to the driver,
> in order to see this in context.

Oh, good grief.  Don't bother.  NAK.


Re: [Intel-gfx] [PATCH RFC] mm: Add f_ops->populate()

2022-03-05 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 06:11:21AM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 03:52:12AM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 05:21:11AM +0200, Jarkko Sakkinen wrote:
> > > On Sun, Mar 06, 2022 at 02:57:55AM +0000, Matthew Wilcox wrote:
> > > > On Sun, Mar 06, 2022 at 04:15:33AM +0200, Jarkko Sakkinen wrote:
> > > > > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > > > > initialize the device memory in some specific manner. SGX driver can 
> > > > > use
> > > > > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > > > > page in the address range.
> > > > > 
> > > > > Add f_ops->populate() with the same parameters as f_ops->mmap() and 
> > > > > make
> > > > > it conditionally called inside call_mmap(). Update call sites
> > > > > accordingly.
> > > > 
> > > > Your device driver has a ->mmap operation.  Why does it need another
> > > > one?  More explanation required here.
> > > 
> > > f_ops->mmap() would require an additional parameter, which results
> > > heavy refactoring.
> > > 
> > > struct file_operations has 1125 references in the kernel tree, so I
> > > decided to check this way around first. 
> > 
> > Are you saying that your device driver behaves differently if
> > MAP_POPULATE is set versus if it isn't?  That seems hideously broken.
> 
> MAP_POPULATE does not do anything (according to __mm_populate in mm/gup.c)
> with VMA's that have some sort of device/IO memory, i.e. vm_flags
> intersecting with VM_PFNMAP | VM_IO.
> 
> I can extend the guard obviously to:
> 
> if (!ret && do_populate && file->f_op->populate &&
> !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> file->f_op->populate(file, vma);

Are you deliberately avoiding the question?  I'm not asking about
implementation.  I'm asking about the semantics of MAP_POPULATE with
your driver.


Re: [Intel-gfx] [PATCH RFC] mm: Add f_ops->populate()

2022-03-05 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 05:21:11AM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 02:57:55AM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 04:15:33AM +0200, Jarkko Sakkinen wrote:
> > > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > > initialize the device memory in some specific manner. SGX driver can use
> > > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > > page in the address range.
> > > 
> > > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > > it conditionally called inside call_mmap(). Update call sites
> > > accordingly.
> > 
> > Your device driver has a ->mmap operation.  Why does it need another
> > one?  More explanation required here.
> 
> f_ops->mmap() would require an additional parameter, which results
> heavy refactoring.
> 
> struct file_operations has 1125 references in the kernel tree, so I
> decided to check this way around first. 

Are you saying that your device driver behaves differently if
MAP_POPULATE is set versus if it isn't?  That seems hideously broken.


Re: [Intel-gfx] [PATCH RFC] mm: Add f_ops->populate()

2022-03-05 Thread Matthew Wilcox
On Sun, Mar 06, 2022 at 04:15:33AM +0200, Jarkko Sakkinen wrote:
> Sometimes you might want to use MAP_POPULATE to ask a device driver to
> initialize the device memory in some specific manner. SGX driver can use
> this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> page in the address range.
> 
> Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> it conditionally called inside call_mmap(). Update call sites
> accordingly.

Your device driver has a ->mmap operation.  Why does it need another
one?  More explanation required here.


Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Matthew Wilcox
On Tue, Mar 01, 2022 at 10:14:07AM -0800, Kees Cook wrote:
> On Mon, Feb 28, 2022 at 04:45:11PM -0800, Linus Torvalds wrote:
> > Really. The "-Wshadow doesn't work on the kernel" is not some new
> > issue, because you have to do completely insane things to the source
> > code to enable it.
> 
> The first big glitch with -Wshadow was with shadowed global variables.
> GCC 4.8 fixed that, but it still yells about shadowed functions. What
> _almost_ works is -Wshadow=local. At first glace, all the warnings
> look solvable, but then one will eventually discover __wait_event()
> and associated macros that mix when and how deeply it intentionally
> shadows variables. :)

Well, that's just disgusting.  Macros fundamentally shouldn't be
referring to things that aren't in their arguments.  The first step to
cleaning this up is ...

I'll take a look at the rest of cleaning this up soon.

From 28ffe35d56223d4242b915832299e5acc926737e Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" 
Date: Tue, 1 Mar 2022 13:47:07 -0500
Subject: [PATCH] wait: Parameterize the return variable to ___wait_event()

Macros should not refer to variables which aren't in their arguments.
Pass the name from its callers.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/swait.h| 12 ++--
 include/linux/wait.h | 32 
 include/linux/wait_bit.h |  4 ++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b8c2a5..5e8e9b13be2d 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -191,14 +191,14 @@ do {  
\
 } while (0)
 
 #define __swait_event_timeout(wq, condition, timeout)  \
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
  TASK_UNINTERRUPTIBLE, timeout,\
  __ret = schedule_timeout(__ret))
 
 #define swait_event_timeout_exclusive(wq, condition, timeout)  \
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_timeout(wq, condition, timeout);  \
__ret;  \
 })
@@ -216,14 +216,14 @@ do {  
\
 })
 
 #define __swait_event_interruptible_timeout(wq, condition, timeout)\
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
  TASK_INTERRUPTIBLE, timeout,  \
  __ret = schedule_timeout(__ret))
 
 #define swait_event_interruptible_timeout_exclusive(wq, condition, timeout)\
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_interruptible_timeout(wq, \
condition, timeout);\
__ret;  \
@@ -252,7 +252,7 @@ do {
\
 } while (0)
 
 #define __swait_event_idle_timeout(wq, condition, timeout) \
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
   TASK_IDLE, timeout,  \
   __ret = schedule_timeout(__ret))
 
@@ -278,7 +278,7 @@ do {
\
 #define swait_event_idle_timeout_exclusive(wq, condition, timeout) \
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_idle_timeout(wq,  \
   condition, timeout); \
__ret;  \
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07da2583..890cce3c0f2e 100644

Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-02-28 Thread Matthew Wilcox
On Mon, Feb 28, 2022 at 12:37:15PM -0800, Linus Torvalds wrote:
> On Mon, Feb 28, 2022 at 12:16 PM Matthew Wilcox  wrote:
> >
> > Then we can never use -Wshadow ;-(  I'd love to be able to turn it on;
> > it catches real bugs.
> 
> Oh, we already can never use -Wshadow regardless of things like this.
> That bridge hasn't just been burned, it never existed in the first
> place.
> 
> The whole '-Wshadow' thing simply cannot work with local variables in
> macros - something that we've used since day 1.
> 
> Try this (as a "p.c" file):
> 
> #define min(a,b) ({ \
> typeof(a) __a = (a);\
> typeof(b) __b = (b);\
> __a < __b ? __a : __b; })
> 
> int min3(int a, int b, int c)
> {
> return min(a,min(b,c));
> }
> 
> and now do "gcc -O2 -S t.c".
> 
> Then try it with -Wshadow.

#define ___PASTE(a, b)  a##b
#define __PASTE(a, b) ___PASTE(a, b)
#define _min(a, b, u) ({ \
typeof(a) __PASTE(__a,u) = (a);\
typeof(b) __PASTE(__b,u) = (b);\
__PASTE(__a,u) < __PASTE(__b,u) ? __PASTE(__a,u) : __PASTE(__b,u); })

#define min(a, b) _min(a, b, __COUNTER__)

int min3(int a, int b, int c)
{
return min(a,min(b,c));
}

(probably there's a more elegant way to do this)
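
For illustration, one _min() expansion with __COUNTER__ having reached,
say, 7 pastes out to:

({ typeof(a) __a7 = (a);
   typeof(b) __b7 = (b);
   __a7 < __b7 ? __a7 : __b7; })

The nested min() gets a different counter value, so nothing shadows and
-Wshadow stays quiet.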


Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-02-28 Thread Matthew Wilcox
On Mon, Feb 28, 2022 at 12:10:24PM -0800, Linus Torvalds wrote:
> We can do
> 
> typeof(pos) pos
> 
> in the 'for ()' loop, and never use __iter at all.
> 
> That means that inside the for-loop, we use a _different_ 'pos' than outside.

Then we can never use -Wshadow ;-(  I'd love to be able to turn it on;
it catches real bugs.

> +#define list_for_each_entry(pos, head, member)   
> \
> + for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member);
> \
> +  !list_entry_is_head(pos, head, member);\
>pos = list_next_entry(pos, member))


Re: [Intel-gfx] [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-02-25 Thread Matthew Wilcox
On Fri, Feb 25, 2022 at 06:42:37PM +, Tvrtko Ursulin wrote:
> Matthew, what do you think fix for this build warning on h8300 and s390 
> should be? Or perhaps a build environment issue with kernel test robot?

I'd suggest this should do the job:

+++ b/include/linux/cacheflush.h
@@ -4,6 +4,8 @@

 #include 

+struct folio;
+
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
 void flush_dcache_folio(struct folio *folio);



Re: [Intel-gfx] [PATCH v5 3/9] dyndbg: add DEFINE_DYNAMIC_DEBUG_CATEGORIES and callbacks

2021-08-13 Thread Matthew Wilcox
On Fri, Aug 13, 2021 at 06:51:05PM +0300, Andy Shevchenko wrote:
> On Fri, Aug 13, 2021 at 09:17:11AM -0600, Jim Cromie wrote:
> > +int param_set_dyndbg(const char *instr, const struct kernel_param *kp)
> > +{
> > +   unsigned long inbits;
> > +   int rc, i, chgct = 0, totct = 0;
> > +   char query[OUR_QUERY_SIZE];
> > +   struct dyndbg_bitdesc *bitmap = (struct dyndbg_bitdesc *) kp->data;
> 
> So you need space after ')' ?

More importantly, if ->data is of type 'void *', it is bad style to
cast the pointer at all.  I can't tell what type 'data' has; if it
is added to kernel_param as part of this series, I wasn't cc'd on
the patch that did that.



Re: [Intel-gfx] [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII

2021-05-10 Thread Matthew Wilcox
On Mon, May 10, 2021 at 02:16:16PM +0100, Edward Cree wrote:
> On 10/05/2021 12:55, Mauro Carvalho Chehab wrote:
> > The main point on this series is to replace just the occurrences
> > where ASCII represents the symbol equally well
> 
> > - U+2014 ('—'): EM DASH
> Em dash is not the same thing as hyphen-minus, and the latter does not
>  serve 'equally well'.  People use em dashes because — even in
>  monospace fonts — they make text easier to read and comprehend, when
>  used correctly.
> I accept that some of the other distinctions — like en dashes — are
>  needlessly pedantic (though I don't doubt there is someone out there
>  who will gladly defend them with the same fervour with which I argue
>  for the em dash) and I wouldn't take the trouble to use them myself;
>  but I think there is a reasonable assumption that when someone goes
>  to the effort of using a Unicode punctuation mark that is semantic
>  (rather than merely typographical), they probably had a reason for
>  doing so.

I think you're overestimating the amount of care and typographical
knowledge that your average kernel developer has.  Most of these
UTF-8 characters come from latex conversions and really aren't
necessary (and are being used incorrectly).

You seem quite knowledgeable about the various differences.  Perhaps
you'd be willing to write a document for Documentation/doc-guide/
that provides guidance for when to use which kinds of horizontal
line?  https://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/
talks about it in the context of publications, but I think we need
something more suited to our needs for kernel documentation.


Re: [Intel-gfx] Missing DPPLL case on i7-1165G7

2021-01-04 Thread Matthew Wilcox
On Tue, Dec 29, 2020 at 04:41:31PM +0200, Imre Deak wrote:
> Hi,
> 
> On Mon, Dec 21, 2020 at 04:07:58AM +, Matthew Wilcox wrote:
> > 
> > At boot,
> > 
> > [2.787995] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
> > [2.788001] i915 :00:02.0: [drm] *ERROR* LSPCON init failed on port E
> > [2.790752] [ cut here ]
> > [2.790753] Missing case (clock == 539440)
> > [2.790790] WARNING: CPU: 0 PID: 159 at 
> > drivers/gpu/drm/i915/display/intel_dpll_mgr.c:2967 
> > icl_get_dplls+0x53a/0xa50 [i915]
> 
> the above warn looks to be due to a missing workaround fixed by
> 
> commit 0e2497e334de42dbaaee8e325241b5b5b34ede7e
> Author: Imre Deak 
> Date:   Sat Oct 3 03:18:46 2020 +0300
> 
> drm/i915/tgl: Fix Combo PHY DPLL fractional divider for 38.4MHz ref clock
> 
> in drm-tip. Could you give it a try?

I tried -rc2, which contains that commit, and the problem is gone.  Thank
you!

There is a different problem, which is that the brightness buttons
(on F2 and F3 on this laptop) do not actually increase/decrease the
brightness.  GNOME pops up a graphic that illustrates it is changing
the brightness, but nothing actually changes.

xbacklight says "No outputs have backlight property" and using
xrandr --output XWAYLAND0 --brightness 0.0001 doesn't change anything
(for various different values, not just 0.0001).  Using xrandr --prop
--verbose shows the reported value of "Brightness" changing, but nothing
has changed on the screen.

I found
/sys/devices/pci:00/:00:02.0/drm/card0/card0-eDP-1/intel_backlight
and tried setting 'brightness' in there to a few different values (100,
2000, 19200, 7000) and also nothing changed.

Any thoughts?


[Intel-gfx] Missing DPPLL case on i7-1165G7

2020-12-20 Thread Matthew Wilcox


At boot,

[2.787995] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[2.788001] i915 :00:02.0: [drm] *ERROR* LSPCON init failed on port E
[2.790752] [ cut here ]
[2.790753] Missing case (clock == 539440)
[2.790790] WARNING: CPU: 0 PID: 159 at 
drivers/gpu/drm/i915/display/intel_dpll_mgr.c:2967 icl_get_dplls+0x53a/0xa50 
[i915]

In drivers/gpu/drm/i915/display/intel_dpll_mgr.c, I see an entry for 540,000.
Presumably, this clock was supposed to be rounded up to that somewhere.
This is an HP Spectre x360 using the internal display.  Here's the EDID
in case that's useful:

$ edid-decode /sys/class/drm/card0-eDP-1/edid
edid-decode (hex):

00 ff ff ff ff ff ff 00 4c 83 49 41 00 00 00 00 
13 1d 01 04 b5 1d 11 78 02 38 d1 ae 51 3b b8 23 
0b 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01 
01 01 01 01 01 01 b9 d5 00 40 f1 70 20 80 30 20 
88 00 26 a5 10 00 00 1b b9 d5 00 40 f1 70 20 80 
30 20 88 00 26 a5 10 00 00 1b 00 00 00 0f 00 ff 
09 3c ff 09 3c 2c 80 00 00 00 00 00 00 00 00 10 
00 00 01 00 00 00 00 00 00 00 00 00 00 00 01 af 

02 03 0f 00 e3 05 80 00 e6 06 05 01 73 6d 07 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ab 



EDID version: 1.4
Manufacturer: SDC Model 16713 Serial Number 0
Made in week 19 of 2019
Digital display
10 bits per primary color channel
DisplayPort interface
Maximum image size: 29 cm x 17 cm
Gamma: 2.20
Supported color formats: RGB 4:4:4
First detailed timing includes the native pixel format and preferred refresh 
rate
Color Characteristics
  Red:   0.6796, 0.3193
  Green: 0.2324, 0.7187
  Blue:  0.1396, 0.0439
  White: 0.3125, 0.3291
Established Timings I & II: none
Standard Timings: none
Detailed mode: Clock 547.130 MHz, 294 mm x 165 mm
   3840 3888 3920 4160 ( 48  32 240)
   2160 2168 2176 2192 (  8   8  16)
   +hsync -vsync
   VertFreq: 60.001 Hz, HorFreq: 131.522 kHz
Detailed mode: Clock 547.130 MHz, 294 mm x 165 mm
   3840 3888 3920 4160 ( 48  32 240)
   2160 2168 2176 2192 (  8   8  16)
   +hsync -vsync
   VertFreq: 60.001 Hz, HorFreq: 131.522 kHz
Manufacturer-Specified Display Descriptor (0x0f): 00 0f 00 ff 09 3c ff 09 3c 2c 
80 00 00 00 00 00  .<..<,..
Dummy Descriptor
Has 1 extension block
Checksum: 0xaf



CTA-861 Extension Block Revision 3
0 native detailed modes
11 bytes of CTA data blocks
  Extended tag: Colorimetry Data Block
BT2020RGB
  Extended tag: HDR Static Metadata Data Block
Electro optical transfer functions:
  Traditional gamma - SDR luminance range
  SMPTE ST2084
Supported static metadata descriptors:
  Static metadata type 1
Desired content max luminance: 115 (603.666 cd/m^2)
Desired content max frame-average luminance: 109 (530.095 cd/m^2)
Desired content min luminance: 7 (0.005 cd/m^2)
Checksum: 0xab




Re: [Intel-gfx] [RFC] MAINTAINERS tag for cleanup robot

2020-11-22 Thread Matthew Wilcox
On Sun, Nov 22, 2020 at 06:46:46AM -0800, Tom Rix wrote:
> 
> On 11/21/20 7:23 PM, Matthew Wilcox wrote:
> > On Sat, Nov 21, 2020 at 08:50:58AM -0800, t...@redhat.com wrote:
> >> The fixer review is
> >> https://reviews.llvm.org/D91789
> >>
> >> A run over allyesconfig for x86_64 finds 62 issues, 5 are false positives.
> >> The false positives are caused by macros passed to other macros and by
> >> some macro expansions that did not have an extra semicolon.
> >>
> >> This cleans up about 1,000 of the current 10,000 -Wextra-semi-stmt
> >> warnings in linux-next.
> > Are any of them not false-positives?  It's all very well to enable
> > stricter warnings, but if they don't fix any bugs, they're just churn.
> >
> While enabling additional warnings may be a side effect of this effort
> 
> the primary goal is to set up a cleaning robot. After that a refactoring 
> robot.

Why do we need such a thing?  Again, it sounds like more churn.
It's really annoying when I'm working on something important that gets
derailed by pointless churn.  Churn also makes it harder to backport
patches to earlier kernels.


Re: [Intel-gfx] [RFC] MAINTAINERS tag for cleanup robot

2020-11-21 Thread Matthew Wilcox
On Sat, Nov 21, 2020 at 08:50:58AM -0800, t...@redhat.com wrote:
> The fixer review is
> https://reviews.llvm.org/D91789
> 
> A run over allyesconfig for x86_64 finds 62 issues, 5 are false positives.
> The false positives are caused by macros passed to other macros and by
> some macro expansions that did not have an extra semicolon.
> 
> This cleans up about 1,000 of the current 10,000 -Wextra-semi-stmt
> warnings in linux-next.

Are any of them not false-positives?  It's all very well to enable
stricter warnings, but if they don't fix any bugs, they're just churn.


Re: [Intel-gfx] [PATCH 04/65] mm: Extract might_alloc() debug check

2020-10-23 Thread Matthew Wilcox
On Fri, Oct 23, 2020 at 02:21:15PM +0200, Daniel Vetter wrote:
> Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking
> does not consult the PF_MEMALLOC flags. But there is no flag
> equivalent for GFP_NOWAIT, hence this check can't go wrong due to
> memalloc_no*_save/restore contexts.

I have a patch series that adds memalloc_nowait_save/restore.

https://lore.kernel.org/linux-mm/20200625113122.7540-7-wi...@infradead.org/


Re: [Intel-gfx] [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()

2020-10-13 Thread Matthew Wilcox
On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> On Fri, Oct 9, 2020 at 12:52 PM  wrote:
> >
> > From: Ira Weiny 
> >
> > The kmap() calls in this FS are localized to a single thread.  To avoid
> > the overhead of global PKRS updates use the new kmap_thread() call.
> >
> > Cc: Nicolas Pitre 
> > Signed-off-by: Ira Weiny 
> > ---
> >  fs/cramfs/inode.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > index 912308600d39..003c014a42ed 100644
> > --- a/fs/cramfs/inode.c
> > +++ b/fs/cramfs/inode.c
> > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, 
> > unsigned int offset,
> > struct page *page = pages[i];
> >
> > if (page) {
> > -   memcpy(data, kmap(page), PAGE_SIZE);
> > -   kunmap(page);
> > +   memcpy(data, kmap_thread(page), PAGE_SIZE);
> > +   kunmap_thread(page);
> 
> Why does this need a sleepable kmap? This looks like a textbook
> kmap_atomic() use case.

There's a lot of code of this form.  Could we perhaps have:

static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int 
size)
{
char *vto = kmap_atomic(to);

memcpy(vto, vfrom, size);
kunmap_atomic(vto);
}

in linux/highmem.h ?
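
(The cramfs hunk quoted above copies out of the page, so it would
actually want the mirror image; same idea, sketched:)

static inline void copy_from_highpage(void *vto, struct page *from,
                                      unsigned int size)
{
        char *vfrom = kmap_atomic(from);

        memcpy(vto, vfrom, size);
        kunmap_atomic(vfrom);
}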


Re: [Intel-gfx] [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-12 Thread Matthew Wilcox
On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
> On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
> > On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> > > kmap_atomic() is always preferred over kmap()/kmap_thread().
> > > kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> > > always CPU-local and never broadcast.
> > > 
> > > So, basically, unless you *must* sleep while the mapping is in place,
> > > kmap_atomic() is preferred.
> > 
> > But kmap_atomic() disables preemption, so the _ideal_ interface would map
> > it only locally, then on preemption make it global.  I don't even know
> > if that _can_ be done.  But this email makes it seem like kmap_atomic()
> > has no downsides.
> 
> And that is IIUC what Thomas was trying to solve.
> 
> Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
> 
> From what I can see all of these discussions support the need to have
> something
> between kmap() and kmap_atomic().
> 
> However, the reason behind converting call sites to kmap_thread() are 
> different
> between Thomas' patch set and mine.  Both require more kmap granularity.
> However, they do so with different reasons and underlying implementations but
> with the _same_ resulting semantics; a thread local mapping which is
> preemptable.[2]  Therefore they each focus on changing different call sites.
> 
> While this patch set is huge I think it serves a valuable purpose to identify 
> a
> large number of call sites which are candidates for this new semantic.

Yes, I agree.  My problem with this patch-set is that it ties it to
some Intel feature that almost nobody cares about.  Maybe we should
care about it, but you didn't try very hard to make anyone care about
it in the cover letter.

For a future patch-set, I'd like to see you just introduce the new
API.  Then you can optimise the Intel implementation of it afterwards.
Those patch-sets have entirely different reviewers.


Re: [Intel-gfx] [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-12 Thread Matthew Wilcox
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
> kmap_atomic() is always preferred over kmap()/kmap_thread().
> kmap_atomic() is _much_ more lightweight since its TLB invalidation is
> always CPU-local and never broadcast.
> 
> So, basically, unless you *must* sleep while the mapping is in place,
> kmap_atomic() is preferred.

But kmap_atomic() disables preemption, so the _ideal_ interface would map
it only locally, then on preemption make it global.  I don't even know
if that _can_ be done.  But this email makes it seem like kmap_atomic()
has no downsides.


Re: [Intel-gfx] [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-09 Thread Matthew Wilcox
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.we...@intel.com wrote:
> > The kmap() calls in this FS are localized to a single thread.  To avoid
> the overhead of global PKRS updates use the new kmap_thread() call.
> >
> > @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
> >  
> >  static inline void f2fs_copy_page(struct page *src, struct page *dst)
> >  {
> > -   char *src_kaddr = kmap(src);
> > -   char *dst_kaddr = kmap(dst);
> > +   char *src_kaddr = kmap_thread(src);
> > +   char *dst_kaddr = kmap_thread(dst);
> >  
> > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > -   kunmap(dst);
> > -   kunmap(src);
> > +   kunmap_thread(dst);
> > +   kunmap_thread(src);
> >  }
> 
> Wouldn't it make more sense to switch cases like this to kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately unmapped.

Maybe you missed the earlier thread from Thomas trying to do something
similar for rather different reasons ...

https://lore.kernel.org/lkml/20200919091751.06...@linutronix.de/


Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-22 Thread Matthew Wilcox
On Tue, Sep 22, 2020 at 04:39:06PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 22, 2020 at 12:21:44PM +0100, Matthew Wilcox wrote:
> > Actually, vfree() will work today; I cc'd you on a documentation update
> > to make it clear that this is permitted.
> 
> vfree calls __free_pages, the i915 and a lot of other code calls
> put_page.  They are mostly the same, but not quite and everytime I
> look into that mess I'm more confused than before.
> 
> Can someone in the know write sensible documentation on when to use
> __free_page(s) vs put_page?

I started on that, and then I found a bug that's been lurking for 12
years, so that delayed the documentation somewhat.  The short answer is
that __free_pages() lets you free non-compound high-order pages while
put_page() can only free order-0 and compound pages.
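
Illustrated (order 3 picked arbitrarily):

        /* Non-compound high-order: only __free_pages() knows the order. */
        struct page *page = alloc_pages(GFP_KERNEL, 3);
        __free_pages(page, 3);

        /* Compound: the head page records the order, so put_page() works. */
        struct page *thp = alloc_pages(GFP_KERNEL | __GFP_COMP, 3);
        put_page(thp);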

I would really like to overhaul our memory allocation APIs:

current             new
__get_free_page(s)  alloc_page(s)
free_page(s)        free_page(s)
alloc_page(s)       get_free_page(s)
__free_pages        put_page_order

Then put_page() and put_page_order() are more obviously friends.

But I cannot imagine a world in which Linus says yes to that upheaval.
He's previous expressed dislike of the get_free_page() family of APIs,
and thinks all those callers should just use kmalloc().  Maybe we can
make that transition happen, now that kmalloc() aligns larger allocations.


Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-22 Thread Matthew Wilcox
On Tue, Sep 22, 2020 at 08:22:49AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote:
> > This is awkward.  I'd like it if we had a vfree() variant which called
> > put_page() instead of __free_pages().  I'd like it even more if we
> > used release_pages() instead of our own loop that called put_page().
> 
> Note that we don't need a new vfree variant, we can do this manually if
> we want, take a look at kernel/dma/remap.c.  But I thought this code
> intentionally doesn't want to do that to avoid locking in the memory
> for the pages array.  Maybe the i915 maintainers can clarify.

Actually, vfree() will work today; I cc'd you on a documentation update
to make it clear that this is permitted.

From my current experience with the i915 shmem code, I think that the
i915 maintainers are experts at graphics, and are unfamiliar with the MM.
There are a number of places where they do things the hard way.


Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-21 Thread Matthew Wilcox
On Fri, Sep 18, 2020 at 06:37:21PM +0200, Christoph Hellwig wrote:
>  void shmem_unpin_map(struct file *file, void *ptr)
>  {
> + long i = shmem_npages(file);
> +
>   mapping_clear_unevictable(file->f_mapping);
> - __shmem_unpin_map(file, ptr, shmem_npte(file));
> + vunmap(ptr);
> +
> + for (i = 0; i < shmem_npages(file); i++) {
> + struct page *page;
> +
> + page = shmem_read_mapping_page_gfp(file->f_mapping, i,
> +GFP_KERNEL);
> + if (!WARN_ON(IS_ERR(page))) {
> + put_page(page);
> + put_page(page);
> + }
> + }
>  }

This is awkward.  I'd like it if we had a vfree() variant which called
put_page() instead of __free_pages().  I'd like it even more if we
used release_pages() instead of our own loop that called put_page().

Perhaps something like this ...

+++ b/mm/vmalloc.c
@@ -2262,7 +2262,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 
vm_remove_mappings(area, deallocate_pages);
 
-   if (deallocate_pages) {
+   if (deallocate_pages == 1) {
int i;
 
for (i = 0; i < area->nr_pages; i++) {
@@ -2271,8 +2271,12 @@ static void __vunmap(const void *addr, int deallocate_pages)
BUG_ON(!page);
__free_pages(page, 0);
}
-   atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
+   } else if (deallocate_pages == 2) {
+   release_pages(area->pages, area->nr_pages);
+   }
 
+   if (deallocate_pages) {
+   atomic_long_sub(area->nr_pages, &nr_vmalloc_pages);
kvfree(area->pages);
}
@@ -2369,6 +2373,14 @@ void vunmap(const void *addr)
 }
 EXPORT_SYMBOL(vunmap);
 
+void vunmap_put_pages(const void *addr)
+{
+   BUG_ON(in_interrupt());
+   might_sleep();
+   if (addr)
+   __vunmap(addr, 2);
+}
+
 /**
  * vmap - map an array of pages into virtually contiguous space
  * @pages: array of page pointers

only with kernel-doc and so on.  I bet somebody has a better idea for a name.
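
With something like that in place, the shmem_unpin_map() above would
collapse to (sketch):

void shmem_unpin_map(struct file *file, void *ptr)
{
        mapping_clear_unevictable(file->f_mapping);
        vunmap_put_pages(ptr);
}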


Re: [Intel-gfx] [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Matthew Wilcox
On Sat, Sep 19, 2020 at 10:18:54AM -0700, Linus Torvalds wrote:
> On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner  wrote:
> >
> > this provides a preemptible variant of kmap_atomic & related
> > interfaces. This is achieved by:
> 
> Ack. This looks really nice, even apart from the new capability.
> 
> The only thing I really reacted to is that the name doesn't make sense
> to me: "kmap_temporary()" seems a bit odd.
> 
> Particularly for an interface that really is basically meant as a
> better replacement of "kmap_atomic()" (but is perhaps also a better
> replacement for "kmap()").
> 
> I think I understand how the name came about: I think the "temporary"
> is there as a distinction from the "longterm" regular kmap(). So I
> think it makes some sense from an internal implementation angle, but I
> don't think it makes a lot of sense from an interface name.
> 
> I don't know what might be a better name, but if we want to emphasize
> that it's thread-private and a one-off, maybe "local" would be a
> better naming, and make it distinct from the "global" nature of the
> old kmap() interface?
> 
> However, another solution might be to just use this new preemptible
> "local" kmap(), and remove the old global one entirely. Yes, the old
> global one caches the page table mapping and that sounds really
> efficient and nice. But it's actually horribly horribly bad, because
> it means that we need to use locking for them. Your new "temporary"
> implementation seems to be fundamentally better locking-wise, and only
> need preemption disabling as locking (and is equally fast for the
> non-highmem case).
> 
> So I wonder if the single-page TLB flush isn't a better model, and
> whether it wouldn't be a lot simpler to just get rid of the old
> complex kmap() entirely, and replace it with this?
> 
> I agree we can't replace the kmap_atomic() version, because maybe
> people depend on the preemption disabling it also implied. But what
> about replacing the non-atomic kmap()?

My concern with that is people might use kmap() and then pass the address
to a different task.  So we need to audit the current users of kmap()
and convert any that do that into using vmap() instead.

I like kmap_local().  Or kmap_thread().


Re: [Intel-gfx] [patch 00/13] preempt: Make preempt count unconditional

2020-09-16 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 11:55:24PM +0200, Thomas Gleixner wrote:
> But just look at any check which uses preemptible(), especially those
> which check !preemptible():

hmm.

+++ b/include/linux/preempt.h
@@ -180,7 +180,9 @@ do { \
 
 #define preempt_enable_no_resched() sched_preempt_enable_no_resched()
 
+#ifndef MODULE
 #define preemptible()  (preempt_count() == 0 && !irqs_disabled())
+#endif
 
 #ifdef CONFIG_PREEMPTION
 #define preempt_enable() \


$ git grep -w preemptible drivers
(slightly trimmed by hand to remove, eg, comments)
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c:WARN_ON(preemptible());
drivers/firmware/efi/efi-pstore.c:preemptible(), 
record->size, record->psi->buf);
drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
drivers/irqchip/irq-gic-v4.c:   WARN_ON(preemptible());
drivers/scsi/hisi_sas/hisi_sas_main.c:  if (!preemptible())
drivers/xen/time.c: BUG_ON(preemptible());

That only looks like two drivers that need more than WARNectomies.

Although maybe rcu_read_load_sched_held() or rcu_read_lock_any_held()
might get called from a module ...


Re: [Intel-gfx] [PATCH v2 0/8] Return head pages from find_*_entry

2020-09-15 Thread Matthew Wilcox
On Tue, Sep 15, 2020 at 06:23:27PM +0530, Naresh Kamboju wrote:
> While running kselftest mincore tests, the following kernel BUG was reported on the
> linux next-20200915 tag on x86_64, i386 and arm64.

https://lore.kernel.org/linux-mm/20200914112738.gm6...@casper.infradead.org/


Re: [Intel-gfx] [PATCH v2 3/8] mm: Optimise madvise WILLNEED

2020-09-14 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 12:17:07PM -0400, Qian Cai wrote:
> Reverting the "Return head pages from find_*_entry" patchset [1] up to this
> patch fixed the issue that LTP madvise06 test [2] would trigger endless soft-
> lockups below. It does not help after applied patches fixed other separate
> issues in the patchset [3][4].

Thanks for the report.  Could you try this?  With xas_reset() the walk
restarts at the same index once the RCU lock is dropped, so it keeps
re-finding the same entry; xas_pause() lets it resume at the next index.

diff --git a/mm/madvise.c b/mm/madvise.c
index 96189acd6969..2d9ceccb338d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -234,6 +234,7 @@ static void force_shm_swapin_readahead(struct 
vm_area_struct *vma,
 
if (!xa_is_value(page))
continue;
+   xas_pause(&xas);
rcu_read_unlock();
 
swap = radix_to_swp_entry(page);
@@ -243,7 +244,6 @@ static void force_shm_swapin_readahead(struct 
vm_area_struct *vma,
put_page(page);
 
rcu_read_lock();
-   xas_reset(&xas);
}
rcu_read_unlock();
 


Re: [Intel-gfx] [mm] 2037ab69a5: BUG:KASAN:null-ptr-deref_in_t

2020-09-14 Thread Matthew Wilcox
On Mon, Sep 14, 2020 at 04:55:45PM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 2037ab69a5cd8afe58347135010f6160ea368dd0 ("mm: Convert find_get_entry 
> to return the head page")

Thank you!

diff --git a/mm/swap_state.c b/mm/swap_state.c
index c2fb62f660a5..a22c2430e80c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -427,6 +427,8 @@ struct page *find_get_incore_page(struct address_space 
*mapping, pgoff_t index)
struct swap_info_struct *si;
struct page *page = find_get_entry(mapping, index);
 
+   if (!page)
+   return page;
if (!xa_is_value(page))
return find_subpage(page, index);
if (!shmem_mapping(mapping))



Re: [Intel-gfx] [PATCH v2 7/8] mm/shmem: Return head page from find_lock_entry

2020-09-11 Thread Matthew Wilcox
On Thu, Sep 10, 2020 at 07:33:17PM +0100, Matthew Wilcox (Oracle) wrote:
> Convert shmem_getpage_gfp() (the only remaining caller of
> find_lock_entry()) to cope with a head page being returned instead of
> the subpage for the index.

This version was buggy.  Apparently I was too focused on running the test suite 
against XFS and neglected to run it against tmpfs, which crashed instantly.

Here's the patch I should have sent.

commit 7bfa655881da76f3386e6d4c07e38a165b4a6ca8
Author: Matthew Wilcox (Oracle) 
Date:   Sun Aug 2 07:22:34 2020 -0400

mm/shmem: Return head page from find_lock_entry

Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.

Signed-off-by: Matthew Wilcox (Oracle) 

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 905a64030647..f374618b2c93 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -371,6 +371,15 @@ static inline struct page *grab_cache_page_nowait(struct 
address_space *mapping,
mapping_gfp_mask(mapping));
 }
 
+/* Does this page contain this index? */
+static inline bool thp_contains(struct page *head, pgoff_t index)
+{
+   /* HugeTLBfs indexes the page cache in units of hpage_size */
+   if (PageHuge(head))
+   return head->index == index;
+   return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index 2f134383b0ae..453535170b8d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1614,37 +1614,34 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t index)
 }
 
 /**
- * find_lock_entry - locate, pin and lock a page cache entry
- * @mapping: the address_space to search
- * @offset: the page cache index
+ * find_lock_entry - Locate and lock a page cache entry.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned locked and with an increased
- * refcount.
+ * Looks up the page at @mapping & @index.  If there is a page in the
+ * cache, the head page is returned locked and with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * find_lock_entry() may sleep.
- *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Context: May sleep.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
 {
struct page *page;
 
 repeat:
-   page = find_get_entry(mapping, offset);
+   page = find_get_entry(mapping, index);
if (page && !xa_is_value(page)) {
lock_page(page);
/* Has the page been truncated? */
-   if (unlikely(page_mapping(page) != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   VM_BUG_ON_PAGE(!thp_contains(page, index), page);
}
return page;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..58bc9e326d0d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1822,6 +1822,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
return error;
}
 
+   if (page)
+   hindex = page->index;
if (page && sgp == SGP_WRITE)
mark_page_accessed(page);
 
@@ -1832,11 +1834,10 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
unlock_page(page);
put_page(page);
page = NULL;
+   hindex = index;
}
-   if (page || sgp == SGP_READ) {
-   *pagep = page;
-   return 0;
-   }
+   if (page || sgp == SGP_READ)
+   goto out;
 
/*
 * Fast cache lookup did not find it:
@@ -1961,14 +1962,13 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
 * it now, lest undo on failure cancel our earlier guarantee.
 */
if (sgp != SGP_WRITE && !PageUptodate(page)) {
-   struct page *head = compound_head(page);
int i;
 
-   for (i = 0; i < compound_nr(head); i++) {
-   clear_highpage(head + i);
-   flush_dcache_page(head + i);

[Intel-gfx] [PATCH v2 2/8] mm: Use find_get_incore_page in memcontrol

2020-09-10 Thread Matthew Wilcox (Oracle)
The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/memcontrol.c | 24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b807952b4d43..2f02eaee7115 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5539,35 +5539,15 @@ static struct page *mc_handle_swap_pte(struct 
vm_area_struct *vma,
 static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
-   struct page *page = NULL;
-   struct address_space *mapping;
-   pgoff_t pgoff;
-
if (!vma->vm_file) /* anonymous vma */
return NULL;
if (!(mc.flags & MOVE_FILE))
return NULL;
 
-   mapping = vma->vm_file->f_mapping;
-   pgoff = linear_page_index(vma, addr);
-
/* page is moved even if it's not RSS of this task(page-faulted). */
-#ifdef CONFIG_SWAP
/* shmem/tmpfs may report page out on swap: account for that too. */
-   if (shmem_mapping(mapping)) {
-   page = find_get_entry(mapping, pgoff);
-   if (xa_is_value(page)) {
-   swp_entry_t swp = radix_to_swp_entry(page);
-   *entry = swp;
-   page = find_get_page(swap_address_space(swp),
-swp_offset(swp));
-   }
-   } else
-   page = find_get_page(mapping, pgoff);
-#else
-   page = find_get_page(mapping, pgoff);
-#endif
-   return page;
+   return find_get_incore_page(vma->vm_file->f_mapping,
+   linear_page_index(vma, addr));
 }
 
 /**
-- 
2.28.0



[Intel-gfx] [PATCH v2 3/8] mm: Optimise madvise WILLNEED

2020-09-10 Thread Matthew Wilcox (Oracle)
Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

Signed-off-by: Matthew Wilcox (Oracle) 
Acked-by: Johannes Weiner 
---
 mm/madvise.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..96189acd6969 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -224,25 +224,28 @@ static void force_shm_swapin_readahead(struct 
vm_area_struct *vma,
unsigned long start, unsigned long end,
struct address_space *mapping)
 {
-   pgoff_t index;
+   XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
+   pgoff_t end_index = end / PAGE_SIZE;
struct page *page;
-   swp_entry_t swap;
 
-   for (; start < end; start += PAGE_SIZE) {
-   index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   rcu_read_lock();
+   xas_for_each(&xas, page, end_index) {
+   swp_entry_t swap;
 
-   page = find_get_entry(mapping, index);
-   if (!xa_is_value(page)) {
-   if (page)
-   put_page(page);
+   if (!xa_is_value(page))
continue;
-   }
+   rcu_read_unlock();
+
swap = radix_to_swp_entry(page);
page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
NULL, 0, false);
if (page)
put_page(page);
+
+   rcu_read_lock();
+   xas_reset(&xas);
}
+   rcu_read_unlock();
 
lru_add_drain();/* Push any new pages onto the LRU now */
 }
-- 
2.28.0



[Intel-gfx] [PATCH v2 6/8] mm: Convert find_get_entry to return the head page

2020-09-10 Thread Matthew Wilcox (Oracle)
There are only four callers remaining of find_get_entry().
get_shadow_from_swap_cache() only wants to see shadow entries and doesn't
care about which page is returned.  Push the find_subpage() call into
find_lock_entry(), find_get_incore_page() and pagecache_get_page().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/filemap.c    | 13 +++++++------
 mm/swap_state.c |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index d64f6f76bc0b..2f134383b0ae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1567,19 +1567,19 @@ EXPORT_SYMBOL(page_cache_prev_miss);
 /**
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
- * @offset: the page cache index
+ * @index: The page cache index.
  *
  * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned with an increased refcount.
+ * page cache page, the head page is returned with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 {
-   XA_STATE(xas, &mapping->i_pages, offset);
+   XA_STATE(xas, &mapping->i_pages, index);
struct page *page;
 
rcu_read_lock();
@@ -1607,7 +1607,6 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t offset)
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
 out:
rcu_read_unlock();
 
@@ -1644,6 +1643,7 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t offset)
put_page(page);
goto repeat;
}
+   page = find_subpage(page, offset);
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
}
return page;
@@ -1690,6 +1690,7 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
page = NULL;
if (!page)
goto no_page;
+   page = find_subpage(page, index);
 
if (fgp_flags & FGP_LOCK) {
if (fgp_flags & FGP_NOWAIT) {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c79e2242dd04..c8cf1757ca06 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -432,7 +432,7 @@ struct page *find_get_incore_page(struct address_space 
*mapping, pgoff_t index)
struct page *page = find_get_entry(mapping, index);
 
if (!xa_is_value(page))
-   return page;
+   return find_subpage(page, index);
if (!shmem_mapping(mapping))
return NULL;
 
-- 
2.28.0



[Intel-gfx] [PATCH v2 4/8] proc: Optimise smaps for shmem entries

2020-09-10 Thread Matthew Wilcox (Oracle)
Avoid bumping the refcount on pages when we're only interested in the
swap entries.

Signed-off-by: Matthew Wilcox (Oracle) 
Acked-by: Johannes Weiner 
---
 fs/proc/task_mmu.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5066b0251ed8..e42d9e5e9a3c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -520,16 +520,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long 
addr,
page = device_private_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
-   page = find_get_entry(vma->vm_file->f_mapping,
+   page = xa_load(&vma->vm_file->f_mapping->i_pages,
linear_page_index(vma, addr));
-   if (!page)
-   return;
-
if (xa_is_value(page))
mss->swap += PAGE_SIZE;
-   else
-   put_page(page);
-
return;
}
 
-- 
2.28.0



[Intel-gfx] [PATCH v2 7/8] mm/shmem: Return head page from find_lock_entry

2020-09-10 Thread Matthew Wilcox (Oracle)
Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h |  9 +++++++++
 mm/filemap.c            | 25 +++++++++++--------------
 mm/shmem.c              | 20 +++++++++-----------
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 905a64030647..f374618b2c93 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -371,6 +371,15 @@ static inline struct page *grab_cache_page_nowait(struct 
address_space *mapping,
mapping_gfp_mask(mapping));
 }
 
+/* Does this page contain this index? */
+static inline bool thp_contains(struct page *head, pgoff_t index)
+{
+   /* HugeTLBfs indexes the page cache in units of hpage_size */
+   if (PageHuge(head))
+   return head->index == index;
+   return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index 2f134383b0ae..453535170b8d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1614,37 +1614,34 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t index)
 }
 
 /**
- * find_lock_entry - locate, pin and lock a page cache entry
- * @mapping: the address_space to search
- * @offset: the page cache index
+ * find_lock_entry - Locate and lock a page cache entry.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned locked and with an increased
- * refcount.
+ * Looks up the page at @mapping & @index.  If there is a page in the
+ * cache, the head page is returned locked and with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * find_lock_entry() may sleep.
- *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Context: May sleep.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
 {
struct page *page;
 
 repeat:
-   page = find_get_entry(mapping, offset);
+   page = find_get_entry(mapping, index);
if (page && !xa_is_value(page)) {
lock_page(page);
/* Has the page been truncated? */
-   if (unlikely(page_mapping(page) != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   VM_BUG_ON_PAGE(!thp_contains(page, index), page);
}
return page;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..d2a46ef7df43 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1793,7 +1793,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
struct mm_struct *charge_mm;
struct page *page;
enum sgp_type sgp_huge = sgp;
-   pgoff_t hindex = index;
+   pgoff_t hindex;
int error;
int once = 0;
int alloced = 0;
@@ -1833,10 +1833,8 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
put_page(page);
page = NULL;
}
-   if (page || sgp == SGP_READ) {
-   *pagep = page;
-   return 0;
-   }
+   if (page || sgp == SGP_READ)
+   goto out;
 
/*
 * Fast cache lookup did not find it:
@@ -1961,14 +1959,13 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
 * it now, lest undo on failure cancel our earlier guarantee.
 */
if (sgp != SGP_WRITE && !PageUptodate(page)) {
-   struct page *head = compound_head(page);
int i;
 
-   for (i = 0; i < compound_nr(head); i++) {
-   clear_highpage(head + i);
-   flush_dcache_page(head + i);
+   for (i = 0; i < compound_nr(page); i++) {
+   clear_highpage(page + i);
+   flush_dcache_page(page + i);
}
-   SetPageUptodate(head);
+   SetPageUptodate(page);
}
 
/* Perhaps the file has been truncated since we checked */
@@ -1984,7 +1981,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
error = -EINVAL;
goto unlock;

[Intel-gfx] [PATCH v2 5/8] i915: Use find_lock_page instead of find_lock_entry

2020-09-10 Thread Matthew Wilcox (Oracle)
i915 does not want to see value entries.  Switch it to use
find_lock_page() instead, and remove the export of find_lock_entry().
Move find_lock_entry() and find_get_entry() to mm/internal.h to discourage
any future use.

Signed-off-by: Matthew Wilcox (Oracle) 
Acked-by: Johannes Weiner 
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++--
 include/linux/pagemap.h   | 2 --
 mm/filemap.c  | 1 -
 mm/internal.h | 3 +++
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 38113d3c0138..75e8b71c18b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -258,8 +258,8 @@ shmem_writeback(struct drm_i915_gem_object *obj)
for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
struct page *page;
 
-   page = find_lock_entry(mapping, i);
-   if (!page || xa_is_value(page))
+   page = find_lock_page(mapping, i);
+   if (!page)
continue;
 
if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 12ab56c3a86f..905a64030647 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -384,8 +384,6 @@ static inline struct page *find_subpage(struct page *head, 
pgoff_t index)
return head + (index & (thp_nr_pages(head) - 1));
 }
 
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset);
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
  unsigned int nr_entries, struct page **entries,
  pgoff_t *indices);
diff --git a/mm/filemap.c b/mm/filemap.c
index 78d07a712112..d64f6f76bc0b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1648,7 +1648,6 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t offset)
}
return page;
 }
-EXPORT_SYMBOL(find_lock_entry);
 
 /**
  * pagecache_get_page - Find and get a reference to a page.
diff --git a/mm/internal.h b/mm/internal.h
index ab4beb7c5cd2..6345b08ce86c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -60,6 +60,9 @@ static inline void force_page_cache_readahead(struct 
address_space *mapping,
force_page_cache_ra(&ractl, &file->f_ra, nr_to_read);
 }
 
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
+
 /**
  * page_evictable - test whether a page is evictable
  * @page: the page to test
-- 
2.28.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH v2 1/8] mm: Factor find_get_incore_page out of mincore_page

2020-09-10 Thread Matthew Wilcox (Oracle)
Provide this functionality from the swap cache.  It's useful for
more than just mincore().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/swap.h |  7 +++++++
 mm/mincore.c         | 28 ++--------------------------
 mm/swap_state.c      | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 661046994db4..df87de38dca5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -427,6 +427,7 @@ extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
  struct vm_area_struct *vma,
  unsigned long addr);
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t 
index);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr,
bool do_poll);
@@ -569,6 +570,12 @@ static inline struct page *lookup_swap_cache(swp_entry_t 
swp,
return NULL;
 }
 
+static inline
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+   return find_get_page(mapping, index);
+}
+
 static inline int add_to_swap(struct page *page)
 {
return 0;
diff --git a/mm/mincore.c b/mm/mincore.c
index 453ff112470f..02db1a834021 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, 
unsigned long addr,
  * and is up to date; i.e. that no page-in operation would be required
  * at this time if an application were to map and access this page.
  */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
 {
unsigned char present = 0;
struct page *page;
@@ -59,31 +59,7 @@ static unsigned char mincore_page(struct address_space 
*mapping, pgoff_t pgoff)
 * any other file mapping (ie. marked !present and faulted in with
 * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
 */
-#ifdef CONFIG_SWAP
-   if (shmem_mapping(mapping)) {
-   page = find_get_entry(mapping, pgoff);
-   /*
-* shmem/tmpfs may return swap: account for swapcache
-* page too.
-*/
-   if (xa_is_value(page)) {
-   swp_entry_t swp = radix_to_swp_entry(page);
-   struct swap_info_struct *si;
-
-   /* Prevent swap device to being swapoff under us */
-   si = get_swap_device(swp);
-   if (si) {
-   page = find_get_page(swap_address_space(swp),
-swp_offset(swp));
-   put_swap_device(si);
-   } else
-   page = NULL;
-   }
-   } else
-   page = find_get_page(mapping, pgoff);
-#else
-   page = find_get_page(mapping, pgoff);
-#endif
+   page = find_get_incore_page(mapping, index);
if (page) {
present = PageUptodate(page);
put_page(page);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c16eebb81d8b..c79e2242dd04 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 /*
@@ -414,6 +415,37 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct 
vm_area_struct *vma,
return page;
 }
 
+/**
+ * find_get_incore_page - Find and get a page from the page or swap caches.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
+ *
+ * This differs from find_get_page() in that it will also look for the
+ * page in the swap cache.
+ *
+ * Return: The found page or %NULL.
+ */
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+   swp_entry_t swp;
+   struct swap_info_struct *si;
+   struct page *page = find_get_entry(mapping, index);
+
+   if (!xa_is_value(page))
+   return page;
+   if (!shmem_mapping(mapping))
+   return NULL;
+
+   swp = radix_to_swp_entry(page);
+   /* Prevent swapoff from happening to us */
+   si = get_swap_device(swp);
+   if (!si)
+   return NULL;
+   page = find_get_page(swap_address_space(swp), swp_offset(swp));
+   put_swap_device(si);
+   return page;
+}
+
 struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
bool *new_page_allocated)
-- 
2.28.0


[Intel-gfx] [PATCH v2 0/8] Return head pages from find_*_entry

2020-09-10 Thread Matthew Wilcox (Oracle)
This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing
all the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

v2:
 - Rework how shmem_getpage_gfp() handles getting a head page back from
   find_lock_entry()
 - Renamed find_get_swap_page() to find_get_incore_page()
 - Make sure find_get_incore_page() doesn't return a head page
 - Fix the missing include of linux/shmem_fs.h
 - Move find_get_entry and find_lock_entry prototypes to mm/internal.h
 - Rename thp_valid_index() to thp_contains()
 - Fix thp_contains() for hugetlbfs and swapcache
 - Add find_lock_head() wrapper around pagecache_get_page()
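
As a concrete illustration of the convention the series ends up with
(a hypothetical caller, not a hunk from the series; find_lock_head() and
FGP_HEAD come from patch 8/8, find_subpage() already exists):

#include <linux/pagemap.h>
#include <linux/highmem.h>

static int touch_index(struct address_space *mapping, pgoff_t index)
{
	struct page *head = find_lock_head(mapping, index);
	struct page *page;

	if (!head)
		return -ENOENT;
	/* head is never PageTail; derive the exact base page from it */
	page = find_subpage(head, index);
	flush_dcache_page(page);	/* stand-in for the caller's real work */
	unlock_page(head);
	put_page(head);
	return 0;
}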

Matthew Wilcox (Oracle) (8):
  mm: Factor find_get_incore_page out of mincore_page
  mm: Use find_get_incore_page in memcontrol
  mm: Optimise madvise WILLNEED
  proc: Optimise smaps for shmem entries
  i915: Use find_lock_page instead of find_lock_entry
  mm: Convert find_get_entry to return the head page
  mm/shmem: Return head page from find_lock_entry
  mm: Add find_lock_head

 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  4 +--
 fs/proc/task_mmu.c|  8 +
 include/linux/pagemap.h   | 43 +-
 include/linux/swap.h  |  7 
 mm/filemap.c  | 44 +++
 mm/internal.h |  3 ++
 mm/madvise.c  | 21 ++-
 mm/memcontrol.c   | 24 ++---
 mm/mincore.c  | 28 ++-
 mm/shmem.c| 20 +--
 mm/swap_state.c   | 32 +
 11 files changed, 127 insertions(+), 107 deletions(-)

-- 
2.28.0



[Intel-gfx] [PATCH v2 8/8] mm: Add find_lock_head

2020-09-10 Thread Matthew Wilcox (Oracle)
Add a new FGP_HEAD flag which avoids calling find_subpage() and add a
convenience wrapper for it.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h | 32 ++++++++++++++++++++++++++------
 mm/filemap.c            |  9 ++++++---
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f374618b2c93..4e52a3ff92fb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -278,6 +278,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_NOFS   0x0010
 #define FGP_NOWAIT 0x0020
 #define FGP_FOR_MMAP   0x0040
+#define FGP_HEAD   0x0080
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
int fgp_flags, gfp_t cache_gfp_mask);
@@ -309,18 +310,37 @@ static inline struct page *find_get_page_flags(struct 
address_space *mapping,
  * @mapping: the address_space to search
  * @offset: the page index
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * Looks up the page cache entry at @mapping & @offset.  If there is a
  * page cache page, it is returned locked and with an increased
  * refcount.
  *
- * Otherwise, %NULL is returned.
- *
- * find_lock_page() may sleep.
+ * Context: May sleep.
+ * Return: A struct page or %NULL if there is no page in the cache for this
+ * index.
  */
 static inline struct page *find_lock_page(struct address_space *mapping,
-   pgoff_t offset)
+   pgoff_t index)
+{
+   return pagecache_get_page(mapping, index, FGP_LOCK, 0);
+}
+
+/**
+ * find_lock_head - Locate, pin and lock a pagecache page.
+ * @mapping: The address_space to search.
+ * @offset: The page index.
+ *
+ * Looks up the page cache entry at @mapping & @offset.  If there is a
+ * page cache page, its head page is returned locked and with an increased
+ * refcount.
+ *
+ * Context: May sleep.
+ * Return: A struct page which is !PageTail, or %NULL if there is no page
+ * in the cache for this index.
+ */
+static inline struct page *find_lock_head(struct address_space *mapping,
+   pgoff_t index)
 {
-   return pagecache_get_page(mapping, offset, FGP_LOCK, 0);
+   return pagecache_get_page(mapping, index, FGP_LOCK | FGP_HEAD, 0);
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 453535170b8d..e429e02317ef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1659,6 +1659,8 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t index)
  *
  * * %FGP_ACCESSED - The page will be marked accessed.
  * * %FGP_LOCK - The page is returned locked.
+ * * %FGP_HEAD - If the page is present and a THP, return the head page
+ *   rather than the exact page specified by the index.
  * * %FGP_CREAT - If no page is present then a new page is allocated using
  *   @gfp_mask and added to the page cache and the VM's LRU list.
  *   The page is returned locked and with an increased refcount.
@@ -1687,7 +1689,6 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
page = NULL;
if (!page)
goto no_page;
-   page = find_subpage(page, index);
 
if (fgp_flags & FGP_LOCK) {
if (fgp_flags & FGP_NOWAIT) {
@@ -1700,12 +1701,12 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
}
 
/* Has the page been truncated? */
-   if (unlikely(compound_head(page)->mapping != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   VM_BUG_ON_PAGE(page->index != index, page);
+   VM_BUG_ON_PAGE(!thp_contains(page, index), page);
}
 
if (fgp_flags & FGP_ACCESSED)
@@ -1715,6 +1716,8 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
if (page_is_idle(page))
clear_page_idle(page);
}
+   if (!(fgp_flags & FGP_HEAD))
+   page = find_subpage(page, index);
 
 no_page:
if (!page && (fgp_flags & FGP_CREAT)) {
-- 
2.28.0



Re: [Intel-gfx] [PATCH 2/8] mm: Use find_get_swap_page in memcontrol

2020-08-27 Thread Matthew Wilcox
On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> The refactor makes sense to me, but the name is confusing. We're not
> looking for a swap page, we're primarily looking for a file page in
> the page cache mapping that's handed in. Only in the special case
> where it's a shmem mapping and there is a swap entry do we consult the
> auxiliary swap cache.
> 
> How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> Maybe you have a better idea. It's a fairly specialized operation that
> isn't widely used, so a longer name isn't a bad thing IMO.

Got it.  find_get_incore_page().  I was going to go with inmem, but the
fact that it matches mincore sold me on it.

/**
 * find_get_incore_page - Find and get a page from the page or swap caches.
 * @mapping: The address_space to search.
 * @index: The page cache index.
 *
 * This differs from find_get_page() in that it will also look for the
 * page in the swap cache.
 *
 * Return: The found page or %NULL.
 */

I was focusing too much on what the function did, not why it was doing it.
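
A minimal caller sketch in the mincore style (illustrative only;
page_incore() is a made-up name, the body mirrors mincore_page() after
this series):

#include <linux/pagemap.h>
#include <linux/swap.h>

static unsigned char page_incore(struct address_space *mapping, pgoff_t index)
{
	struct page *page = find_get_incore_page(mapping, index);
	unsigned char present = 0;

	if (page) {
		present = PageUptodate(page);
		put_page(page);	/* drop the reference the lookup took */
	}
	return present;
}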


Re: [Intel-gfx] [PATCH 6/8] mm: Convert find_get_entry to return the head page

2020-08-26 Thread Matthew Wilcox
On Wed, Aug 26, 2020 at 11:09:25AM -0400, Johannes Weiner wrote:
> On Wed, Aug 19, 2020 at 07:48:48PM +0100, Matthew Wilcox (Oracle) wrote:
> > There are only three callers remaining of find_get_entry().
> > find_get_swap_page() is happy to get the head page instead of the subpage.
> > Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
> > to avoid auditing all their callers.
> 
> I believe this would cause a subtle bug in memcg charge moving for pte
> mapped huge pages. We currently skip over tail pages in the range
> (they don't have page->mem_cgroup set) and account for the huge page
> once from the headpage. After this change, we would see the headpage
> and account for it 512 times (or whatever the number is on non-x86).

Hmm ... so if you have the last 511 pages of a huge page mapped, you
actually don't charge for it at all today?

I think you're right that I'd introduce this bug, and so that needs to
be fixed.

> But that aside, I don't quite understand the intent.
> 
> Before, all these functions simply return the base page at @index,
> whether it's a regular page or a tail page.
> 
> Afterwards, find_lock_entry(), find_get_page() et al still do, but
> find_get_entry() returns headpage at @index & HPAGE_CACHE_INDEX_MASK.
> 
> Shouldn't we be consistent about how we handle huge pages when
> somebody queries the tree for a given base page index?
> 
> [ Wouldn't that mean that e.g. find_get_swap_page() would return tail
>   pages for regular files and head pages for shmem files? ]

What I'd _like_ to do is convert all the callers to cope with tail
pages never being returned from all the find_* functions.  That seems
like a lot of disruption.

My intent in this series is to get all the find_*_entr{y,ies}
functions to the point where they don't return tail pages.
Also find_get_pages_tag() because tags are only set on head pages.

This is generally what the callers want anyway.  There's even a hack
in find_get_entries() in current to terminate early on finding a THP
(see commit 71725ed10c40696dc6bdccf8e225815dcef24dba).  If I want
to remove that, I need to do _something_ to not put all the subpages
of a THP into the pagevec.

So the new rule will be that find_*_entry() don't return tail pages but
find_*_page() do.  With the full THP patchset in place, THPs become quite
common, so bugs in this area will surface quickly instead of lingering
for years and only popping out in rare circumstances.
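
In caller terms, the rule reads like this (a sketch only; exact_page() is
a made-up helper, and after patch 5/8 find_get_entry() is declared in
mm/internal.h):

#include <linux/pagemap.h>
#include <linux/xarray.h>
#include "internal.h"	/* find_get_entry() after patch 5/8 */

static struct page *exact_page(struct address_space *mapping, pgoff_t index)
{
	/* find_*_entry() now yields the head page (or a value entry) */
	struct page *head = find_get_entry(mapping, index);

	if (!head || xa_is_value(head))
		return NULL;
	/* derive the precise base page, as find_*_page() would have */
	return find_subpage(head, index);
}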


Re: [Intel-gfx] [PATCH 2/8] mm: Use find_get_swap_page in memcontrol

2020-08-26 Thread Matthew Wilcox
On Wed, Aug 26, 2020 at 10:20:02AM -0400, Johannes Weiner wrote:
> On Wed, Aug 19, 2020 at 07:48:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > +   return find_get_swap_page(vma->vm_file->f_mapping,
> > +   linear_page_index(vma, addr));
> 
> The refactor makes sense to me, but the name is confusing. We're not
> looking for a swap page, we're primarily looking for a file page in
> the page cache mapping that's handed in. Only in the special case
> where it's a shmem mapping and there is a swap entry do we consult the
> auxiliary swap cache.
> 
> How about find_get_page_or_swapcache()? find_get_page_shmemswap()?
> Maybe you have a better idea. It's a fairly specialized operation that
> isn't widely used, so a longer name isn't a bad thing IMO.

Yeah, I had trouble with the naming here too.

get_page_even_from_swap()
find_get_shmem_page()

or maybe refactor the whole thing:

struct page *page = find_get_entry(mapping, index);
page = find_swap_page(mapping, page);

struct page *find_swap_page(struct address_space *mapping, struct page *page)
{
swp_entry_t swp;
struct swap_info_struct *si;

if (!xa_is_value(page))
return page;
if (!shmem_mapping(mapping))
return NULL;

...
}


Re: [Intel-gfx] ✗ Fi.CI.BUILD: failure for Return head pages from find_get_entry and find_lock_entry (rev2)

2020-08-19 Thread Matthew Wilcox
On Wed, Aug 19, 2020 at 07:29:24PM -, Patchwork wrote:
> == Series Details ==
> 
> Series: Return head pages from find_get_entry and find_lock_entry (rev2)
> URL   : https://patchwork.freedesktop.org/series/80818/
> State : failure
> 
> == Summary ==
> 
> CALLscripts/checksyscalls.sh
>   CALLscripts/atomic/check-atomics.sh
>   DESCEND  objtool
>   CHK include/generated/compile.h
>   CC  mm/memcontrol.o
> mm/memcontrol.c: In function ‘mc_handle_file_pte’:
> mm/memcontrol.c:5548:9: error: implicit declaration of function 
> ‘find_get_swap_page’; did you mean ‘get_swap_page’? 
> [-Werror=implicit-function-declaration]
>   return find_get_swap_page(vma->vm_file->f_mapping,
>  ^~
>  get_swap_page
> mm/memcontrol.c:5548:9: warning: return makes pointer from integer without a 
> cast [-Wint-conversion]
>   return find_get_swap_page(vma->vm_file->f_mapping,
>  ^~~
> linear_page_index(vma, addr));
> ~
> cc1: some warnings being treated as errors

This doesn't make sense.  Dave Airlie pointed me at what he believes to
be the config file used [1] and I can't reproduce it.  Is it possible
the build-bot applied only 2/8 and not 1/8?

[1] https://gitlab.freedesktop.org/gfx-ci/i915-infra/-/blob/master/kconfig/debug


Re: [Intel-gfx] [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page

2020-08-19 Thread Matthew Wilcox
On Wed, Aug 19, 2020 at 07:48:43PM +0100, Matthew Wilcox (Oracle) wrote:
> Provide this functionality from the swap cache.  It's useful for
> more than just mincore().

The build bot showed I was missing this for some configs:

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 986b4e3d3bad..92a1f40be589 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 /*

I'll wait for more feedback before reposting the series in ~24 hours.


Re: [Intel-gfx] ✗ Fi.CI.BUILD: failure for Return head pages from find_get_entry and find_lock_entry

2020-08-19 Thread Matthew Wilcox
On Wed, Aug 19, 2020 at 07:06:17PM -, Patchwork wrote:
>   CC  mm/swap_state.o
> mm/swap_state.c: In function ‘find_get_swap_page’:
> mm/swap_state.c:435:7: error: implicit declaration of function 
> ‘shmem_mapping’; did you mean ‘page_mapping’? 
> [-Werror=implicit-function-declaration]
>   if (!shmem_mapping(mapping))
>^
>page_mapping
> cc1: some warnings being treated as errors

Never mind the .config; I decided on a way to fix it without seeing that.
It would be useful for future warnings though.


Re: [Intel-gfx] ✗ Fi.CI.BUILD: failure for Return head pages from find_get_entry and find_lock_entry

2020-08-19 Thread Matthew Wilcox
On Wed, Aug 19, 2020 at 07:06:17PM -, Patchwork wrote:
> == Series Details ==
> 
> Series: Return head pages from find_get_entry and find_lock_entry
> URL   : https://patchwork.freedesktop.org/series/80818/
> State : failure
> 
> == Summary ==
> 
> CALLscripts/checksyscalls.sh
>   CALLscripts/atomic/check-atomics.sh
>   DESCEND  objtool
>   CHK include/generated/compile.h
>   CC  mm/swap_state.o
> mm/swap_state.c: In function ‘find_get_swap_page’:
> mm/swap_state.c:435:7: error: implicit declaration of function 
> ‘shmem_mapping’; did you mean ‘page_mapping’? 
> [-Werror=implicit-function-declaration]
>   if (!shmem_mapping(mapping))
>^
>page_mapping
> cc1: some warnings being treated as errors
> scripts/Makefile.build:283: recipe for target 'mm/swap_state.o' failed
> make[1]: *** [mm/swap_state.o] Error 1
> Makefile:1789: recipe for target 'mm' failed
> make: *** [mm] Error 2

Thanks!  Do you have the .config for this build?


[Intel-gfx] [PATCH 8/8] mm: Hoist find_subpage call further up in pagecache_get_page

2020-08-19 Thread Matthew Wilcox (Oracle)
This avoids a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/filemap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6594baae7cd2..8c354277108d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1667,7 +1667,6 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
page = NULL;
if (!page)
goto no_page;
-   page = find_subpage(page, index);
 
if (fgp_flags & FGP_LOCK) {
if (fgp_flags & FGP_NOWAIT) {
@@ -1680,12 +1679,12 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
}
 
/* Has the page been truncated? */
-   if (unlikely(compound_head(page)->mapping != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   VM_BUG_ON_PAGE(page->index != index, page);
+   VM_BUG_ON_PAGE(!thp_valid_index(page, index), page);
}
 
if (fgp_flags & FGP_ACCESSED)
@@ -1695,6 +1694,7 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
if (page_is_idle(page))
clear_page_idle(page);
}
+   page = find_subpage(page, index);
 
 no_page:
if (!page && (fgp_flags & FGP_CREAT)) {
-- 
2.28.0



[Intel-gfx] [PATCH 2/8] mm: Use find_get_swap_page in memcontrol

2020-08-19 Thread Matthew Wilcox (Oracle)
The current code does not protect against swapoff of the underlying
swap device, so this is a bug fix as well as a worthwhile reduction in
code complexity.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/memcontrol.c | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b807952b4d43..4075f214a841 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5539,35 +5539,14 @@ static struct page *mc_handle_swap_pte(struct 
vm_area_struct *vma,
 static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
-   struct page *page = NULL;
-   struct address_space *mapping;
-   pgoff_t pgoff;
-
if (!vma->vm_file) /* anonymous vma */
return NULL;
if (!(mc.flags & MOVE_FILE))
return NULL;
 
-   mapping = vma->vm_file->f_mapping;
-   pgoff = linear_page_index(vma, addr);
-
/* page is moved even if it's not RSS of this task(page-faulted). */
-#ifdef CONFIG_SWAP
-   /* shmem/tmpfs may report page out on swap: account for that too. */
-   if (shmem_mapping(mapping)) {
-   page = find_get_entry(mapping, pgoff);
-   if (xa_is_value(page)) {
-   swp_entry_t swp = radix_to_swp_entry(page);
-   *entry = swp;
-   page = find_get_page(swap_address_space(swp),
-swp_offset(swp));
-   }
-   } else
-   page = find_get_page(mapping, pgoff);
-#else
-   page = find_get_page(mapping, pgoff);
-#endif
-   return page;
+   return find_get_swap_page(vma->vm_file->f_mapping,
+   linear_page_index(vma, addr));
 }
 
 /**
-- 
2.28.0



[Intel-gfx] [PATCH 4/8] proc: Optimise smaps for shmem entries

2020-08-19 Thread Matthew Wilcox (Oracle)
Avoid bumping the refcount on pages when we're only interested in the
swap entries.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/proc/task_mmu.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5066b0251ed8..e42d9e5e9a3c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -520,16 +520,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long 
addr,
page = device_private_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
-   page = find_get_entry(vma->vm_file->f_mapping,
+   page = xa_load(&vma->vm_file->f_mapping->i_pages,
linear_page_index(vma, addr));
-   if (!page)
-   return;
-
if (xa_is_value(page))
mss->swap += PAGE_SIZE;
-   else
-   put_page(page);
-
return;
}
 
-- 
2.28.0



[Intel-gfx] [PATCH 5/8] i915: Use find_lock_page instead of find_lock_entry

2020-08-19 Thread Matthew Wilcox (Oracle)
i915 does not want to see value entries.  Switch it to use
find_lock_page() instead, and remove the export of find_lock_entry().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++--
 mm/filemap.c  | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 38113d3c0138..75e8b71c18b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -258,8 +258,8 @@ shmem_writeback(struct drm_i915_gem_object *obj)
for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
struct page *page;
 
-   page = find_lock_entry(mapping, i);
-   if (!page || xa_is_value(page))
+   page = find_lock_page(mapping, i);
+   if (!page)
continue;
 
if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..36067cf7f8c5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1628,7 +1628,6 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t offset)
}
return page;
 }
-EXPORT_SYMBOL(find_lock_entry);
 
 /**
  * pagecache_get_page - Find and get a reference to a page.
-- 
2.28.0



[Intel-gfx] [PATCH 0/8] Return head pages from find_get_entry and find_lock_entry

2020-08-19 Thread Matthew Wilcox (Oracle)
This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing
all the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc.

Matthew Wilcox (Oracle) (8):
  mm: Factor find_get_swap_page out of mincore_page
  mm: Use find_get_swap_page in memcontrol
  mm: Optimise madvise WILLNEED
  proc: Optimise smaps for shmem entries
  i915: Use find_lock_page instead of find_lock_entry
  mm: Convert find_get_entry to return the head page
  mm: Return head page from find_lock_entry
  mm: Hoist find_subpage call further up in pagecache_get_page

 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  4 +--
 fs/proc/task_mmu.c|  8 +
 include/linux/pagemap.h   | 16 +++--
 include/linux/swap.h  |  7 
 mm/filemap.c  | 41 +++
 mm/madvise.c  | 21 +++-
 mm/memcontrol.c   | 25 ++
 mm/mincore.c  | 28 ++--
 mm/shmem.c| 15 +
 mm/swap_state.c   | 31 +
 10 files changed, 98 insertions(+), 98 deletions(-)

-- 
2.28.0



[Intel-gfx] [PATCH 6/8] mm: Convert find_get_entry to return the head page

2020-08-19 Thread Matthew Wilcox (Oracle)
There are only three callers remaining of find_get_entry().
find_get_swap_page() is happy to get the head page instead of the subpage.
Add find_subpage() calls to find_lock_entry() and pagecache_get_page()
to avoid auditing all their callers.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h |  4 ++--
 mm/filemap.c            | 13 +++++++------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7de11dcd534d..8261c64f592d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -384,8 +384,8 @@ static inline struct page *find_subpage(struct page *head, 
pgoff_t index)
return head + (index & (thp_nr_pages(head) - 1));
 }
 
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset);
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
  unsigned int nr_entries, struct page **entries,
  pgoff_t *indices);
diff --git a/mm/filemap.c b/mm/filemap.c
index 36067cf7f8c5..5a87e1b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1547,19 +1547,19 @@ EXPORT_SYMBOL(page_cache_prev_miss);
 /**
  * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
- * @offset: the page cache index
+ * @index: The page cache index.
  *
  * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned with an increased refcount.
+ * page cache page, the head page is returned with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 {
-   XA_STATE(xas, &mapping->i_pages, offset);
+   XA_STATE(xas, &mapping->i_pages, index);
struct page *page;
 
rcu_read_lock();
@@ -1587,7 +1587,6 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t offset)
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
 out:
rcu_read_unlock();
 
@@ -1624,6 +1623,7 @@ struct page *find_lock_entry(struct address_space 
*mapping, pgoff_t offset)
put_page(page);
goto repeat;
}
+   page = find_subpage(page, offset);
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
}
return page;
@@ -1670,6 +1670,7 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t index,
page = NULL;
if (!page)
goto no_page;
+   page = find_subpage(page, index);
 
if (fgp_flags & FGP_LOCK) {
if (fgp_flags & FGP_NOWAIT) {
-- 
2.28.0



[Intel-gfx] [PATCH 7/8] mm: Return head page from find_lock_entry

2020-08-19 Thread Matthew Wilcox (Oracle)
Convert the one caller of find_lock_entry() to cope with a head page
being returned instead of the subpage for the index.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h | 12 ++++++++++++
 mm/filemap.c            | 25 +++++++++++--------------
 mm/shmem.c              | 15 ++++++++-------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8261c64f592d..de8f3758ceaa 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -371,6 +371,18 @@ static inline struct page *grab_cache_page_nowait(struct 
address_space *mapping,
mapping_gfp_mask(mapping));
 }
 
+/*
+ * Is this index for one of the subpages of this page?
+ * This should always be called for a locked, non-hugetlbfs page.
+ */
+static inline bool thp_valid_index(struct page *head, pgoff_t index)
+{
+   VM_BUG_ON_PAGE(PageHuge(head), head);
+   VM_BUG_ON_PAGE(!PageLocked(head), head);
+
+   return head->index == (index & ~(thp_nr_pages(head) - 1));
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index 5a87e1b6..6594baae7cd2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1594,37 +1594,34 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t index)
 }
 
 /**
- * find_lock_entry - locate, pin and lock a page cache entry
- * @mapping: the address_space to search
- * @offset: the page cache index
+ * find_lock_entry - Locate and lock a page cache entry.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned locked and with an increased
- * refcount.
+ * Looks up the page at @mapping & @index.  If there is a page in the
+ * cache, the head page is returned locked and with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * find_lock_entry() may sleep.
- *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Context: May sleep.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
 {
struct page *page;
 
 repeat:
-   page = find_get_entry(mapping, offset);
+   page = find_get_entry(mapping, index);
if (page && !xa_is_value(page)) {
lock_page(page);
/* Has the page been truncated? */
-   if (unlikely(page_mapping(page) != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   VM_BUG_ON_PAGE(!thp_valid_index(page, index), page);
}
return page;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..bbb43e9d35df 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1834,8 +1834,9 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
page = NULL;
}
if (page || sgp == SGP_READ) {
-   *pagep = page;
-   return 0;
+   if (page && PageTransHuge(page))
+   hindex = round_down(index, HPAGE_PMD_NR);
+   goto out;
}
 
/*
@@ -1961,14 +1962,13 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
 * it now, lest undo on failure cancel our earlier guarantee.
 */
if (sgp != SGP_WRITE && !PageUptodate(page)) {
-   struct page *head = compound_head(page);
int i;
 
-   for (i = 0; i < compound_nr(head); i++) {
-   clear_highpage(head + i);
-   flush_dcache_page(head + i);
+   for (i = 0; i < compound_nr(page); i++) {
+   clear_highpage(page + i);
+   flush_dcache_page(page + i);
}
-   SetPageUptodate(head);
+   SetPageUptodate(page);
}
 
/* Perhaps the file has been truncated since we checked */
@@ -1984,6 +1984,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t 
index,
error = -EINVAL;
goto unlock;
}
+out:
*pagep = page + index - hindex;
return 0;
 
-- 
2.28.0



[Intel-gfx] [PATCH 3/8] mm: Optimise madvise WILLNEED

2020-08-19 Thread Matthew Wilcox (Oracle)
Instead of calling find_get_entry() for every page index, use an XArray
iterator to skip over NULL entries, and avoid calling get_page(),
because we only want the swap entries.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/madvise.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index dd1d43cf026d..96189acd6969 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -224,25 +224,28 @@ static void force_shm_swapin_readahead(struct 
vm_area_struct *vma,
unsigned long start, unsigned long end,
struct address_space *mapping)
 {
-   pgoff_t index;
+   XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
+   pgoff_t end_index = end / PAGE_SIZE;
struct page *page;
-   swp_entry_t swap;
 
-   for (; start < end; start += PAGE_SIZE) {
-   index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   rcu_read_lock();
+   xas_for_each(&xas, page, end_index) {
+   swp_entry_t swap;
 
-   page = find_get_entry(mapping, index);
-   if (!xa_is_value(page)) {
-   if (page)
-   put_page(page);
+   if (!xa_is_value(page))
continue;
-   }
+   rcu_read_unlock();
+
swap = radix_to_swp_entry(page);
page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
NULL, 0, false);
if (page)
put_page(page);
+
+   rcu_read_lock();
+   xas_reset(&xas);
}
+   rcu_read_unlock();
 
lru_add_drain();/* Push any new pages onto the LRU now */
 }
-- 
2.28.0



[Intel-gfx] [PATCH 1/8] mm: Factor find_get_swap_page out of mincore_page

2020-08-19 Thread Matthew Wilcox (Oracle)
Provide this functionality from the swap cache.  It's useful for
more than just mincore().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/swap.h |  7 +++++++
 mm/mincore.c         | 28 ++--------------------------
 mm/swap_state.c      | 31 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 661046994db4..247527ea3e66 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -427,6 +427,7 @@ extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
  struct vm_area_struct *vma,
  unsigned long addr);
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index);
 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr,
bool do_poll);
@@ -569,6 +570,12 @@ static inline struct page *lookup_swap_cache(swp_entry_t 
swp,
return NULL;
 }
 
+static inline
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index)
+{
+   return find_get_page(mapping, index);
+}
+
 static inline int add_to_swap(struct page *page)
 {
return 0;
diff --git a/mm/mincore.c b/mm/mincore.c
index 453ff112470f..2806258f99c6 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -48,7 +48,7 @@ static int mincore_hugetlb(pte_t *pte, unsigned long hmask, 
unsigned long addr,
  * and is up to date; i.e. that no page-in operation would be required
  * at this time if an application were to map and access this page.
  */
-static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
+static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
 {
unsigned char present = 0;
struct page *page;
@@ -59,31 +59,7 @@ static unsigned char mincore_page(struct address_space 
*mapping, pgoff_t pgoff)
 * any other file mapping (ie. marked !present and faulted in with
 * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
 */
-#ifdef CONFIG_SWAP
-   if (shmem_mapping(mapping)) {
-   page = find_get_entry(mapping, pgoff);
-   /*
-* shmem/tmpfs may return swap: account for swapcache
-* page too.
-*/
-   if (xa_is_value(page)) {
-   swp_entry_t swp = radix_to_swp_entry(page);
-   struct swap_info_struct *si;
-
-   /* Prevent swap device to being swapoff under us */
-   si = get_swap_device(swp);
-   if (si) {
-   page = find_get_page(swap_address_space(swp),
-swp_offset(swp));
-   put_swap_device(si);
-   } else
-   page = NULL;
-   }
-   } else
-   page = find_get_page(mapping, pgoff);
-#else
-   page = find_get_page(mapping, pgoff);
-#endif
+   page = find_get_swap_page(mapping, index);
if (page) {
present = PageUptodate(page);
put_page(page);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index c16eebb81d8b..986b4e3d3bad 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -414,6 +414,37 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct 
vm_area_struct *vma,
return page;
 }
 
+/**
+ * find_get_swap_page - Find and get a page from the page or swap caches.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
+ *
+ * This is like find_get_page(), but if we find a swap entry for
+ * shmem/tmpfs, we also look in the swap cache for the page.
+ *
+ * Return: The found page or %NULL.
+ */
+struct page *find_get_swap_page(struct address_space *mapping, pgoff_t index)
+{
+   swp_entry_t swp;
+   struct swap_info_struct *si;
+   struct page *page = find_get_entry(mapping, index);
+
+   if (!xa_is_value(page))
+   return page;
+   if (!shmem_mapping(mapping))
+   return NULL;
+
+   swp = radix_to_swp_entry(page);
+   /* Prevent swapoff from happening to us */
+   si = get_swap_device(swp);
+   if (!si)
+   return NULL;
+   page = find_get_page(swap_address_space(swp), swp_offset(swp));
+   put_swap_device(si);
+   return page;
+}
+
 struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
bool *new_page_allocated)
-- 
2.28.0



Re: [Intel-gfx] [PATCH] mm: Skip opportunistic reclaim for dma pinned pages

2020-06-25 Thread Matthew Wilcox
On Thu, Jun 25, 2020 at 03:40:44PM +0200, Jan Kara wrote:
> On Thu 25-06-20 12:42:09, Matthew Wilcox wrote:
> > Why are DMA pinned pages still on the LRU list at all?  I never got an
> > answer to this that made sense to me.  By definition, a page which is
> > pinned for DMA is being accessed, and needs to at the very least change
> > position on the LRU list, so just take it off the list when DMA-pinned
> > and put it back on the list when DMA-unpinned.
> 
> Well, we do mark_page_accessed() when pinning in GUP. This is not perfect
> but it's as good as it gets with CPU having no control when the page is
> actually accessed. Also taking the page off and then back to LRU list would
> increase the contention on the LRU list locks and generally cost
> performance so for short term pins it is not desirable... Otherwise I agree
> that conceptually it would make some sence although I'm not sure some
> places wouldn't get confused by e.g. page cache pages not being on LRU
> list. 

We could/should do what we do for mlocked pages:
Documentation/vm/unevictable-lru.rst

I think 'case five' is wrong and needs to be removed.  Pinning is
inappropriate for "I'm going to modify the page myself".
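
Roughly this, as a sketch (the pin/unpin entry points are invented here
for illustration; isolate_lru_page()/putback_lru_page() are the existing
primitives, and the refcount handling is simplified):

#include <linux/swap.h>	/* isolate_lru_page(), putback_lru_page() assumed */

static void page_gained_first_pin(struct page *page)
{
	/*
	 * Caller holds a reference.  If the page was on an LRU it is now
	 * isolated, so reclaim cannot see it for the life of the pin.
	 */
	if (isolate_lru_page(page))
		return;	/* not on an LRU; nothing to hide from reclaim */
}

static void page_lost_last_pin(struct page *page)
{
	/* re-add to the appropriate LRU; drops the isolation reference */
	putback_lru_page(page);
}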


Re: [Intel-gfx] [PATCH] mm: Skip opportunistic reclaim for dma pinned pages

2020-06-25 Thread Matthew Wilcox
On Wed, Jun 24, 2020 at 08:14:17PM +0100, Chris Wilson wrote:
> A side effect of the LRU shrinker not being dma aware is that we will
> often attempt to perform direct reclaim on the persistent group of dma
> pages while continuing to use the dma HW (an issue as the HW may already
> be actively waiting for the next user request), and even attempt to
> reclaim a partially allocated dma object in order to satisfy pinning
> the next user page for that object.
> 
> It is to be expected that such pages are made available for reclaim at
> the end of the dma operation [unpin_user_pages()], and for truly
> longterm pins to be proactively recovered via device specific shrinkers
> [i.e. stop the HW, allow the pages to be returned to the system, and
> then compete again for the memory].

Why are DMA pinned pages still on the LRU list at all?  I never got an
answer to this that made sense to me.  By definition, a page which is
pinned for DMA is being accessed, and needs to at the very least change
position on the LRU list, so just take it off the list when DMA-pinned
and put it back on the list when DMA-unpinned.

This overly complex lease stuff must have some reason for existing, but
I still don't get it.


Re: [Intel-gfx] improve use_mm / unuse_mm v2

2020-04-16 Thread Matthew Wilcox
On Thu, Apr 16, 2020 at 07:31:55AM +0200, Christoph Hellwig wrote:
> this series improves the use_mm / unuse_mm interface by better
> documenting the assumptions, and my taking the set_fs manipulations
> spread over the callers into the core API.

I appreciate all the work you're doing here.

Do you have plans to introduce a better-named API than set_fs() / get_fs()?

Also, having set_fs() return the previous value of 'fs' would simplify
a lot of the callers.
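
Concretely, the simplification would look like this (hypothetical "after"
form; mainline set_fs() returns void, so the second helper assumes the
suggested signature):

#include <linux/uaccess.h>

static void kernel_access_today(void)
{
	mm_segment_t old_fs = get_fs();

	set_fs(KERNEL_DS);
	/* ... uaccess with kernel pointers ... */
	set_fs(old_fs);
}

static void kernel_access_proposed(void)
{
	mm_segment_t old_fs = set_fs(KERNEL_DS);	/* returns prior fs */

	/* ... */
	set_fs(old_fs);
}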


Re: [Intel-gfx] [PATCH] kernel-doc: rename the kernel-doc directive 'functions' to 'specific'

2019-10-15 Thread Matthew Wilcox
On Wed, Oct 16, 2019 at 08:03:24AM +0800, Changbin Du wrote:
> On Tue, Oct 15, 2019 at 04:54:39AM -0700, Matthew Wilcox wrote:
> > On Tue, Oct 15, 2019 at 11:25:53AM +0200, Thomas Zimmermann wrote:
> > > > My preference would be to use 'symbols'.  I tried to come up with 
> > > > something
> > > > but 'symbols' is better than anything I came up with.
> > > 
> > > Maybe 'interfaces' or 'artifacts'. The term 'symbols' is just as
> > > imprecise as 'functions'.
> > 
> > I suggested 'identifier' because that's the term used in the C spec (6.2.1):
> > 
> > : An identifier can denote an object; a function; a tag or a member
> > : of a structure, union, or enumeration; a typedef name; a label name;
> > : a macro name; or a macro parameter.
>
> I also prefer this one now. I was looking for something like this. My
> original idea was 'prototype', but that only covers functions.

We could also go with 'declaration' or 'definition'.  But I prefer
'identifier'.

Re: [Intel-gfx] [PATCH] kernel-doc: rename the kernel-doc directive 'functions' to 'specific'

2019-10-15 Thread Matthew Wilcox
On Tue, Oct 15, 2019 at 11:25:53AM +0200, Thomas Zimmermann wrote:
> > My preference would be to use 'symbols'.  I tried to come up with something
> > but 'symbols' is better than anything I came up with.
> 
> Maybe 'interfaces' or 'artifacts'. The term 'symbols' is just as
> imprecise as 'functions'.

I suggested 'identifier' because that's the term used in the C spec (6.2.1):

: An identifier can denote an object; a function; a tag or a member
: of a structure, union, or enumeration; a typedef name; a label name;
: a macro name; or a macro parameter.

We don't allow documenting all those things separately, but it does cover
all the things we do allow to be individually documented.
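
To make that concrete: the same kernel-doc comment syntax already
covers non-function identifiers, e.g. a struct (illustrative example,
not taken from the tree):

/**
 * struct demo_state - example of a non-function "identifier"
 * @x: horizontal position
 * @y: vertical position
 *
 * A struct like this is extracted by the same kernel-doc machinery
 * that extracts functions, which is why 'functions' is a misleading
 * name for the directive option.
 */
struct demo_state {
        int x;
        int y;
};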

Re: [Intel-gfx] [PATCH] kernel-doc: rename the kernel-doc directive 'functions' to 'specific'

2019-10-14 Thread Matthew Wilcox
On Mon, Oct 14, 2019 at 08:48:48PM +, tim.b...@sony.com wrote:
> 
> 
> > -Original Message-
> > From: Jani Nikula on October 13, 2019 11:00 PM
> > On Sun, 13 Oct 2019, Changbin Du  wrote:
> > > The 'functions' directive is not only for functions, but also works for
> > > structs/unions. So the name is misleading. This patch renames it to
> > > 'specific', so now we have export/internal/specific directives to limit
> > > the functions/types to be included in documentation. Meanwhile we
> > improved
> > > the warning message.
> > 
> > Agreed on "functions" being less than perfect. It directly exposes the
> > idiosyncrasies of scripts/kernel-doc. I'm not sure "specific" is any
> > better, though.
> 
> I strongly agree with this.  'specific', IMHO, has no semantic value, and
> I'd rather just keep the only-sometimes-wrong 'functions' than convert
> to something that always obscures the meaning.
> 
> > 
> > Perhaps "symbols" would be more self-explanatory. Or, actually make
> > "functions" only work on functions, and add a separate keyword for other
> > stuff. *shrug*
> My preference would be to use 'symbols'.  I tried to come up with something
> but 'symbols' is better than anything I came up with.

structures aren't symbols though ... How about 'identifier'?

Re: [Intel-gfx] [PATCH 00/34] put_user_pages(): miscellaneous call sites

2019-08-02 Thread Matthew Wilcox
On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > On Thu 01-08-19 19:19:31, john.hubb...@gmail.com wrote:
> > [...]
> > > 2) Convert all of the call sites for get_user_pages*(), to
> > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > call sites, and will take some time.
> > 
> > How do we make sure this is the case and it will remain the case in the
> > future? There must be some automagic to enforce/check that. It is simply
> > not manageable to do it every now and then because then 3) will simply
> > never be safe.
> > 
> > Have you considered Coccinelle or some other scripted way to do the
> > transition? I have no idea how to deal with future changes that would
> > break the balance though.
> 
> Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> a gup wrapper - say vaddr_pin_pages() - and track which call sites dropping
> references got converted by using this wrapper instead of gup. The
> counterpart would then be more logically named as unpin_page() or whatever
> instead of put_user_page().  Sure this is not completely foolproof (you can
> create a new callsite using vaddr_pin_pages() and then just drop refs using
> put_page()) but I suppose it would be a high enough barrier for missed
> conversions... Thoughts?

I think the API we really need is get_user_bvec() / put_user_bvec(),
and I know Christoph has been putting some work into that.  That avoids
doing refcount operations on hundreds of pages if the page in question is
a huge page.  Once people are switched over to that, they won't be tempted
to manually call put_page() on the individual constituent pages of a bvec.
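
For context, the conversion in question is mechanical at each call
site (put_user_pages() is the real helper from John's series; the loop
is the pattern it replaces, with pages/npages/i from the caller):

        /* before: callers drop GUP references one page at a time */
        for (i = 0; i < npages; i++)
                put_page(pages[i]);

        /* after: one helper through which all GUP reference drops
         * funnel, so tracking/accounting can be added centrally */
        put_user_pages(pages, npages);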
