Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-07-04 Thread Michal Hocko
On Mon 03-07-17 18:57:14, Mikulas Patocka wrote:
> 
> 
> On Mon, 3 Jul 2017, Michal Hocko wrote:
> 
> > We can add a warning (or move it from kvmalloc) and hope that the
> > respective maintainers will fix those places properly. The reason I
> > didn't add the warning to vmalloc and kept it in kvmalloc was to catch
> > only new users rather than suddenly splat on existing ones. Note that
> > there are users with panic_on_warn enabled.
> > 
> > Considering how many NOFS users we have in tree I would rather work with
> > maintainers to fix them.
> 
> So - do you want this patch?

no, see below
 
> I still believe that the previous patch that pushes 
> memalloc_noio/nofs_save into __vmalloc is better than this.

It is, but both of them are actually wrong. Why? Because either would be
just a mindless application of the scope API at a place where the scope
doesn't match the region in which reclaim recursion actually has to be
restricted. Really, the right way to go is to simply talk to the
respective maintainers. Find out whether NOFS context is really needed
and if so find the scope (e.g. a lock which would be needed in the
reclaim context) and document it. This is not trivial work, but a) we do
not seem to have any bug reports complaining about these call sites, so
there is no need to hurry, and b) this will result in cleaner and easier
to maintain code.
-- 
Michal Hocko
SUSE Labs
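
The scope approach argued for above can be sketched as a small userspace
model. This is not kernel code: the task structure, the flag value and the
helpers fs_critical_section and helper_allocates are hypothetical, and only
the save/restore pattern mirrors memalloc_nofs_save and
memalloc_nofs_restore. The sketch assumes the scope is opened where a lock
that reclaim would also need is taken, so nested callers inherit the
restriction without passing gfp flags around.

```c
/* Userspace model of the reclaim-scope API; NOT kernel code.
 * The flag value and the task structure are hypothetical stand-ins. */
#include <assert.h>

#define PF_MEMALLOC_NOFS 0x1u		/* hypothetical bit value */

struct task { unsigned int flags; };
static struct task current_task;	/* stands in for the kernel's "current" */

/* Enter a NOFS scope; return the previous state of the bit so scopes nest. */
static unsigned int memalloc_nofs_save(void)
{
	unsigned int old = current_task.flags & PF_MEMALLOC_NOFS;

	current_task.flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope by putting the bit back to what it was on entry. */
static void memalloc_nofs_restore(unsigned int old)
{
	current_task.flags = (current_task.flags & ~PF_MEMALLOC_NOFS) | old;
}

static int in_nofs_scope(void)
{
	return (current_task.flags & PF_MEMALLOC_NOFS) != 0;
}

/* Hypothetical helper deep in a call chain; it knows nothing about locks. */
static int helper_allocates(void)
{
	return in_nofs_scope();		/* what an allocation here would see */
}

/*
 * Hypothetical filesystem path. The scope starts where a lock that reclaim
 * would also need is taken, and that reason is documented right here.
 */
static int fs_critical_section(void)
{
	unsigned int nofs = memalloc_nofs_save();	/* lock taken */
	int seen = helper_allocates();			/* inherits the scope */

	memalloc_nofs_restore(nofs);			/* lock dropped */
	return seen;
}

/* Returns 0 when the model behaves as described. */
int run_scope_demo(void)
{
	unsigned int outer, inner;

	assert(!in_nofs_scope());
	assert(fs_critical_section() == 1);	/* helper saw the scope */
	assert(!in_nofs_scope());		/* scope ended with the lock */

	outer = memalloc_nofs_save();
	inner = memalloc_nofs_save();
	memalloc_nofs_restore(inner);
	assert(in_nofs_scope());		/* outer scope still active */
	memalloc_nofs_restore(outer);
	assert(!in_nofs_scope());
	return 0;
}
```

Calling run_scope_demo() from a test program exercises both the simple and
the nested case. The point of the design is that helper_allocates needs no
gfp argument at all; the restriction lives with the lock that motivates it.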


Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-07-03 Thread Mikulas Patocka


On Mon, 3 Jul 2017, Michal Hocko wrote:

> We can add a warning (or move it from kvmalloc) and hope that the
> respective maintainers will fix those places properly. The reason I
> didn't add the warning to vmalloc and kept it in kvmalloc was to catch
> only new users rather than suddenly splat on existing ones. Note that
> there are users with panic_on_warn enabled.
> 
> Considering how many NOFS users we have in tree I would rather work with
> maintainers to fix them.

So - do you want this patch?

I still believe that the previous patch that pushes 
memalloc_noio/nofs_save into __vmalloc is better than this.

Currently there are 28 __vmalloc callers that use GFP_NOIO or GFP_NOFS;
three of them already use memalloc_noio_save, 25 don't.

Mikulas

---
 drivers/block/drbd/drbd_bitmap.c        |    8 +---
 drivers/infiniband/hw/mlx4/qp.c         |   21 +
 drivers/infiniband/sw/rdmavt/qp.c       |   19 +--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    7 +--
 drivers/md/dm-bufio.c                   |    2 +-
 drivers/mtd/ubi/io.c                    |   11 +--
 fs/btrfs/free-space-tree.c              |    7 ++-
 fs/ext4/super.c                         |   21 +
 fs/gfs2/dir.c                           |   29 +
 fs/gfs2/quota.c                         |    8 ++--
 fs/nfs/blocklayout/extent_tree.c        |    7 ++-
 fs/ntfs/malloc.h                        |   11 +--
 fs/ubifs/debug.c                        |    5 -
 fs/ubifs/lprops.c                       |    5 -
 fs/ubifs/lpt_commit.c                   |   10 --
 fs/ubifs/orphan.c                       |    5 -
 fs/ubifs/ubifs.h                        |    1 +
 fs/xfs/kmem.c                           |    2 +-
 mm/page_alloc.c                         |    2 +-
 mm/vmalloc.c                            |    6 ++
 net/ceph/ceph_common.c                  |   14 --
 21 files changed, 156 insertions(+), 45 deletions(-)

Index: linux-2.6/drivers/block/drbd/drbd_bitmap.c
===
--- linux-2.6.orig/drivers/block/drbd/drbd_bitmap.c
+++ linux-2.6/drivers/block/drbd/drbd_bitmap.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -408,9 +409,10 @@ static struct page **bm_realloc_pages(st
 	bytes = sizeof(struct page *)*want;
 	new_pages = kzalloc(bytes, GFP_NOIO | __GFP_NOWARN);
 	if (!new_pages) {
-		new_pages = __vmalloc(bytes,
-				GFP_NOIO | __GFP_ZERO,
-				PAGE_KERNEL);
+		unsigned noio;
+		noio = memalloc_noio_save();
+		new_pages = vmalloc(bytes);
+		memalloc_noio_restore(noio);
 		if (!new_pages)
 			return NULL;
 	}
Index: linux-2.6/drivers/infiniband/hw/mlx4/qp.c
===
--- linux-2.6.orig/drivers/infiniband/hw/mlx4/qp.c
+++ linux-2.6/drivers/infiniband/hw/mlx4/qp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -814,14 +815,26 @@ static int create_qp_common(struct mlx4_
 
 		qp->sq.wrid = kmalloc_array(qp->sq.wqe_cnt, sizeof(u64),
 					gfp | __GFP_NOWARN);
-		if (!qp->sq.wrid)
+		if (!qp->sq.wrid) {
+			unsigned noio;
+			if (!(gfp & __GFP_IO))
+				noio = memalloc_noio_save();
 			qp->sq.wrid = __vmalloc(qp->sq.wqe_cnt * sizeof(u64),
-						gfp, PAGE_KERNEL);
+						gfp | __GFP_FS | __GFP_IO, PAGE_KERNEL);
+			if (!(gfp & __GFP_IO))
+				memalloc_noio_restore(noio);
+		}
 		qp->rq.wrid = kmalloc_array(qp->rq.wqe_cnt, sizeof(u64),
 					gfp | __GFP_NOWARN);
-		if (!qp->rq.wrid)
+		if (!qp->rq.wrid) {
+			unsigned noio;
+			if (!(gfp & __GFP_IO))
+				noio = memalloc_noio_save();
 			qp->rq.wrid = __vmalloc(qp->rq.wqe_cnt * sizeof(u64),
-						gfp, PAGE_KERNEL);
+						gfp | __GFP_FS | __GFP_IO, PAGE_KERNEL);
+			if (!(gfp & __GFP_IO))
+				memalloc_noio_restore(noio);
+		}
 		if (!qp->sq.wrid || !qp->rq.wrid) {
 			err = -ENOMEM;
 			goto err_wrid;
Index: linux-2.6/drivers/infiniband/sw/rdmavt/qp.c
===
--- 

Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-07-03 Thread Michal Hocko
On Fri 30-06-17 20:36:12, Mikulas Patocka wrote:
> 
> 
> On Fri, 30 Jun 2017, Michal Hocko wrote:
> 
> > On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Fri, 30 Jun 2017, Michal Hocko wrote:
> > > 
> > > > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > > > The __vmalloc function has a parameter gfp_mask with the allocation
> > > > > flags, however it doesn't fully respect the GFP_NOIO and GFP_NOFS
> > > > > flags. The pages are allocated with the specified gfp flags, but the
> > > > > pagetables are always allocated with GFP_KERNEL. This allocation can
> > > > > cause unexpected recursion into the filesystem or I/O subsystem.
> > > > > 
> > > > > It is not practical to extend page table allocation routines with gfp
> > > > > flags because it would require modification of architecture-specific
> > > > > code in all architectures. However, the process can temporarily
> > > > > request that all allocations are done with GFP_NOFS or GFP_NOIO with
> > > > > the functions memalloc_nofs_save and memalloc_noio_save.
> > > > > 
> > > > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS
> > > > > or __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the
> > > > > GFP_NOFS flag.
> > > > 
> > > > I strongly believe this is a step in the _wrong_ direction. Why? Because
> > > 
> > > What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> > > Silently ignore the GFP_NOIO flag?
> > 
> > I think noio users are not that much different from nofs users. Simply
> > use the scope API at the place where the scope starts and document why
> > it is needed. vmalloc calls do not have to be any special then and they
> > do not even have to think about proper gfp flags and they can use
> > whatever is the default.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> But you didn't answer the question - what should __vmalloc with GFP_NOIO 
> (or GFP_NOFS) do? Silently drop the flag? Print a warning? Or respect the 
> flag?

We can add a warning (or move it from kvmalloc) and hope that the
respective maintainers will fix those places properly. The reason I
didn't add the warning to vmalloc and kept it in kvmalloc was to catch
only new users rather than suddenly splat on existing ones. Note that
there are users with panic_on_warn enabled.

Considering how many NOFS users we have in tree I would rather work with
maintainers to fix them.
 
> Currently, it silently drops the GFP_NOIO or GFP_NOFS flag, but some 
> programmers don't know it and use these flags. You can't blame those 
> programmers for not knowing it.

At least __vmalloc_node is documented to not support all gfp flags.

-- 
Michal Hocko
SUSE Labs
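
The kvmalloc warning discussed above can be illustrated with a userspace
model of the flag arithmetic. The bit values below are hypothetical; the way
GFP_KERNEL, GFP_NOFS, GFP_NOIO and GFP_NOWAIT are composed from the
individual bits follows the kernel headers of that time as far as I
understand them. The sketch compares the existing check,
(flags & GFP_KERNEL) != GFP_KERNEL, with a check that only asks whether the
allocation may block, which is what gfpflags_allow_blocking tests.

```c
/* Userspace model of gfp flag composition; NOT kernel code.
 * All bit values are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define __GFP_IO		0x1u
#define __GFP_FS		0x2u
#define __GFP_DIRECT_RECLAIM	0x4u
#define __GFP_KSWAPD_RECLAIM	0x8u
#define __GFP_RECLAIM	(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS	(__GFP_RECLAIM | __GFP_IO)
#define GFP_NOIO	(__GFP_RECLAIM)
#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

/* The check kept in kvmalloc_node(): true means it would warn. */
bool warns_gfp_kernel_check(unsigned int flags)
{
	return (flags & GFP_KERNEL) != GFP_KERNEL;
}

/* A relaxed check: warn only if the caller is not allowed to block. */
bool warns_blocking_check(unsigned int flags)
{
	return !(flags & __GFP_DIRECT_RECLAIM);
}

/*
 * Bit i of the result is set if masks[i] would trigger the given check.
 * Order: GFP_KERNEL, GFP_NOFS, GFP_NOIO, GFP_NOWAIT.
 */
unsigned int warn_bitmap(bool (*would_warn)(unsigned int))
{
	const unsigned int masks[] = {
		GFP_KERNEL, GFP_NOFS, GFP_NOIO, GFP_NOWAIT
	};
	unsigned int i, map = 0;

	assert(would_warn);
	for (i = 0; i < sizeof(masks) / sizeof(masks[0]); i++)
		if (would_warn(masks[i]))
			map |= 1u << i;
	return map;
}
```

In this model the GFP_KERNEL comparison flags GFP_NOFS, GFP_NOIO and
GFP_NOWAIT callers alike, while the blocking check flags only GFP_NOWAIT.
That is why moving the stricter check into vmalloc itself would fire on
every existing NOFS/NOIO caller at once.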


Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Mikulas Patocka


On Fri, 30 Jun 2017, Andreas Dilger wrote:

> On Jun 29, 2017, at 8:25 PM, Mikulas Patocka wrote:
> > 
> > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > pages are allocated with the specified gfp flags, but the pagetables are
> > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > recursion into the filesystem or I/O subsystem.
> > 
> > It is not practical to extend page table allocation routines with gfp
> > flags because it would require modification of architecture-specific code
> > in all architectures. However, the process can temporarily request that all
> > allocations are done with GFP_NOFS or GFP_NOIO with the functions
> > memalloc_nofs_save and memalloc_noio_save.
> > 
> > This patch makes the vmalloc code use memalloc_nofs_save or
> > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > flag.
> > 
> > The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> > by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> > before the call to __vmalloc.
> > 
> > Signed-off-by: Mikulas Patocka 
> > 
> > ---
> > drivers/md/dm-bufio.c |   24 +---
> > drivers/md/dm-ioctl.c |6 +-
> > fs/xfs/kmem.c |   14 --
> > mm/util.c |6 +++---
> > mm/vmalloc.c  |   18 +-
> > 5 files changed, 22 insertions(+), 46 deletions(-)
> > 
> > Index: linux-2.6/mm/vmalloc.c
> > ===
> > --- linux-2.6.orig/mm/vmalloc.c
> > +++ linux-2.6/mm/vmalloc.c
> > @@ -31,6 +31,7 @@
> > #include 
> > #include 
> > #include 
> > +#include 
> > 
> > #include 
> > #include 
> > @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
> > unsigned int nr_pages, array_size, i;
> > const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> > +   unsigned noio_flag;
> > +   int r;
> > 
> > nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
> > cond_resched();
> > }
> > 
> > -   if (map_vm_area(area, prot, pages))
> > +   if (unlikely(!(gfp_mask & __GFP_IO)))
> > +   noio_flag = memalloc_noio_save();
> > +   else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +   noio_flag = memalloc_nofs_save();
> > +
> > +   r = map_vm_area(area, prot, pages);
> > +
> > +   if (unlikely(!(gfp_mask & __GFP_IO)))
> > +   memalloc_noio_restore(noio_flag);
> > +   else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +   memalloc_nofs_restore(noio_flag);
> 
> Is this really an "else if"?  I think it should just be a separate "if".
> 
> Cheers, Andreas

It is meant to be "else if". memalloc_noio_save() implies 
memalloc_nofs_save(). If we call memalloc_noio_save(), there's no need to 
call memalloc_nofs_save().

Mikulas
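
The "else if" argument above can be checked with a small userspace model.
It is a sketch, not kernel code: the bit values and the task_flags variable
are hypothetical, and current_gfp_context here only mirrors the behaviour
described in this message, namely that a NOIO scope also removes __GFP_FS.
The function effective_mask_for models the selection logic from the quoted
hunk and reports which mask a GFP_KERNEL page table allocation would
effectively run with.

```c
/* Userspace model of the scope selection; NOT kernel code.
 * All bit values are hypothetical. */
#include <assert.h>

#define __GFP_IO		0x1u
#define __GFP_FS		0x2u
#define __GFP_DIRECT_RECLAIM	0x4u

#define GFP_KERNEL	(__GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS	(__GFP_DIRECT_RECLAIM | __GFP_IO)
#define GFP_NOIO	(__GFP_DIRECT_RECLAIM)

#define PF_MEMALLOC_NOIO	0x1u
#define PF_MEMALLOC_NOFS	0x2u

static unsigned int task_flags;		/* stands in for current->flags */

/* A NOIO scope implies NOFS: it clears both bits. */
unsigned int current_gfp_context(unsigned int gfp)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}

/*
 * Pick one scope from the caller's gfp_mask, as the quoted hunk does,
 * and return the mask a GFP_KERNEL allocation would effectively get.
 */
unsigned int effective_mask_for(unsigned int gfp_mask)
{
	unsigned int saved = task_flags;
	unsigned int eff;

	if (!(gfp_mask & __GFP_IO))
		task_flags |= PF_MEMALLOC_NOIO;
	else if (!(gfp_mask & __GFP_FS))
		task_flags |= PF_MEMALLOC_NOFS;

	eff = current_gfp_context(GFP_KERNEL);	/* page tables ask for this */
	task_flags = saved;			/* leave the scope again */
	return eff;
}

/* Returns 0 when one scope per call is enough for all three masks. */
int run_else_if_demo(void)
{
	assert(effective_mask_for(GFP_KERNEL) == GFP_KERNEL);
	assert(effective_mask_for(GFP_NOFS) == GFP_NOFS);
	assert(effective_mask_for(GFP_NOIO) == GFP_NOIO);
	return 0;
}
```

For each of the three masks the effective page table mask equals the mask
the caller asked for, so a second, separate "if" for the NOFS case would add
nothing in this model.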


Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Andreas Dilger
On Jun 29, 2017, at 8:25 PM, Mikulas Patocka wrote:
> 
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architectures. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.
> 
> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka 
> 
> ---
> drivers/md/dm-bufio.c |   24 +---
> drivers/md/dm-ioctl.c |6 +-
> fs/xfs/kmem.c |   14 --
> mm/util.c |6 +++---
> mm/vmalloc.c  |   18 +-
> 5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include 
> #include 
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>   unsigned int nr_pages, array_size, i;
>   const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>   const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> + unsigned noio_flag;
> + int r;
> 
>   nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>   array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>   cond_resched();
>   }
> 
> - if (map_vm_area(area, prot, pages))
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + noio_flag = memalloc_noio_save();
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + noio_flag = memalloc_nofs_save();
> +
> + r = map_vm_area(area, prot, pages);
> +
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + memalloc_noio_restore(noio_flag);
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + memalloc_nofs_restore(noio_flag);

Is this really an "else if"?  I think it should just be a separate "if".

Cheers, Andreas

> +
> + if (unlikely(r))
>   goto fail;
> +
>   return area->addr;
> 
> fail:
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>   void *ret;
> 
>   /*
> -  * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -  * so the given set of flags has to be compatible.
> +  * vmalloc uses blocking allocations for some internal allocations
> +  * (e.g page tables) so the given set of flags has to be compatible.
>*/
> - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> + WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
> 
>   /*
>* We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
> static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
>  enum data_mode *data_mode)
> {
> - unsigned noio_flag;
> - void *ptr;
> -
>   if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
>   *data_mode = DATA_MODE_SLAB;
>   return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
> @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
>   }
> 
>   *data_mode = DATA_MODE_VMALLOC;
> -
> - /*
> -  * __vmalloc allocates the data pages and auxiliary structures with
> -  * gfp_flags that were specified, but pagetables are always allocated
> -  * with GFP_KERNEL, no matter what was specified as gfp_mask.
> -  *
> -  * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
> -  * all allocations done by this process (including pagetables) are 

Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Mikulas Patocka


On Fri, 30 Jun 2017, Michal Hocko wrote:

> On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 30 Jun 2017, Michal Hocko wrote:
> > 
> > > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > > The __vmalloc function has a parameter gfp_mask with the allocation 
> > > > flags,
> > > > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > > > pages are allocated with the specified gfp flags, but the pagetables are
> > > > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > > > recursion into the filesystem or I/O subsystem.
> > > > 
> > > > It is not practical to extend page table allocation routines with gfp
> > > > flags because it would require modification of architecture-specific 
> > > > code
> > > > in all architectures. However, the process can temporarily request that
> > > > all allocations are done with GFP_NOFS or GFP_NOIO with the functions
> > > > memalloc_nofs_save and memalloc_noio_save.
> > > > 
> > > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > > > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the 
> > > > GFP_NOFS
> > > > flag.
> > > 
> > > I strongly believe this is a step in the _wrong_ direction. Why? Because
> > 
> > What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> > Silently ignore the GFP_NOIO flag?
> 
> I think noio users are not that much different from nofs users. Simply
> use the scope API at the place where the scope starts and document why
> > it is needed. vmalloc calls then do not need any special handling: they
> > do not even have to think about proper gfp flags and can use whatever
> > is the default.
> -- 
> Michal Hocko
> SUSE Labs

But you didn't answer the question - what should __vmalloc with GFP_NOIO 
(or GFP_NOFS) do? Silently drop the flag? Print a warning? Or respect the 
flag?

Currently, it silently drops the GFP_NOIO or GFP_NOFS flag for the page
table allocations, but some programmers don't know that and use these
flags anyway. You can't blame those programmers for not knowing it.
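
To make the problem concrete, here is a minimal userspace sketch (not
kernel code; the MODEL_* names, the bit values and the model_* functions
are all made up for illustration). It models which flags end up being used
for the data pages and for the page tables, today and with the patch:

```c
/* Userspace model only. Bit values are illustrative, not the kernel's. */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

#define MODEL_GFP_RECLAIM 0x1u  /* allocation may block and reclaim */
#define MODEL_GFP_IO      0x2u  /* reclaim may start I/O */
#define MODEL_GFP_FS      0x4u  /* reclaim may call into the filesystem */

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* Flags used by the two kinds of allocation inside one vmalloc call. */
struct model_vmalloc_result {
    gfp_t data_pages;   /* flags used for the data pages */
    gfp_t page_tables;  /* flags used for the page tables */
};

/* Behaviour described above: page tables ignore the caller's gfp_mask. */
static struct model_vmalloc_result model_vmalloc_current(gfp_t gfp_mask)
{
    struct model_vmalloc_result r = { gfp_mask, MODEL_GFP_KERNEL };
    return r;
}

/* Behaviour the patch aims for: missing IO/FS bits restrict page tables too. */
static struct model_vmalloc_result model_vmalloc_patched(gfp_t gfp_mask)
{
    struct model_vmalloc_result r = { gfp_mask, MODEL_GFP_KERNEL };

    if (!(gfp_mask & MODEL_GFP_IO))
        r.page_tables &= ~(MODEL_GFP_IO | MODEL_GFP_FS);  /* noio scope */
    else if (!(gfp_mask & MODEL_GFP_FS))
        r.page_tables &= ~MODEL_GFP_FS;                   /* nofs scope */
    return r;
}

static bool model_may_enter_fs(gfp_t flags)
{
    return (flags & MODEL_GFP_FS) != 0;
}

static void model_vmalloc_demo(void)
{
    /* Today: a GFP_NOFS caller still gets GFP_KERNEL page tables. */
    assert(model_may_enter_fs(model_vmalloc_current(MODEL_GFP_NOFS).page_tables));
    /* With the patch: the restriction reaches the page tables as well. */
    assert(!model_may_enter_fs(model_vmalloc_patched(MODEL_GFP_NOFS).page_tables));
}
```

In this model a GFP_NOFS caller can still recurse into the filesystem
through the page table allocation unless the restriction is applied there
as well.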

Mikulas


Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Michal Hocko
On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> 
> 
> On Fri, 30 Jun 2017, Michal Hocko wrote:
> 
> > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > > pages are allocated with the specified gfp flags, but the pagetables are
> > > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > > recursion into the filesystem or I/O subsystem.
> > > 
> > > It is not practical to extend page table allocation routines with gfp
> > > flags because it would require modification of architecture-specific code
> > > in all architectures. However, the process can temporarily request that all
> > > allocations are done with GFP_NOFS or GFP_NOIO with the functions
> > > memalloc_nofs_save and memalloc_noio_save.
> > > 
> > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > > flag.
> > 
> > I strongly believe this is a step in the _wrong_ direction. Why? Because
> 
> What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> Silently ignore the GFP_NOIO flag?

I think noio users are not that much different from nofs users. Simply
use the scope API at the place where the scope starts and document why
it is needed. vmalloc calls then do not need any special handling: they
do not even have to think about proper gfp flags and can use whatever
is the default.
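
As a rough illustration of the scope approach (a userspace model, not
kernel code; the model_* helpers only mimic memalloc_nofs_save/restore and
the MODEL_* bit values are invented), the scope is opened where the lock is
taken and everything called underneath keeps asking for the default flags:

```c
/* Userspace model only. Names and bit values are made up for illustration. */
#include <assert.h>

typedef unsigned int gfp_t;

#define MODEL_GFP_RECLAIM 0x1u
#define MODEL_GFP_IO      0x2u
#define MODEL_GFP_FS      0x4u
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

#define MODEL_PF_NOFS 0x1u

/* Stand-in for the per-task flags word. */
static unsigned int model_task_flags;

/* Enter a nofs scope; returns the previous state of the bit. */
static unsigned int model_nofs_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_NOFS;

    model_task_flags |= MODEL_PF_NOFS;
    return old;
}

/* Leave the scope, putting back whatever state the matching save saw. */
static void model_nofs_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Every allocation made inside the scope has its flags narrowed implicitly. */
static gfp_t model_effective_gfp(gfp_t flags)
{
    if (model_task_flags & MODEL_PF_NOFS)
        flags &= ~MODEL_GFP_FS;
    return flags;
}

/* Hypothetical fs path: the scope starts where the (imaginary) lock is taken. */
static gfp_t model_journal_update(void)
{
    /* Reclaim must not re-enter the fs while the journal lock is held. */
    unsigned int old = model_nofs_save();
    /* The callee, e.g. a vmalloc user, simply asks for the default. */
    gfp_t seen = model_effective_gfp(MODEL_GFP_KERNEL);

    model_nofs_restore(old);
    return seen;
}

static void model_scope_demo(void)
{
    unsigned int outer, inner;

    assert(model_journal_update() == (MODEL_GFP_RECLAIM | MODEL_GFP_IO));
    assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);

    /* Nested scopes: the inner restore must not end the outer scope. */
    outer = model_nofs_save();
    inner = model_nofs_save();
    model_nofs_restore(inner);
    assert(model_effective_gfp(MODEL_GFP_KERNEL) == (MODEL_GFP_RECLAIM | MODEL_GFP_IO));
    model_nofs_restore(outer);
    assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
}
```

A noio scope would follow the same pattern, additionally clearing the IO
bit.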
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Mikulas Patocka


On Fri, 30 Jun 2017, Michal Hocko wrote:

> On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > pages are allocated with the specified gfp flags, but the pagetables are
> > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > recursion into the filesystem or I/O subsystem.
> > 
> > It is not practical to extend page table allocation routines with gfp
> > flags because it would require modification of architecture-specific code
> > in all architectures. However, the process can temporarily request that all
> > allocations are done with GFP_NOFS or GFP_NOIO with the functions
> > memalloc_nofs_save and memalloc_noio_save.
> > 
> > This patch makes the vmalloc code use memalloc_nofs_save or
> > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > flag.
> 
> I strongly believe this is a step in the _wrong_ direction. Why? Because

What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
Silently ignore the GFP_NOIO flag?

Mikulas

> the memalloc_no{io,fs}_save API is meant for scoped allocation contexts.
> We want the users of a scope to define it and to document why it is
> needed. GFP_NOFS (I haven't checked GFP_NOIO users) is overused a _lot_,
> mostly out of the "a filesystem should rather use it to prevent
> deadlocks" cargo cult. This should change in the long term because heavy
> fs workloads can cause trouble for memory reclaim. So we really want to
> encourage those users to define nofs scopes (e.g. around journal-locked
> contexts) rather than have them use GFP_NOFS explicitly and, very often,
> mindlessly.
> 
> I am not going to nack this patch because it is not incorrect, but I
> would really like to discourage you from it: while it saves 24 lines of
> code, it (ab)uses the scope allocation context at the wrong layer.
> 
> > The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> > by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> > before the call to __vmalloc.
> > 
> > Signed-off-by: Mikulas Patocka 
> > 
> > ---
> >  drivers/md/dm-bufio.c |   24 +---
> >  drivers/md/dm-ioctl.c |6 +-
> >  fs/xfs/kmem.c |   14 --
> >  mm/util.c |6 +++---
> >  mm/vmalloc.c  |   18 +-
> >  5 files changed, 22 insertions(+), 46 deletions(-)
> > 
> > Index: linux-2.6/mm/vmalloc.c
> > ===
> > --- linux-2.6.orig/mm/vmalloc.c
> > +++ linux-2.6/mm/vmalloc.c
> > @@ -31,6 +31,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
> > unsigned int nr_pages, array_size, i;
> > const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> > +   unsigned noio_flag;
> > +   int r;
> >  
> > nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> > array_size = (nr_pages * sizeof(struct page *));
> > @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
> > cond_resched();
> > }
> >  
> > -   if (map_vm_area(area, prot, pages))
> > +   if (unlikely(!(gfp_mask & __GFP_IO)))
> > +   noio_flag = memalloc_noio_save();
> > +   else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +   noio_flag = memalloc_nofs_save();
> > +
> > +   r = map_vm_area(area, prot, pages);
> > +
> > +   if (unlikely(!(gfp_mask & __GFP_IO)))
> > +   memalloc_noio_restore(noio_flag);
> > +   else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +   memalloc_nofs_restore(noio_flag);
> > +
> > +   if (unlikely(r))
> > goto fail;
> > +
> > return area->addr;
> >  
> >  fail:
> > Index: linux-2.6/mm/util.c
> > ===
> > --- linux-2.6.orig/mm/util.c
> > +++ linux-2.6/mm/util.c
> > @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
> > void *ret;
> >  
> > /*
> > -* vmalloc uses GFP_KERNEL for some internal allocations (e.g page 
> > tables)
> > -* so the given set of flags has to be compatible.
> > +* vmalloc uses blocking allocations for some internal allocations
> > +* (e.g page tables) so the given set of flags has to be compatible.
> >  */
> > -   WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > +   WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
> >  
> > /*
> >  * We want to attempt a large physically contiguous block first because
> > Index: 

Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread Michal Hocko
On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architectures. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.

I strongly believe this is a step in the _wrong_ direction. Why? Because
the memalloc_no{io,fs}_save API is meant for scoped allocation contexts.
We want the users of a scope to define it and to document why it is
needed. GFP_NOFS (I haven't checked GFP_NOIO users) is overused a _lot_,
mostly out of the "a filesystem should rather use it to prevent
deadlocks" cargo cult. This should change in the long term because heavy
fs workloads can cause trouble for memory reclaim. So we really want to
encourage those users to define nofs scopes (e.g. around journal-locked
contexts) rather than have them use GFP_NOFS explicitly and, very often,
mindlessly.

I am not going to nack this patch because it is not incorrect, but I
would really like to discourage you from it: while it saves 24 lines of
code, it (ab)uses the scope allocation context at the wrong layer.

> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka 
> 
> ---
>  drivers/md/dm-bufio.c |   24 +---
>  drivers/md/dm-ioctl.c |6 +-
>  fs/xfs/kmem.c |   14 --
>  mm/util.c |6 +++---
>  mm/vmalloc.c  |   18 +-
>  5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>   unsigned int nr_pages, array_size, i;
>   const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>   const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> + unsigned noio_flag;
> + int r;
>  
>   nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>   array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>   cond_resched();
>   }
>  
> - if (map_vm_area(area, prot, pages))
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + noio_flag = memalloc_noio_save();
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + noio_flag = memalloc_nofs_save();
> +
> + r = map_vm_area(area, prot, pages);
> +
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + memalloc_noio_restore(noio_flag);
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + memalloc_nofs_restore(noio_flag);
> +
> + if (unlikely(r))
>   goto fail;
> +
>   return area->addr;
>  
>  fail:
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>   void *ret;
>  
>   /*
> -  * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -  * so the given set of flags has to be compatible.
> +  * vmalloc uses blocking allocations for some internal allocations
> +  * (e.g page tables) so the given set of flags has to be compatible.
>*/
> - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> + WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
>  
>   /*
>* We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
>  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
> 

Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread John Hubbard
On 06/29/2017 07:25 PM, Mikulas Patocka wrote:
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architectures. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.
> 
> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka 
> 
> ---
>  drivers/md/dm-bufio.c |   24 +---
>  drivers/md/dm-ioctl.c |6 +-
>  fs/xfs/kmem.c |   14 --
>  mm/util.c |6 +++---
>  mm/vmalloc.c  |   18 +-
>  5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>   unsigned int nr_pages, array_size, i;
>   const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>   const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> + unsigned noio_flag;
> + int r;
>  
>   nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>   array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>   cond_resched();
>   }
>  
> - if (map_vm_area(area, prot, pages))
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + noio_flag = memalloc_noio_save();
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + noio_flag = memalloc_nofs_save();
> +
> + r = map_vm_area(area, prot, pages);
> +
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + memalloc_noio_restore(noio_flag);
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + memalloc_nofs_restore(noio_flag);
> +
> + if (unlikely(r))
>   goto fail;
> +
>   return area->addr;
>  
>  fail:
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>   void *ret;
>  
>   /*
> -  * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -  * so the given set of flags has to be compatible.
> +  * vmalloc uses blocking allocations for some internal allocations
> +  * (e.g page tables) so the given set of flags has to be compatible.
>*/
> - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> + WARN_ON_ONCE(!gfpflags_allow_blocking(flags));

Hi Mikulas,

OK, so given the new behavior in the underlying __vmalloc code, I think it's
appropriate to add this documentation change on top of what you have so far:

 mm/util.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index cdbc9022c021..39fe94530dd2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -343,7 +343,8 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Any use of gfp flags other than GFP_KERNEL, GFP_NOIO, or GFP_NOFS should
+ * be done only after consulting with mm people.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
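
For reference, here is a small userspace sketch of how the old and the new
WARN_ON_ONCE conditions differ (not kernel code: the MODEL_* bit values are
illustrative, the atomic mask is simplified, and model_new_check_warns only
mimics gfpflags_allow_blocking by testing the direct-reclaim bit):

```c
/* Userspace model only. Only the composition of the masks matters here. */
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

#define MODEL_DIRECT_RECLAIM 0x01u  /* caller may block in direct reclaim */
#define MODEL_KSWAPD_RECLAIM 0x02u  /* caller may wake background reclaim */
#define MODEL_IO             0x04u
#define MODEL_FS             0x08u
#define MODEL_HIGH           0x10u

#define MODEL_RECLAIM    (MODEL_DIRECT_RECLAIM | MODEL_KSWAPD_RECLAIM)
#define MODEL_GFP_KERNEL (MODEL_RECLAIM | MODEL_IO | MODEL_FS)
#define MODEL_GFP_NOFS   (MODEL_RECLAIM | MODEL_IO)
#define MODEL_GFP_NOIO   (MODEL_RECLAIM)
#define MODEL_GFP_ATOMIC (MODEL_HIGH | MODEL_KSWAPD_RECLAIM)  /* simplified */

/* Old condition: warn unless every GFP_KERNEL bit is present. */
static bool model_old_check_warns(gfp_t flags)
{
    return (flags & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL;
}

/* New condition: warn only if the caller is not allowed to block. */
static bool model_new_check_warns(gfp_t flags)
{
    return !(flags & MODEL_DIRECT_RECLAIM);
}

static void model_check_demo(void)
{
    /* GFP_KERNEL passes both checks. */
    assert(!model_old_check_warns(MODEL_GFP_KERNEL));
    assert(!model_new_check_warns(MODEL_GFP_KERNEL));

    /* GFP_NOFS and GFP_NOIO used to warn and no longer do. */
    assert(model_old_check_warns(MODEL_GFP_NOFS));
    assert(!model_new_check_warns(MODEL_GFP_NOFS));
    assert(model_old_check_warns(MODEL_GFP_NOIO));
    assert(!model_new_check_warns(MODEL_GFP_NOIO));

    /* Non-blocking masks still warn with either check. */
    assert(model_old_check_warns(MODEL_GFP_ATOMIC));
    assert(model_new_check_warns(MODEL_GFP_ATOMIC));
}
```

So the only callers whose behaviour changes are those passing GFP_NOFS or
GFP_NOIO, which is what the comment update above is meant to document.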

>  
>   /*
>* We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
>  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
>  enum data_mode 

Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags

2017-06-30 Thread John Hubbard
On 06/29/2017 07:25 PM, Mikulas Patocka wrote:
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architectures. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.
> 
> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka 
> 
> ---
>  drivers/md/dm-bufio.c |   24 +---
>  drivers/md/dm-ioctl.c |6 +-
>  fs/xfs/kmem.c |   14 --
>  mm/util.c |6 +++---
>  mm/vmalloc.c  |   18 +-
>  5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>   unsigned int nr_pages, array_size, i;
>   const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>   const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> + unsigned noio_flag;
> + int r;
>  
>   nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>   array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>   cond_resched();
>   }
>  
> - if (map_vm_area(area, prot, pages))
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + noio_flag = memalloc_noio_save();
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + noio_flag = memalloc_nofs_save();
> +
> + r = map_vm_area(area, prot, pages);
> +
> + if (unlikely(!(gfp_mask & __GFP_IO)))
> + memalloc_noio_restore(noio_flag);
> + else if (unlikely(!(gfp_mask & __GFP_FS)))
> + memalloc_nofs_restore(noio_flag);
> +
> + if (unlikely(r))
>   goto fail;
> +
>   return area->addr;
>  
>  fail:
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>   void *ret;
>  
>   /*
> -  * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -  * so the given set of flags has to be compatible.
> +  * vmalloc uses blocking allocations for some internal allocations
> +  * (e.g page tables) so the given set of flags has to be compatible.
>    */
> - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> + WARN_ON_ONCE(!gfpflags_allow_blocking(flags));

Hi Mikulas,

OK, so given the new behavior in the underlying __vmalloc code, I think it's
appropriate to add this documentation change on top of what you have so far:

 mm/util.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index cdbc9022c021..39fe94530dd2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -343,7 +343,8 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Any use of gfp flags other than GFP_KERNEL, GFP_NOIO, or GFP_NOFS should
+ * be done only after consulting with mm people.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
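
For anyone following along, here is a rough userspace model of the scope
mechanism the quoted patch relies on. It is only a sketch: every demo_* name
and every flag value below is made up for illustration and does not match the
kernel's real definitions. It shows a nested allocation that always asks for
GFP_KERNEL (like the page table allocation) being narrowed by a save/restore
pair placed around the mapping step.

```c
#include <assert.h>

/* Hypothetical flag values, chosen for the demo; not the kernel's bit layout. */
#define DEMO_GFP_IO     0x1u
#define DEMO_GFP_FS     0x2u
#define DEMO_GFP_KERNEL (DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS   (DEMO_GFP_IO)
#define DEMO_GFP_NOIO   0x0u

#define DEMO_PF_NOIO    0x1u
#define DEMO_PF_NOFS    0x2u

/* Stands in for the per-task flags word (current->flags in the kernel). */
static unsigned int demo_task_flags;

/* Enter a scope: set the bit and hand back its previous state. */
static unsigned int demo_scope_save(unsigned int bit)
{
	unsigned int old = demo_task_flags & bit;

	demo_task_flags |= bit;
	return old;
}

/* Leave a scope: put the bit back to whatever it was on entry. */
static void demo_scope_restore(unsigned int bit, unsigned int old)
{
	demo_task_flags = (demo_task_flags & ~bit) | old;
}

/* Narrow a requested mask according to the scope currently in effect. */
static unsigned int demo_effective_gfp(unsigned int gfp)
{
	if (demo_task_flags & DEMO_PF_NOIO)
		gfp &= ~(DEMO_GFP_IO | DEMO_GFP_FS);
	else if (demo_task_flags & DEMO_PF_NOFS)
		gfp &= ~DEMO_GFP_FS;
	return gfp;
}

/* Models the page table allocation: it always requests GFP_KERNEL. */
static unsigned int demo_alloc_pagetable(void)
{
	return demo_effective_gfp(DEMO_GFP_KERNEL);
}

/*
 * Models the mapping step from the quoted hunk: pick a scope from the
 * caller's mask, run the nested allocation, then restore. Returns the mask
 * the nested allocation actually ran with.
 */
static unsigned int demo_map_with_scope(unsigned int gfp_mask)
{
	unsigned int before = demo_task_flags;
	unsigned int bit = 0, saved = 0, seen;

	if (!(gfp_mask & DEMO_GFP_IO))
		bit = DEMO_PF_NOIO;
	else if (!(gfp_mask & DEMO_GFP_FS))
		bit = DEMO_PF_NOFS;

	if (bit)
		saved = demo_scope_save(bit);

	seen = demo_alloc_pagetable();

	if (bit)
		demo_scope_restore(bit, saved);

	/* The scope must not leak past the mapping step. */
	assert(demo_task_flags == before);
	return seen;
}
```

In this model, demo_map_with_scope(DEMO_GFP_NOFS) makes the nested request
lose DEMO_GFP_FS, demo_map_with_scope(DEMO_GFP_NOIO) makes it lose both bits,
and demo_map_with_scope(DEMO_GFP_KERNEL) leaves it untouched.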

>  
>   /*
>    * We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
>  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
>  enum data_mode *data_mode)
>  {
> -