[dm-devel] [PATCH RFC] multipath-tools: *untested* use sysfs prio also for arrays with dual implicit/explicit alua support

2020-07-24 Thread Xose Vazquez Perez



Cc: Martin Wilck 
Cc: Benjamin Marzinski 
Cc: Hannes Reinecke 
Cc: DM-DEVEL ML 
Signed-off-by: Xose Vazquez Perez 
---
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index 897e48ca..5a82234f 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -595,7 +595,7 @@ detect_prio(struct config *conf, struct path * pp)
tpgs = path_get_tpgs(pp);
if (tpgs == TPGS_NONE)
return;
-   if ((tpgs == TPGS_EXPLICIT || !check_rdac(pp)) &&
+   if ((tpgs == TPGS_EXPLICIT || tpgs == TPGS_BOTH || !check_rdac(pp)) &&
sysfs_get_asymmetric_access_state(pp, buff, 512) >= 0)
default_prio = PRIO_SYSFS;
else

In short:

diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index 897e48ca..a9609a01 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -595,7 +595,7 @@ detect_prio(struct config *conf, struct path * pp)
tpgs = path_get_tpgs(pp);
if (tpgs == TPGS_NONE)
return;
-   if ((tpgs == TPGS_EXPLICIT || !check_rdac(pp)) &&
+   if ((tpgs != TPGS_IMPLICIT || !check_rdac(pp)) &&
sysfs_get_asymmetric_access_state(pp, buff, 512) >= 0)
default_prio = PRIO_SYSFS;
else
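
The two variants above differ only in spelling: once TPGS_NONE has returned early, excluding TPGS_IMPLICIT is the same as enumerating TPGS_EXPLICIT and TPGS_BOTH. A minimal sketch, with hypothetical TPGS values (the real enum lives in libmultipath and may differ), checking that both conditions agree on every state that can reach the test:

```c
#include <assert.h>

/* Hypothetical TPGS values for illustration only; the real definitions
 * live in libmultipath and may differ. */
enum tpgs { TPGS_NONE = 0, TPGS_IMPLICIT = 1, TPGS_EXPLICIT = 2, TPGS_BOTH = 3 };

/* First variant: enumerate the states that may use the sysfs prio. */
static int eligible_v1(enum tpgs t)
{
	return t == TPGS_EXPLICIT || t == TPGS_BOTH;
}

/* Second variant ("in short"): exclude the one state that may not.
 * Only equivalent to v1 because detect_prio() already returned early
 * for TPGS_NONE before this check is reached. */
static int eligible_v2(enum tpgs t)
{
	return t != TPGS_IMPLICIT;
}
```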

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [git pull] device mapper fix for 5.8-rc7

2020-07-24 Thread pr-tracker-bot
The pull request you sent on Fri, 24 Jul 2020 13:47:38 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-5.8/dm-fixes-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a38a19efcd9b7b536e2820df91e9f0be806f9a42

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker




[dm-devel] [git pull] device mapper fix for 5.8-rc7

2020-07-24 Thread Mike Snitzer
Hi Linus,

The following changes since commit 6958c1c640af8c3f40fa8a2eee3b5b905d95b677:

  dm: use noio when sending kobject event (2020-07-08 12:50:51 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-5.8/dm-fixes-3

for you to fetch changes up to 5df96f2b9f58a5d2dc1f30fe7de75e197f2c25f2:

  dm integrity: fix integrity recalculation that is improperly skipped (2020-07-23 14:39:37 -0400)

Please pull, thanks!
Mike


- Stable fix for DM integrity target's integrity recalculation that
  gets skipped when resuming a device. This is a fix for a previous
  stable@ fix.


Mikulas Patocka (1):
  dm integrity: fix integrity recalculation that is improperly skipped

 drivers/md/dm-integrity.c |  4 ++--
 drivers/md/dm.c   | 17 +
 include/linux/device-mapper.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)




Re: [dm-devel] [PATCH V2] libmultipath: free pp if store_path fails in disassemble_map

2020-07-24 Thread Benjamin Marzinski
On Fri, Jul 24, 2020 at 09:40:18AM +0800, Zhiqiang Liu wrote:
> In the disassemble_map function, a pp may be allocated and stored in
> pgp->paths. However, if store_path fails, that pp will not be freed,
> and a memory leak occurs.
> 
> Here, we call free_path to free pp when store_path fails.
> 
Reviewed-by: Benjamin Marzinski 
> Signed-off-by: Zhiqiang Liu 
> Signed-off-by: lixiaokeng 
> ---
> V1->V2: update based on ups/submit-2007 branch.
> 
>  libmultipath/dmparser.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/dmparser.c b/libmultipath/dmparser.c
> index b9858fa5..8a0501ba 100644
> --- a/libmultipath/dmparser.c
> +++ b/libmultipath/dmparser.c
> @@ -143,6 +143,7 @@ int disassemble_map(const struct _vector *pathvec,
>   int def_minio = 0;
>   struct path * pp;
>   struct pathgroup * pgp;
> + int pp_alloc_flag = 0;
> 
>   assert(pathvec != NULL);
>   p = params;
> @@ -292,6 +293,7 @@ int disassemble_map(const struct _vector *pathvec,
> 
>   for (j = 0; j < num_paths; j++) {
>   pp = NULL;
> + pp_alloc_flag = 0;
>   p += get_word(p, );
> 
>   if (!word)
> @@ -304,13 +306,16 @@ int disassemble_map(const struct _vector *pathvec,
> 
>   if (!pp)
>   goto out1;
> -
> + pp_alloc_flag = 1;
>   strlcpy(pp->dev_t, word, BLK_DEV_SIZE);
>   }
>   FREE(word);
> 
> - if (store_path(pgp->paths, pp))
> + if (store_path(pgp->paths, pp)) {
> + if (pp_alloc_flag)
> + free_path(pp);
>   goto out;
> + }
> 
>   pgp->id ^= (long)pp;
>   pp->pgindex = i + 1;
> -- 
> 2.24.0.windows.2
> 
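
The pattern the patch fixes is generic: free on store failure, but only if this code path did the allocation, since a path found in an existing pathvec must not be freed here. A simplified sketch with illustrative stand-ins for alloc_path()/store_path() (not the real libmultipath API):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for libmultipath's path handling; names and
 * shapes here are illustrative, not the real API. */
struct path { int dev; };

static struct path *alloc_path(void) { return calloc(1, sizeof(struct path)); }
static void free_path(struct path *pp) { free(pp); }

/* Returns nonzero on failure, mimicking store_path(); a NULL slot
 * simulates a store failure. */
static int store_path(struct path **slot, struct path *pp)
{
	if (!slot)
		return 1;
	*slot = pp;
	return 0;
}

/* The fixed pattern: remember whether we allocated pp ourselves, and
 * free it only in that case when store_path() fails. */
static int add_path(struct path **slot, struct path *found)
{
	struct path *pp = found;
	int pp_alloc_flag = 0;

	if (!pp) {
		pp = alloc_path();
		if (!pp)
			return 1;
		pp_alloc_flag = 1;
	}
	if (store_path(slot, pp)) {
		if (pp_alloc_flag)
			free_path(pp);	/* plugs the leak the patch addresses */
		return 1;
	}
	return 0;
}
```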




[dm-devel] [PATCH 02/14] drbd: remove dead code in device_to_statistics

2020-07-24 Thread Christoph Hellwig
Ever since the switch to blk-mq, a lower device not used for VM
writeback will not be marked congested, so the check will never
trigger.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_nl.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index d0d9a549b58388..650372ee2c7822 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3370,7 +3370,6 @@ static void device_to_statistics(struct device_statistics *s,
if (get_ldev(device)) {
struct drbd_md *md = >ldev->md;
u64 *history_uuids = (u64 *)s->history_uuids;
-   struct request_queue *q;
int n;
 
spin_lock_irq(>uuid_lock);
@@ -3384,11 +3383,6 @@ static void device_to_statistics(struct device_statistics *s,
spin_unlock_irq(>uuid_lock);
 
s->dev_disk_flags = md->flags;
-   q = bdev_get_queue(device->ldev->backing_bdev);
-   s->dev_lower_blocked =
-   bdi_congested(q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
put_ldev(device);
}
s->dev_size = drbd_get_capacity(device->this_bdev);
-- 
2.27.0




[dm-devel] [PATCH 13/14] bdi: invert BDI_CAP_NO_ACCT_WB

2020-07-24 Thread Christoph Hellwig
Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obfuscates the check.

Signed-off-by: Christoph Hellwig 
---
 fs/fuse/inode.c |  3 ++-
 include/linux/backing-dev.h | 13 +++--
 mm/backing-dev.c|  1 +
 mm/page-writeback.c |  4 ++--
 4 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 17b00670fb539e..581329203d6860 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1050,7 +1050,8 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
return err;
 
/* fuse does it's own writeback accounting */
-   sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
+   sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
+   sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;
 
/*
 * For a single fuse filesystem use max 1% of dirty +
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5da4ea3dd0cc5c..b217344a2c63be 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -120,17 +120,17 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
  *
  * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
  * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  */
 #define BDI_CAP_NO_ACCT_DIRTY  0x0001
 #define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_NO_ACCT_WB 0x0004
+#define BDI_CAP_WRITEBACK_ACCT 0x0004
 #define BDI_CAP_STRICTLIMIT0x0010
 #define BDI_CAP_CGROUP_WRITEBACK 0x0020
 
 #define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
+   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -179,13 +179,6 @@ static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
 }
 
-static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi)
-{
-   /* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */
-   return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB |
- BDI_CAP_NO_WRITEBACK));
-}
-
 static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
 {
return bdi_cap_writeback_dirty(inode_to_bdi(mapping->host));
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 41ec322090fca6..5f5958e1d39060 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -744,6 +744,7 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+   bdi->capabilities = BDI_CAP_WRITEBACK_ACCT;
bdi->ra_pages = VM_READAHEAD_PAGES;
return bdi;
 }
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 27a10536adad30..44c4a588f48df5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2740,7 +2740,7 @@ int test_clear_page_writeback(struct page *page)
if (ret) {
__xa_clear_mark(>i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi)) {
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
struct bdi_writeback *wb = inode_to_wb(inode);
 
dec_wb_stat(wb, WB_WRITEBACK);
@@ -2793,7 +2793,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
   PAGECACHE_TAG_WRITEBACK);
 
xas_set_mark(, PAGECACHE_TAG_WRITEBACK);
-   if (bdi_cap_account_writeback(bdi))
+   if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT)
inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);
 
/*
-- 
2.27.0
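
The direction of the change can be sketched outside the kernel: bdi_alloc() now opts every bdi in to writeback accounting by setting a positive bit, and fuse, which does its own accounting, clears just that bit instead of enumerating negative capabilities. Flag values below are illustrative stand-ins for the real BDI_CAP_* constants:

```c
#include <assert.h>

/* Illustrative constants; the real ones are in include/linux/backing-dev.h. */
#define BDI_CAP_WRITEBACK_ACCT 0x0004
#define BDI_CAP_STRICTLIMIT    0x0010

struct bdi { unsigned int capabilities; };

/* bdi_alloc() now opts every bdi IN to writeback accounting... */
static void bdi_alloc_caps(struct bdi *b)
{
	b->capabilities = BDI_CAP_WRITEBACK_ACCT;
}

/* ...and fuse, which does its own accounting, opts back OUT by
 * clearing that one bit instead of listing every capability it lacks. */
static void fuse_bdi_caps(struct bdi *b)
{
	b->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
	b->capabilities |= BDI_CAP_STRICTLIMIT;
}
```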




[dm-devel] [PATCH 01/14] fs: remove the unused SB_I_MULTIROOT flag

2020-07-24 Thread Christoph Hellwig
The last user of SB_I_MULTIROOT disappeared with commit f2aedb713c28
("NFS: Add fs_context support").

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 fs/namei.c | 4 ++--
 include/linux/fs.h | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 72d4219c93acb7..e9ff0d54a110a7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -568,8 +568,8 @@ static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
 {
struct super_block *sb = mnt->mnt_sb;
 
-   /* Bind mounts and multi-root filesystems can have disconnected paths */
-   if (!(sb->s_iflags & SB_I_MULTIROOT) && (mnt->mnt_root == sb->s_root))
+   /* Bind mounts can have disconnected paths */
+   if (mnt->mnt_root == sb->s_root)
return true;
 
return is_subdir(dentry, mnt->mnt_root);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 41cd993ec0f686..236543605dd118 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1366,7 +1366,6 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_CGROUPWB  0x0001  /* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC0x0002  /* Ignore executables on this fs */
 #define SB_I_NODEV 0x0004  /* Ignore devices on this fs */
-#define SB_I_MULTIROOT 0x0008  /* Multiple roots to the dentry tree */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE0x0010 /* fstype already mounted */
-- 
2.27.0




[dm-devel] [PATCH 14/14] bdi: replace BDI_CAP_NO_{WRITEBACK, ACCT_DIRTY} with a single flag

2020-07-24 Thread Christoph Hellwig
Replace the two negative flags that are always used together with a
single positive flag that indicates the writeback capability instead
of two related non-capabilities.  Also remove the pointless wrappers
that just check the flag.

Signed-off-by: Christoph Hellwig 
---
 fs/9p/vfs_file.c|  2 +-
 fs/fs-writeback.c   |  7 +++---
 include/linux/backing-dev.h | 48 -
 mm/backing-dev.c|  6 ++---
 mm/filemap.c|  4 ++--
 mm/memcontrol.c |  2 +-
 mm/memory-failure.c |  2 +-
 mm/migrate.c|  2 +-
 mm/mmap.c   |  2 +-
 mm/page-writeback.c | 12 +-
 10 files changed, 29 insertions(+), 58 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 92cd1d80218d70..5479d894a10696 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -625,7 +625,7 @@ static void v9fs_mmap_vm_close(struct vm_area_struct *vma)
 
inode = file_inode(vma->vm_file);
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a605c3dddabc76..e62e48fecff4f9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2318,7 +2318,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 
wb = locked_inode_to_wb_and_lock_list(inode);
 
-   WARN(bdi_cap_writeback_dirty(wb->bdi) &&
+   WARN((wb->bdi->capabilities & BDI_CAP_WRITEBACK) &&
 !test_bit(WB_registered, >state),
 "bdi-%s not registered\n", bdi_dev_name(wb->bdi));
 
@@ -2343,7 +2343,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 * to make sure background write-back happens
 * later.
 */
-   if (bdi_cap_writeback_dirty(wb->bdi) && wakeup_bdi)
+   if (wakeup_bdi &&
+   (wb->bdi->capabilities & BDI_CAP_WRITEBACK))
wb_wakeup_delayed(wb);
return;
}
@@ -2578,7 +2579,7 @@ int write_inode_now(struct inode *inode, int sync)
.range_end = LLONG_MAX,
};
 
-   if (!mapping_cap_writeback_dirty(inode->i_mapping))
+   if (!mapping_can_writeback(inode->i_mapping))
wbc.nr_to_write = 0;
 
might_sleep();
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index b217344a2c63be..44df4fcef65c1e 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -110,27 +110,14 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 /*
  * Flags in backing_dev_info::capability
  *
- * The first three flags control whether dirty pages will contribute to the
- * VM's accounting and whether writepages() should be called for dirty pages
- * (something that would not, for example, be appropriate for ramfs)
- *
- * WARNING: these flags are closely related and should not normally be
- * used separately.  The BDI_CAP_NO_ACCT_AND_WRITEBACK combines these
- * three flags into a single convenience macro.
- *
- * BDI_CAP_NO_ACCT_DIRTY:  Dirty pages shouldn't contribute to accounting
- * BDI_CAP_NO_WRITEBACK:   Don't write pages back
- * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
- * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
+ * BDI_CAP_WRITEBACK:  Supports dirty page writeback, and dirty pages
+ * should contribute to accounting
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
+ * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold
  */
-#define BDI_CAP_NO_ACCT_DIRTY  0x0001
-#define BDI_CAP_NO_WRITEBACK   0x0002
-#define BDI_CAP_WRITEBACK_ACCT 0x0004
-#define BDI_CAP_STRICTLIMIT0x0010
-#define BDI_CAP_CGROUP_WRITEBACK 0x0020
-
-#define BDI_CAP_NO_ACCT_AND_WRITEBACK \
-   (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)
+#define BDI_CAP_WRITEBACK  (1 << 0)
+#define BDI_CAP_WRITEBACK_ACCT (1 << 1)
+#define BDI_CAP_STRICTLIMIT(1 << 2)
 
 extern struct backing_dev_info noop_backing_dev_info;
 
@@ -169,24 +156,9 @@ static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
 long congestion_wait(int sync, long timeout);
 long wait_iff_congested(int sync, long timeout);
 
-static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
-}
-
-static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
-{
-   return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
-}
-
-static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
-{
-   

[dm-devel] [PATCH 04/14] bdi: initialize ->ra_pages in bdi_init

2020-07-24 Thread Christoph Hellwig
Set up a readahead size by default, as very few users have a good
reason to change it.

Signed-off-by: Christoph Hellwig 
Acked-by: David Sterba  [btrfs]
Acked-by: Richard Weinberger  [ubifs, mtd]
---
 block/blk-core.c  | 1 -
 drivers/mtd/mtdcore.c | 1 +
 fs/9p/vfs_super.c | 4 ++--
 fs/afs/super.c| 1 -
 fs/btrfs/disk-io.c| 1 -
 fs/fuse/inode.c   | 1 -
 fs/nfs/super.c| 9 +
 fs/ubifs/super.c  | 1 +
 fs/vboxsf/super.c | 1 +
 mm/backing-dev.c  | 1 +
 10 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 93104c7470e8ac..ea1665de7a2079 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,7 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7d930569a7dfb7..01b3fe888d885b 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -2196,6 +2196,7 @@ static struct backing_dev_info * __init mtd_bdi_init(char *name)
bdi = bdi_alloc(NUMA_NO_NODE);
if (!bdi)
return ERR_PTR(-ENOMEM);
+   bdi->ra_pages = 0;
 
/*
 * We put '-0' suffix to the name to get the same name format as we
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a52..a338eb979cadf9 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -80,8 +80,8 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
if (ret)
return ret;
 
-   if (v9ses->cache)
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
+   if (!v9ses->cache)
+   sb->s_bdi->ra_pages = 0;
 
sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d1379..3a40ee752c1e3f 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
ret = super_setup_bdi(sb);
if (ret)
return ret;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
 
/* allocate the root inode and dentry */
if (as->dyn_root) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ad157b55d7f5f0..f92c45fe019c48 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3033,7 +3033,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}
 
sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bba747520e9b08..17b00670fb539e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
if (err)
return err;
 
-   sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
/* fuse does it's own writeback accounting */
sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2c1..f943e37853fa25 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
 }
 #endif
 
-static void nfs_set_readahead(struct backing_dev_info *bdi,
- unsigned long iomax_pages)
-{
-   bdi->ra_pages = VM_READAHEAD_PAGES;
-   bdi->io_pages = iomax_pages;
-}
-
 int nfs_get_tree_common(struct fs_context *fc)
 {
struct nfs_fs_context *ctx = nfs_fc2context(fc);
@@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
 MINOR(server->s_dev));
if (error)
goto error_splat_super;
-   nfs_set_readahead(s->s_bdi, server->rpages);
+   s->s_bdi->io_pages = server->rpages;
server->super = s;
}
 
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 7fc2f3f07c16ed..ee7692e7a35371 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2159,6 +2159,7 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
   c->vi.vol_id);
if (err)
goto out_close;
+   sb->s_bdi->ra_pages = 0; /* ubifs does its own readahead */
 
sb->s_fs_info = c;
sb->s_magic = UBIFS_SUPER_MAGIC;
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 8fe03b4a0d2b03..6574ae5a97c2c8 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -167,6 +167,7 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context

[dm-devel] [PATCH 09/14] bdi: remove BDI_CAP_CGROUP_WRITEBACK

2020-07-24 Thread Christoph Hellwig
Just checking SB_I_CGROUPWB for cgroup writeback support is enough.
Either the file system allocates its own bdi (e.g. btrfs), in which case
it is known to support cgroup writeback, or the bdi comes from the block
layer, which always supports cgroup writeback.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-core.c| 1 -
 fs/btrfs/disk-io.c  | 1 -
 include/linux/backing-dev.h | 8 +++-
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ea1665de7a2079..68db7e745b49dd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,7 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;
 
-   q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;
 
timer_setup(>backing_dev_info->laptop_mode_wb_timer,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f92c45fe019c48..4b5a8640329e4c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3032,7 +3032,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
goto fail_sb_buffer;
}
 
-   sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 0b06b2d26c9aa3..52583b6f2ea05d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -123,7 +123,6 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
  * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
  * BDI_CAP_STRICTLIMIT:Keep number of dirty pages below bdi threshold.
  *
- * BDI_CAP_CGROUP_WRITEBACK: Supports cgroup-aware writeback.
  * BDI_CAP_SYNCHRONOUS_IO: Device is so fast that asynchronous IO would be
  *inefficient.
  */
@@ -233,9 +232,9 @@ int inode_congested(struct inode *inode, int cong_bits);
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
  * @inode: inode of interest
  *
- * cgroup writeback requires support from both the bdi and filesystem.
- * Also, both memcg and iocg have to be on the default hierarchy.  Test
- * whether all conditions are met.
+ * Cgroup writeback requires support from the filesystem.  Also, both memcg and
+ * iocg have to be on the default hierarchy.  Test whether all conditions are
+ * met.
  *
  * Note that the test result may change dynamically on the same inode
  * depending on how memcg and iocg are configured.
@@ -247,7 +246,6 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
cgroup_subsys_on_dfl(io_cgrp_subsys) &&
bdi_cap_account_dirty(bdi) &&
-   (bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
(inode->i_sb->s_iflags & SB_I_CGROUPWB);
 }
 
-- 
2.27.0




[dm-devel] [PATCH 07/14] block: make QUEUE_SYSFS_BIT_FNS a little more useful

2020-07-24 Thread Christoph Hellwig
Generate the queue_sysfs_entry given that we have all the required
information for it, and rename the generated show and store methods
to match the other ones in the file.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-sysfs.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index ce418d9128a0b2..cfbb039da8751f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -257,16 +257,16 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
return queue_var_show(max_hw_sectors_kb, (page));
 }
 
-#define QUEUE_SYSFS_BIT_FNS(name, flag, neg)   \
+#define QUEUE_SYSFS_BIT_FNS(_name, flag, neg)  \
 static ssize_t \
-queue_show_##name(struct request_queue *q, char *page) \
+queue_##_name##_show(struct request_queue *q, char *page)  \
 {  \
int bit;\
bit = test_bit(QUEUE_FLAG_##flag, >queue_flags); \
return queue_var_show(neg ? !bit : bit, page);  \
 }  \
 static ssize_t \
-queue_store_##name(struct request_queue *q, const char *page, size_t count) \
+queue_##_name##_store(struct request_queue *q, const char *page, size_t count) \
 {  \
unsigned long val;  \
ssize_t ret;\
@@ -281,7 +281,12 @@ queue_store_##name(struct request_queue *q, const char *page, size_t count) \
else\
blk_queue_flag_clear(QUEUE_FLAG_##flag, q); \
return ret; \
-}
+}  \
+static struct queue_sysfs_entry queue_##_name##_entry = {  \
+   .attr   = { .name = __stringify(_name), .mode = 0644 }, \
+   .show   = queue_##_name##_show, \
+   .store  = queue_##_name##_store,\
+};
 
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
@@ -661,12 +666,6 @@ static struct queue_sysfs_entry queue_zone_append_max_entry = {
.show = queue_zone_append_max_show,
 };
 
-static struct queue_sysfs_entry queue_nonrot_entry = {
-   .attr = {.name = "rotational", .mode = 0644 },
-   .show = queue_show_nonrot,
-   .store = queue_store_nonrot,
-};
-
 static struct queue_sysfs_entry queue_zoned_entry = {
.attr = {.name = "zoned", .mode = 0444 },
.show = queue_zoned_show,
@@ -699,18 +698,6 @@ static struct queue_sysfs_entry queue_rq_affinity_entry = {
.store = queue_rq_affinity_store,
 };
 
-static struct queue_sysfs_entry queue_iostats_entry = {
-   .attr = {.name = "iostats", .mode = 0644 },
-   .show = queue_show_iostats,
-   .store = queue_store_iostats,
-};
-
-static struct queue_sysfs_entry queue_random_entry = {
-   .attr = {.name = "add_random", .mode = 0644 },
-   .show = queue_show_random,
-   .store = queue_store_random,
-};
-
 static struct queue_sysfs_entry queue_poll_entry = {
.attr = {.name = "io_poll", .mode = 0644 },
.show = queue_poll_show,
-- 
2.27.0
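
The idea of generating the entry together with its handlers can be sketched with plain C token pasting. The struct and macro names below are simplified stand-ins for the block layer's, illustrating why the attribute name can no longer drift apart from its show/store methods:

```c
#include <assert.h>
#include <string.h>

/* Reduced model of queue_sysfs_entry; illustrative only. */
struct entry {
	const char *name;
	int (*show)(void);
};

/* One macro emits the show handler AND the sysfs entry that points at
 * it, so defining an attribute is a single line. */
#define DEFINE_ENTRY(_name, _val)                              \
static int entry_##_name##_show(void) { return _val; }         \
static struct entry entry_##_name = {                          \
	.name = #_name,                                        \
	.show = entry_##_name##_show,                          \
};

/* Generates entry_nonrot_show()/entry_nonrot, much as the patched
 * QUEUE_SYSFS_BIT_FNS generates queue_nonrot_show()/queue_nonrot_entry. */
DEFINE_ENTRY(nonrot, 1)
DEFINE_ENTRY(iostats, 0)
```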




[dm-devel] [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-24 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c |  5 ++---
 block/blk-sysfs.c|  1 -
 block/genhd.c| 13 +++--
 drivers/block/aoe/aoeblk.c   |  2 --
 drivers/block/drbd/drbd_nl.c | 12 +---
 drivers/md/bcache/super.c|  4 
 drivers/md/dm-table.c|  3 ---
 drivers/md/raid0.c   | 16 
 drivers/md/raid10.c  | 24 +---
 drivers/md/raid5.c   | 13 +
 10 files changed, 16 insertions(+), 77 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6cac..01049e9b998f1d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(>limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
   top, bottom);
}
-
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb6f..ce418d9128a0b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 
spin_lock_irq(>queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
-   q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(>queue_lock);
 
return ret;
diff --git a/block/genhd.c b/block/genhd.c
index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
  const struct attribute_group **groups,
  bool register_queue)
 {
+   struct request_queue *q = disk->queue;
dev_t devt;
int retval;
 
@@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 * registration.
 */
if (register_queue)
-   elevator_init_mq(disk->queue);
+   elevator_init_mq(q);
 
/* minors == 0 indicates to use ext devt from part0 and should
 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
-   struct backing_dev_info *bdi = disk->queue->backing_dev_info;
+   struct backing_dev_info *bdi = q->backing_dev_info;
struct device *dev = disk_to_dev(disk);
int ret;
 
+   /*
+* For read-ahead of large files to be effective, we need to
+* readahead at least twice the optimal I/O size.
+*/
+   bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
+   VM_READAHEAD_PAGES);
+   bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..89b33b402b4e52 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 650372ee2c7822..212bf711fb6b41 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1360,18 +1360,8 @@ static void drbd_setup_queue_param(struct drbd_device *device,

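The readahead scheme this patch lifts into __device_add_disk() is a one-liner: at least twice the optimal I/O size, never below the VM default. A small sketch under assumed constants (4 KiB pages, the kernel's 128 KiB default readahead):

```c
#include <assert.h>

/* Assumed constants for illustration (x86-64 typical). */
#define PAGE_SIZE 4096u
#define VM_READAHEAD_PAGES (128u * 1024 / PAGE_SIZE)	/* 128 KiB default */

/* The scheme lifted from md: read ahead at least twice the optimal I/O
 * size (e.g. a full stripe on RAID), never less than the VM default. */
static unsigned int ra_pages(unsigned int io_opt_bytes)
{
	unsigned int pages = io_opt_bytes * 2 / PAGE_SIZE;
	return pages > VM_READAHEAD_PAGES ? pages : VM_READAHEAD_PAGES;
}
```

A device reporting no optimal I/O size keeps the 128 KiB default; a 1 MiB stripe gets 2 MiB of readahead.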
[dm-devel] [PATCH 08/14] block: add helper macros for queue sysfs entries

2020-07-24 Thread Christoph Hellwig
Add two helper macros to avoid boilerplate code for the queue sysfs
entries.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-sysfs.c | 248 +++---
 1 file changed, 58 insertions(+), 190 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cfbb039da8751f..9bb4e42fb73265 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -551,201 +551,69 @@ static ssize_t queue_dax_show(struct request_queue *q, char *page)
return queue_var_show(blk_queue_dax(q), page);
 }
 
-static struct queue_sysfs_entry queue_requests_entry = {
-   .attr = {.name = "nr_requests", .mode = 0644 },
-   .show = queue_requests_show,
-   .store = queue_requests_store,
-};
-
-static struct queue_sysfs_entry queue_ra_entry = {
-   .attr = {.name = "read_ahead_kb", .mode = 0644 },
-   .show = queue_ra_show,
-   .store = queue_ra_store,
-};
-
-static struct queue_sysfs_entry queue_max_sectors_entry = {
-   .attr = {.name = "max_sectors_kb", .mode = 0644 },
-   .show = queue_max_sectors_show,
-   .store = queue_max_sectors_store,
-};
+#define QUEUE_RO_ENTRY(_prefix, _name) \
+static struct queue_sysfs_entry _prefix##_entry = {\
+   .attr   = { .name = _name, .mode = 0444 },  \
+   .show   = _prefix##_show,   \
+};
+
+#define QUEUE_RW_ENTRY(_prefix, _name) \
+static struct queue_sysfs_entry _prefix##_entry = {\
+   .attr   = { .name = _name, .mode = 0644 },  \
+   .show   = _prefix##_show,   \
+   .store  = _prefix##_store,  \
+};
+
+QUEUE_RW_ENTRY(queue_requests, "nr_requests");
+QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
+QUEUE_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
+QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
+QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
+QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
+QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RW_ENTRY(elv_iosched, "scheduler");
+
+QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
+QUEUE_RO_ENTRY(queue_physical_block_size, "physical_block_size");
+QUEUE_RO_ENTRY(queue_chunk_sectors, "chunk_sectors");
+QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
+QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
+
+QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
+QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
+QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
+QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
+QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
+
+QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
+QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
+QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
+
+QUEUE_RO_ENTRY(queue_zoned, "zoned");
+QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
+QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
+QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
+
+QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
+QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
+QUEUE_RW_ENTRY(queue_poll, "io_poll");
+QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
+QUEUE_RW_ENTRY(queue_wc, "write_cache");
+QUEUE_RO_ENTRY(queue_fua, "fua");
+QUEUE_RO_ENTRY(queue_dax, "dax");
+QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
+QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
 
-static struct queue_sysfs_entry queue_max_hw_sectors_entry = {
-   .attr = {.name = "max_hw_sectors_kb", .mode = 0444 },
-   .show = queue_max_hw_sectors_show,
-};
-
-static struct queue_sysfs_entry queue_max_segments_entry = {
-   .attr = {.name = "max_segments", .mode = 0444 },
-   .show = queue_max_segments_show,
-};
-
-static struct queue_sysfs_entry queue_max_discard_segments_entry = {
-   .attr = {.name = "max_discard_segments", .mode = 0444 },
-   .show = queue_max_discard_segments_show,
-};
-
-static struct queue_sysfs_entry queue_max_integrity_segments_entry = {
-   .attr = {.name = "max_integrity_segments", .mode = 0444 },
-   .show = queue_max_integrity_segments_show,
-};
-
-static struct queue_sysfs_entry queue_max_segment_size_entry = {
-   .attr = {.name = "max_segment_size", .mode = 0444 },
-   .show = queue_max_segment_size_show,
-};
-
-static struct queue_sysfs_entry queue_iosched_entry = {
-   .attr = {.name = "scheduler", .mode = 0644 },
-   .show = elv_iosched_show,
-   .store = elv_iosched_store,
-};
+#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
+QUEUE_RW_ENTRY(blk_throtl_sample, "throttle_sample_time");
+#endif
 
+/* legacy alias for logical_block_size: */
 static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.attr = {.name = "hw_sector_size", .mode = 0444 },
.show = queue_logical_block_size_show,
 };
 
-static struct queue_sysfs_entry 

[dm-devel] [PATCH 12/14] bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag

2020-07-24 Thread Christoph Hellwig
BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangle the dependency, replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we can't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute, which
can also be made writable for easier testing.
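A toy, non-atomic model of the flag helpers involved; the real blk_queue_flag_set()/blk_queue_flag_clear() use atomic bitops on q->queue_flags, and the struct and flag value here are stand-ins:

```c
#include <assert.h>

/* Hypothetical flag bit; the real value lives in include/linux/blkdev.h. */
enum { QUEUE_FLAG_STABLE_WRITES = 15 };

/* Stand-in for the kernel's request_queue, reduced to its flag word. */
struct request_queue {
	unsigned long queue_flags;
};

static void blk_queue_flag_set(unsigned int flag, struct request_queue *q)
{
	q->queue_flags |= 1UL << flag;	/* real helper uses set_bit() */
}

static void blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
{
	q->queue_flags &= ~(1UL << flag);	/* real helper uses clear_bit() */
}

/* What callers test instead of the old bdi capability bit. */
static int blk_queue_stable_writes(const struct request_queue *q)
{
	return (q->queue_flags >> QUEUE_FLAG_STABLE_WRITES) & 1;
}
```

The point of the change is visible in the types: the flag lives on the queue, which the driver owns, rather than on the backing_dev_info shared with writeback.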

Signed-off-by: Christoph Hellwig 
---
 block/blk-integrity.c |  4 ++--
 block/blk-mq-debugfs.c|  1 +
 block/blk-sysfs.c |  2 ++
 drivers/block/rbd.c   |  2 +-
 drivers/block/zram/zram_drv.c |  2 +-
 drivers/md/dm-table.c |  6 +++---
 drivers/md/raid5.c|  8 
 drivers/mmc/core/queue.c  |  3 +--
 drivers/nvme/host/core.c  |  3 +--
 drivers/nvme/host/multipath.c | 10 +++---
 drivers/scsi/iscsi_tcp.c  |  4 ++--
 fs/super.c|  2 ++
 include/linux/backing-dev.h   |  6 --
 include/linux/blkdev.h|  3 +++
 include/linux/fs.h|  1 +
 mm/backing-dev.c  |  6 ++
 mm/page-writeback.c   |  2 +-
 mm/swapfile.c |  2 +-
 18 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index c03705cbb9c9f2..2b36a8f9b81390 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -408,7 +408,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
 
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
@@ -428,7 +428,7 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-   disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, disk->queue);
	memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3f09bcb8a6fd7e..5a7d870eff2f89 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -116,6 +116,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
+   QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9bb4e42fb73265..4a3799ed33f775 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -291,6 +291,7 @@ static struct queue_sysfs_entry queue_##_name##_entry = {	\
 QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -645,6 +646,7 @@ static struct attribute *queue_attrs[] = {
	&queue_nomerges_entry.attr,
	&queue_rq_affinity_entry.attr,
	&queue_iostats_entry.attr,
+	&queue_stable_writes_entry.attr,
	&queue_random_entry.attr,
	&queue_poll_entry.attr,
	&queue_wc_entry.attr,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4f61e920946144..4a8515acccb3bf 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5022,7 +5022,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
}
 
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
-   q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 
/*
 * disk_release() expects a queue ref from add_disk() and will
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d73ddf018fa65f..e6ed9c9f500a42 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1954,7 +1954,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
-	zram->disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+   blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 324a42ed2f8894..e1adec51cb5b41 100644
--- a/drivers/md/dm-table.c
+++ 

[dm-devel] [PATCH 11/14] mm: use SWP_SYNCHRONOUS_IO more intelligently

2020-07-24 Thread Christoph Hellwig
There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO
is not set, as the device won't support it.
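The resulting control flow can be modeled outside the kernel; the names below are illustrative stubs, not the kernel's functions, with the bio path reduced to a return code:

```c
#include <assert.h>

/* Hypothetical flag, mirroring SWP_SYNCHRONOUS_IO's role as a gate. */
#define SWP_SYNCHRONOUS_IO (1U << 0)

/* Stub for bdev_read_page(): records that the fast path ran, succeeds. */
static int bdev_read_page_stub(int *fast_used)
{
	*fast_used = 1;
	return 0;
}

/*
 * Model of the patched swap_readpage(): only attempt the synchronous
 * ->rw_page fast path when the flag says the device supports it;
 * otherwise (or if it fails) fall through to the bio path (return 1).
 */
static int swap_readpage_model(unsigned int sis_flags, int *fast_used)
{
	*fast_used = 0;
	if (sis_flags & SWP_SYNCHRONOUS_IO) {
		if (bdev_read_page_stub(fast_used) == 0)
			return 0;	/* fast path completed the read */
	}
	return 1;			/* bio-based fallback */
}
```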

Signed-off-by: Christoph Hellwig 
---
 mm/page_io.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index ccda7679008851..7eef3c84766abc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -403,15 +403,17 @@ int swap_readpage(struct page *page, bool synchronous)
goto out;
}
 
-   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
-   if (!ret) {
-   if (trylock_page(page)) {
-   swap_slot_free_notify(page);
-   unlock_page(page);
-   }
+   if (sis->flags & SWP_SYNCHRONOUS_IO) {
+   ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
+   if (!ret) {
+   if (trylock_page(page)) {
+   swap_slot_free_notify(page);
+   unlock_page(page);
+   }
 
-   count_vm_event(PSWPIN);
-   goto out;
+   count_vm_event(PSWPIN);
+   goto out;
+   }
}
 
ret = 0;
-- 
2.27.0

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH 05/14] md: update the optimal I/O size on reshape

2020-07-24 Thread Christoph Hellwig
The raid5 and raid10 drivers currently update the read-ahead size on
reshape, but not the optimal I/O size.  To prepare for deriving the
read-ahead size from the optimal I/O size, make sure the latter is
updated as well.
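The io_opt values the new helpers set can be checked with plain arithmetic; the functions below mirror the math of the patch's raid10_set_io_opt()/raid5_set_io_opt() in user space (the parameters replace the conf fields):

```c
#include <assert.h>

/*
 * raid10: optimal I/O size is chunk bytes times the effective data
 * disk count; when near_copies divides raid_disks evenly, each stripe
 * only spans raid_disks / near_copies distinct chunks.
 */
static unsigned int raid10_io_opt(unsigned int chunk_sectors,
				  unsigned int raid_disks,
				  unsigned int near_copies)
{
	unsigned int disks = raid_disks;

	if (!(raid_disks % near_copies))
		disks /= near_copies;
	return (chunk_sectors << 9) * disks;	/* << 9: sectors to bytes */
}

/* raid5/6: data disks are total disks minus the parity (max_degraded). */
static unsigned int raid5_io_opt(unsigned int chunk_sectors,
				 unsigned int raid_disks,
				 unsigned int max_degraded)
{
	return (chunk_sectors << 9) * (raid_disks - max_degraded);
}
```

For example, 4 raid10 disks with near_copies=2 and 512 KiB chunks give an optimal I/O size of two chunks, 1 MiB.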

Signed-off-by: Christoph Hellwig 
---
 drivers/md/raid10.c | 22 ++
 drivers/md/raid5.c  | 10 --
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b1d0c9d4ef7757..9f88ff9bdee437 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3695,10 +3695,20 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return ERR_PTR(err);
 }
 
+static void raid10_set_io_opt(struct r10conf *conf)
+{
+   int raid_disks = conf->geo.raid_disks;
+
+   if (!(conf->geo.raid_disks % conf->geo.near_copies))
+   raid_disks /= conf->geo.near_copies;
+   blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
+raid_disks);
+}
+
 static int raid10_run(struct mddev *mddev)
 {
struct r10conf *conf;
-   int i, disk_idx, chunk_size;
+   int i, disk_idx;
struct raid10_info *disk;
struct md_rdev *rdev;
sector_t size;
@@ -3734,18 +3744,13 @@ static int raid10_run(struct mddev *mddev)
mddev->thread = conf->thread;
conf->thread = NULL;
 
-   chunk_size = mddev->chunk_sectors << 9;
if (mddev->queue) {
blk_queue_max_discard_sectors(mddev->queue,
  mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, 0);
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-   blk_queue_io_min(mddev->queue, chunk_size);
-   if (conf->geo.raid_disks % conf->geo.near_copies)
-		blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
-   else
-   blk_queue_io_opt(mddev->queue, chunk_size *
-			 (conf->geo.raid_disks / conf->geo.near_copies));
+   blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+   raid10_set_io_opt(conf);
}
 
rdev_for_each(rdev, mddev) {
@@ -4719,6 +4724,7 @@ static void end_reshape(struct r10conf *conf)
stripe /= conf->geo.near_copies;
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
			conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid10_set_io_opt(conf);
}
conf->fullsync = 0;
 }
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d7780b1dd0c528..68e41ce3ca75cc 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7123,6 +7123,12 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
return 0;
 }
 
+static void raid5_set_io_opt(struct r5conf *conf)
+{
+   blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
+(conf->raid_disks - conf->max_degraded));
+}
+
 static int raid5_run(struct mddev *mddev)
 {
struct r5conf *conf;
@@ -7412,8 +7418,7 @@ static int raid5_run(struct mddev *mddev)
 
chunk_size = mddev->chunk_sectors << 9;
blk_queue_io_min(mddev->queue, chunk_size);
-   blk_queue_io_opt(mddev->queue, chunk_size *
-(conf->raid_disks - conf->max_degraded));
+   raid5_set_io_opt(conf);
mddev->queue->limits.raid_partial_stripes_expensive = 1;
/*
 * We can only discard a whole stripe. It doesn't make sense to
@@ -8006,6 +8011,7 @@ static void end_reshape(struct r5conf *conf)
   / PAGE_SIZE);
		if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
			conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+   raid5_set_io_opt(conf);
}
}
 }
-- 
2.27.0




[dm-devel] [PATCH 03/14] drbd: remove RB_CONGESTED_REMOTE

2020-07-24 Thread Christoph Hellwig
This case isn't ever used.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/drbd/drbd_req.c | 4 
 include/linux/drbd.h  | 1 -
 2 files changed, 5 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 674be09b2da94a..4d944f2eb56efa 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -901,13 +901,9 @@ static bool drbd_may_do_local_read(struct drbd_device *device, sector_t sector,
 static bool remote_due_to_read_balancing(struct drbd_device *device, sector_t sector,
enum drbd_read_balancing rbm)
 {
-   struct backing_dev_info *bdi;
int stripe_shift;
 
switch (rbm) {
-   case RB_CONGESTED_REMOTE:
-		bdi = device->ldev->backing_bdev->bd_disk->queue->backing_dev_info;
-		return bdi_read_congested(bdi);
case RB_LEAST_PENDING:
		return atomic_read(&device->local_cnt) >
			atomic_read(&device->ap_pending_cnt) +
			atomic_read(&device->rs_pending_cnt);
diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 5755537b51b114..6a8286132751df 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -94,7 +94,6 @@ enum drbd_read_balancing {
RB_PREFER_REMOTE,
RB_ROUND_ROBIN,
RB_LEAST_PENDING,
-   RB_CONGESTED_REMOTE,
RB_32K_STRIPING,
RB_64K_STRIPING,
RB_128K_STRIPING,
-- 
2.27.0




[dm-devel] [PATCH 10/14] bdi: remove BDI_CAP_SYNCHRONOUS_IO

2020-07-24 Thread Christoph Hellwig
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to
decide whether ->rw_page can be used on a block device.  Just check for
the method instead.  The only complication is that zram needs a second
set of block_device_operations, as it can switch between modes that
actually support ->rw_page and those that don't.
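The ops-table trick can be sketched in miniature; the types and names below are cut-down stand-ins for the kernel's, with only the ->rw_page member kept:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in ops table, reduced to the one optional method that matters. */
struct block_device_operations {
	int (*rw_page)(void);
};

static int zram_rw_page_stub(void) { return 0; }

/* Normal mode: synchronous page I/O is available. */
static const struct block_device_operations zram_devops = {
	.rw_page = zram_rw_page_stub,
};

/* Writeback mode: identical table, but no ->rw_page. */
static const struct block_device_operations zram_wb_devops = {
	.rw_page = NULL,
};

struct gendisk {
	const struct block_device_operations *fops;
};

/*
 * What the swap code now effectively checks instead of the
 * BDI_CAP_SYNCHRONOUS_IO capability bit: is the method present?
 */
static int disk_supports_synchronous_io(const struct gendisk *disk)
{
	return disk->fops->rw_page != NULL;
}
```

Switching modes is then just reassigning disk->fops between the two tables, as the zram hunks below do.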

Signed-off-by: Christoph Hellwig 
---
 drivers/block/brd.c   |  1 -
 drivers/block/zram/zram_drv.c | 19 +--
 drivers/nvdimm/btt.c  |  2 --
 drivers/nvdimm/pmem.c |  1 -
 include/linux/backing-dev.h   |  9 -
 mm/swapfile.c |  2 +-
 6 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 2723a70eb85593..cc49a921339f77 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -403,7 +403,6 @@ static struct brd_device *brd_alloc(int i)
disk->flags = GENHD_FL_EXT_DEVT;
sprintf(disk->disk_name, "ram%d", i);
set_capacity(disk, rd_size * 2);
-	brd->brd_queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
 
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, brd->brd_queue);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9100ac36670afc..d73ddf018fa65f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,9 @@ static unsigned int num_devices = 1;
  */
 static size_t huge_class_size;
 
+static const struct block_device_operations zram_devops;
+static const struct block_device_operations zram_wb_devops;
+
 static void zram_free_page(struct zram *zram, size_t index);
 static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
u32 index, int offset, struct bio *bio);
@@ -408,8 +411,7 @@ static void reset_bdev(struct zram *zram)
zram->backing_dev = NULL;
zram->old_block_size = 0;
zram->bdev = NULL;
-   zram->disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
+	zram->disk->fops = &zram_devops;
kvfree(zram->bitmap);
zram->bitmap = NULL;
 }
@@ -528,8 +530,7 @@ static ssize_t backing_dev_store(struct device *dev,
 * freely but in fact, IO is going on so finally could cause
 * use-after-free when the IO is really done.
 */
-   zram->disk->queue->backing_dev_info->capabilities &=
-   ~BDI_CAP_SYNCHRONOUS_IO;
+	zram->disk->fops = &zram_wb_devops;
	up_write(&zram->init_lock);
 
pr_info("setup backing device %s\n", file_name);
@@ -1819,6 +1820,13 @@ static const struct block_device_operations zram_devops = {
.owner = THIS_MODULE
 };
 
+static const struct block_device_operations zram_wb_devops = {
+   .open = zram_open,
+   .submit_bio = zram_submit_bio,
+   .swap_slot_free_notify = zram_slot_free_notify,
+   .owner = THIS_MODULE
+};
+
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
@@ -1946,8 +1954,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
-   zram->disk->queue->backing_dev_info->capabilities |=
-   (BDI_CAP_STABLE_WRITES | BDI_CAP_SYNCHRONOUS_IO);
+	zram->disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
 
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 412d21d8f64351..b4184dc9b41eb4 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1540,8 +1540,6 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->private_data = btt;
btt->btt_disk->queue = btt->btt_queue;
btt->btt_disk->flags = GENHD_FL_EXT_DEVT;
-   btt->btt_disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_SYNCHRONOUS_IO;
 
blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
blk_queue_max_hw_sectors(btt->btt_queue, UINT_MAX);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 94790e6e0e4ce1..436b83fb24ad61 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -478,7 +478,6 @@ static int pmem_attach_disk(struct device *dev,
disk->queue = q;
disk->flags = GENHD_FL_EXT_DEVT;
disk->private_data  = pmem;
-   disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
/ 512);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 52583b6f2ea05d..860ea33571bce5 100644
--- 

[dm-devel] bdi cleanups v3

2020-07-24 Thread Christoph Hellwig
Hi Jens,

this series contains a bunch of different BDI cleanups.  The biggest item
is to isolate the block drivers from the BDI in preparation for changing
the lifetime of the block device BDI in a follow-up series.


Changes since v2:
 - fix a rw_page return value check
 - fix various changelogs

Changes since v1:
 - rebased to the for-5.9/block-merge branch
 - explicitly set the readahead to 0 for ubifs, vboxsf and mtd
 - split the zram block_device operations
 - let rw_page users fall back to bios in swap_readpage


Diffstat:
 block/blk-core.c  |2 
 block/blk-integrity.c |4 
 block/blk-mq-debugfs.c|1 
 block/blk-settings.c  |5 
 block/blk-sysfs.c |  282 ++
 block/genhd.c |   13 +
 drivers/block/aoe/aoeblk.c|2 
 drivers/block/brd.c   |1 
 drivers/block/drbd/drbd_nl.c  |   18 --
 drivers/block/drbd/drbd_req.c |4 
 drivers/block/rbd.c   |2 
 drivers/block/zram/zram_drv.c |   19 +-
 drivers/md/bcache/super.c |4 
 drivers/md/dm-table.c |9 -
 drivers/md/raid0.c|   16 --
 drivers/md/raid10.c   |   46 ++
 drivers/md/raid5.c|   31 +---
 drivers/mmc/core/queue.c  |3 
 drivers/mtd/mtdcore.c |1 
 drivers/nvdimm/btt.c  |2 
 drivers/nvdimm/pmem.c |1 
 drivers/nvme/host/core.c  |3 
 drivers/nvme/host/multipath.c |   10 -
 drivers/scsi/iscsi_tcp.c  |4 
 fs/9p/vfs_file.c  |2 
 fs/9p/vfs_super.c |4 
 fs/afs/super.c|1 
 fs/btrfs/disk-io.c|2 
 fs/fs-writeback.c |7 -
 fs/fuse/inode.c   |4 
 fs/namei.c|4 
 fs/nfs/super.c|9 -
 fs/super.c|2 
 fs/ubifs/super.c  |1 
 fs/vboxsf/super.c |1 
 include/linux/backing-dev.h   |   78 +--
 include/linux/blkdev.h|3 
 include/linux/drbd.h  |1 
 include/linux/fs.h|2 
 mm/backing-dev.c  |   12 -
 mm/filemap.c  |4 
 mm/memcontrol.c   |2 
 mm/memory-failure.c   |2 
 mm/migrate.c  |2 
 mm/mmap.c |2 
 mm/page-writeback.c   |   18 +-
 mm/page_io.c  |   18 +-
 mm/swapfile.c |4 
 48 files changed, 204 insertions(+), 464 deletions(-)
