Re: USB upgrade fun

2017-10-09 Thread Satoru Takeuchi
At Sun, 08 Oct 2017 17:58:10 +0800,
Kai Hendry wrote:
> 
> Hi there,
> 
> My /mnt/raid1 suddenly became full somewhat expectedly, so I bought 2
> new USB 4TB hard drives (one WD, one Seagate) to upgrade to.
> 
> After adding sde and sdd I started to see errors in dmesg [2].
> https://s.natalian.org/2017-10-07/raid1-newdisks.txt
> [2] https://s.natalian.org/2017-10-07/btrfs-errors.txt

These messages are harmless. Qu is tackling this problem.

> 
> Sidenote: I've since learnt that removing a drive actually deletes the
> contents of the drive? I don't want that. I was hoping to put that drive
> into cold storage. How do I remove a drive without losing data from a
> RAID1 configuration?

Let me clarify what you said. Are you worried about losing filesystem data
that lives on the removed device, in this case /dev/sdc1? To be more specific,
that if /mnt/raid1/file is stored on /dev/sdc1, you would lose this file by
removing that device? If so, don't worry: when /dev/sdc1 is removed, the
filesystem data that exists on it is moved to the other devices, /dev/sdb1,
/dev/sdd1, or /dev/sde1.

Just FYI, `btrfs replace start /dev/sdc1 /dev/sdd1 /mnt/raid1` is more
suitable in your case.
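In case it helps, here is a dry-run sketch of that replace flow. The device paths, the devid in the resize step, and the flag-free invocations are assumptions for illustration; check `btrfs replace --help` against your btrfs-progs version before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of swapping the old 2TB disk for the new 4TB in place.
# With DRYRUN=1 every step is only echoed, so nothing touches the disks.
DRYRUN=1
run() {
    if [ "$DRYRUN" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

run btrfs replace start /dev/sdc1 /dev/sdd1 /mnt/raid1   # copy old -> new
run btrfs replace status /mnt/raid1                      # poll progress
run btrfs filesystem resize 2:max /mnt/raid1             # grow into the 4TB (devid 2 assumed)
```

Unlike `device add` + `device delete`, replace streams the old device's chunks directly onto the new one, so there is no intermediate rebalance.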

> 
> 
> I assumed it perhaps had to do with the USB bus on my NUC5CPYB being maxed
> out, and to expedite the sync, I tried to remove one of the older 2TB
> sdc1.  However the load went crazy and my system went completely
> unstable. I shutdown the machine and after an hour I hard powered it
> down since it seemed to hang (it's headless).

Because all data on /dev/sdc1, in this case a total of

 1.81TiB (data) + 6.00GiB (metadata) + 32MiB (system),

had to be moved to the remaining devices.
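For a rough sense of why the machine looked hung, here is a back-of-envelope sketch; the 30 MiB/s of sustained USB throughput is an assumed number, not a measurement from your box.

```shell
#!/bin/sh
# Estimate how long relocating 1.81TiB + 6GiB + 32MiB takes at an
# assumed 30 MiB/s of sustained USB throughput.
msg=$(awk 'BEGIN {
    total_mib = 1.81 * 1024 * 1024 + 6 * 1024 + 32  # data + metadata + system, in MiB
    rate_mib  = 30                                  # assumed throughput, MiB/s
    printf "~%.1f hours to move %.0f MiB at %d MiB/s", \
        total_mib / rate_mib / 3600, total_mib, rate_mib
}')
echo "$msg"
```

So well over half a day of continuous relocation even under optimistic assumptions, which matches the machine appearing to hang.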

> 
> 
> After a reboot it failed, namely because "nofail" wasn't in my fstab and
> systemd is pedantic by default. After managing to get it booting into my
> system without /mnt/raid1 I faced these "open ctree failed" issues. 
> After running btrfs check on all the drives and getting nowhere, I
> decided to unplug the new drives and I discovered that when I take out
> the new 4TB WD drive, I could mount it with -o degraded.
> 
> dmesg errors with the WD include "My Passport" Wrong diagnostic page;
> asked for 1 got 8 "Failed to get diagnostic page 0xffea" which
> raised my suspicions. The model number btw is WDBYFT0040BYI-WESN
> 
> Anyway, I'm back up and running with 2x2TB  (one of them didn't finish
> removing, I don't know which) & 1x4TB.
> 
> [1] https://s.natalian.org/2017-10-08/usb-btrfs.txt
> 
> I've decided to send the WD back for a refund. I've decided I want to keep
> the 2x2TB and RAID1 with the new 1x4TB disk. So 4TB total. btrfs
> complains now of "Some devices missing" [1]. How do I fix this
> situation?

Probably `btrfs device remove missing /mnt/raid1` works.
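A quick way to check whether that worked afterwards is to look for the "Some devices missing" marker in `btrfs filesystem show`. The transcript below is an invented sample for the sketch, not real output from your array:

```shell
#!/bin/sh
# Hypothetical `btrfs filesystem show` output for an array that still
# carries a missing-device record (invented sample, not a live query).
sample='Label: none  uuid: 12345678-1234-1234-1234-123456789abc
	Total devices 4 FS bytes used 1.81TiB
	devid    1 size 1.82TiB used 1.81TiB path /dev/sdc1
	*** Some devices missing'

if printf '%s\n' "$sample" | grep -q 'Some devices missing'; then
    echo 'missing device recorded; try: btrfs device remove missing /mnt/raid1'
fi
```

On the live system you would pipe the real `btrfs filesystem show` output in instead of the sample.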

Thanks,
Satoru

> 
> Any tips on how to name these individual disks? hdparm -I isn't a lot to go
> on.
> 
> [hendry@nuc ~]$ sudo hdparm -I /dev/sdb1 | grep Model
> Model Number:   ST4000LM024-2AN17V
> [hendry@nuc ~]$ sudo hdparm -I /dev/sdc1 | grep Model
> Model Number:   ST2000LM003 HN-M201RAD
> [hendry@nuc ~]$ sudo hdparm -I /dev/sdd1 | grep Model
> Model Number:   ST2000LM005 HN-M201AAD
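One option, sketched over the models above: the stable names under /dev/disk/by-id/ embed model and serial, so mapping kernel names to models is usually enough to label the enclosures. The column output below is reconstructed by hand to mimic `lsblk -dno NAME,MODEL`, not captured from the machine.

```shell
#!/bin/sh
# Map kernel device names to drive models. On a live system, pipe in
# `lsblk -dno NAME,MODEL` instead of the sample, or simply look at
# `ls -l /dev/disk/by-id/` for model+serial symlinks.
sample='sdb ST4000LM024-2AN17V
sdc ST2000LM003 HN-M201RAD
sdd ST2000LM005 HN-M201AAD'

mapping=$(printf '%s\n' "$sample" | while read -r dev model; do
    printf '/dev/%s -> %s\n' "$dev" "$model"
done)
echo "$mapping"
```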
> 
> 
> Ok, thanks. Hope you can guide me,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup

2017-10-09 Thread Tejun Heo
Hello,

btrfs has different ways to issue metadata IOs and may end up issuing
metadata or otherwise shared IOs from a non-root cgroup, which can
lead to priority inversion and ineffective IO isolation.

This patchset makes sure that btrfs issues all metadata and shared IOs
from the root cgroup by exempting btree_inodes from cgroup writeback
and explicitly associating shared IOs with the root cgroup.

This patchset contains the following three patches:

 [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode
 [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()
 [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the

and is also available in the following git branch

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
review-cgroup-btrfs-metadata

diffstat follows.  Thanks.

 fs/block_dev.c  |3 +--
 fs/btrfs/check-integrity.c  |2 +-
 fs/btrfs/disk-io.c  |4 
 fs/btrfs/ioctl.c|6 +-
 fs/btrfs/super.c|1 -
 fs/buffer.c |   12 
 fs/ext2/inode.c |3 ++-
 fs/ext2/super.c |1 -
 fs/ext4/inode.c |5 -
 fs/ext4/super.c |2 --
 fs/fs-writeback.c   |1 +
 include/linux/backing-dev.h |2 +-
 include/linux/buffer_head.h |   11 +++
 include/linux/fs.h  |3 ++-
 include/linux/writeback.h   |6 --
 15 files changed, 48 insertions(+), 14 deletions(-)

--
tejun


[PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()

2017-10-09 Thread Tejun Heo
Add wbc->blkcg_css so that the blkcg_css association can be specified
independently and implement submit_bh_blkcg_css() using it.  This will
be used to override cgroup membership on specific buffer_heads.

Signed-off-by: Tejun Heo 
Cc: Jan Kara 
Cc: Jens Axboe 
---
 fs/buffer.c | 12 
 fs/fs-writeback.c   |  1 +
 include/linux/buffer_head.h | 11 +++
 include/linux/writeback.h   |  6 --
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 170df85..fac4f9a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3143,6 +3143,18 @@ int submit_bh(int op, int op_flags, struct buffer_head *bh)
 }
 EXPORT_SYMBOL(submit_bh);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+int submit_bh_blkcg_css(int op, int op_flags,  struct buffer_head *bh,
+   struct cgroup_subsys_state *blkcg_css)
+{
+   struct writeback_control wbc = { .blkcg_css = blkcg_css };
+
+   /* @wbc is used just to override the bio's blkcg_css */
+   return submit_bh_wbc(op, op_flags, bh, 0, &wbc);
+}
+EXPORT_SYMBOL(submit_bh_blkcg_css);
+#endif
+
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
  * @op: whether to %READ or %WRITE
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 245c430..cd882e6 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -538,6 +538,7 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
}
 
wbc->wb = inode_to_wb(inode);
+   wbc->blkcg_css = wbc->wb->blkcg_css;
wbc->inode = inode;
 
wbc->wb_id = wbc->wb->memcg_css->id;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c8dae55..abb4dd4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_BLOCK
 
@@ -205,6 +206,16 @@ int bh_submit_read(struct buffer_head *bh);
 loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
 loff_t length, int whence);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
+   struct cgroup_subsys_state *blkcg_css);
+#else
+static inline int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
+ struct cgroup_subsys_state *blkcg_css)
+{
+   return submit_bh(op, op_flags, bh);
+}
+#endif
 extern int buffer_heads_over_limit;
 
 /*
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index d581579..81e5946 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -91,6 +91,8 @@ struct writeback_control {
unsigned for_sync:1;/* sync(2) WB_SYNC_ALL writeback */
 #ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb;   /* wb this writeback is issued under */
+   struct cgroup_subsys_state *blkcg_css; /* usually wb->blkcg_css but
+ may be overridden */
struct inode *inode;/* inode being written out */
 
/* foreign inode detection, see wbc_detach_inode() */
@@ -277,8 +279,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
 * behind a slow cgroup.  Ultimately, we want pageout() to kick off
 * regular writeback instead of writing things out itself.
 */
-   if (wbc->wb)
-   bio_associate_blkcg(bio, wbc->wb->blkcg_css);
+   if (wbc->blkcg_css)
+   bio_associate_blkcg(bio, wbc->blkcg_css);
 }
 
 #else  /* CONFIG_CGROUP_WRITEBACK */
-- 
2.9.5



[PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB

2017-10-09 Thread Tejun Heo
Currently, a filesystem can indicate cgroup writeback support per
superblock; however, depending on the filesystem, especially when inodes
are used to carry metadata, it can be useful to indicate cgroup
writeback support per inode.

This patch replaces the superblock flag SB_I_CGROUPWB with per-inode
S_CGROUPWB, so that cgroup writeback can be enabled selectively.

* block_dev sets the new flag in bdget() when initializing a new inode.

* ext2/4 set the new flag in ext?_set_inode_flags() function.

* btrfs sets the new flag in btrfs_update_iflags() function.  Note
  that this automatically excludes btree_inode which doesn't use
  btrfs_update_iflags() during initialization.  This is an intended
  behavior change.

Signed-off-by: Tejun Heo 
Cc: Jan Kara 
Cc: Jens Axboe 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: linux-btrfs@vger.kernel.org
Cc: "Theodore Ts'o" 
Cc: Andreas Dilger 
Cc: linux-e...@vger.kernel.org
---
 fs/block_dev.c  | 3 +--
 fs/btrfs/ioctl.c| 4 +++-
 fs/btrfs/super.c| 1 -
 fs/ext2/inode.c | 3 ++-
 fs/ext2/super.c | 1 -
 fs/ext4/inode.c | 5 -
 fs/ext4/super.c | 2 --
 include/linux/backing-dev.h | 2 +-
 include/linux/fs.h  | 3 ++-
 9 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088f..ff9c282 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -800,8 +800,6 @@ static struct dentry *bd_mount(struct file_system_type *fs_type,
 {
struct dentry *dent;
dent = mount_pseudo(fs_type, "bdev:", &bdev_sops, NULL, BDEVFS_MAGIC);
-   if (!IS_ERR(dent))
-   dent->d_sb->s_iflags |= SB_I_CGROUPWB;
return dent;
 }
 
@@ -893,6 +891,7 @@ struct block_device *bdget(dev_t dev)
inode->i_mode = S_IFBLK;
inode->i_rdev = dev;
inode->i_bdev = bdev;
+   inode->i_flags |= S_CGROUPWB;
inode->i_data.a_ops = _blk_aops;
mapping_set_gfp_mask(&inode->i_data, GFP_USER);
spin_lock(&bdev_lock);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6c7a49f..117cc63 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,9 +150,11 @@ void btrfs_update_iflags(struct inode *inode)
new_fl |= S_NOATIME;
if (ip->flags & BTRFS_INODE_DIRSYNC)
new_fl |= S_DIRSYNC;
+   new_fl |= S_CGROUPWB;
 
set_mask_bits(&inode->i_flags,
- S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC,
+ S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
+ S_CGROUPWB,
  new_fl);
 }
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 35a128a..46a1488 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1136,7 +1136,6 @@ static int btrfs_fill_super(struct super_block *sb,
sb->s_flags |= MS_POSIXACL;
 #endif
sb->s_flags |= MS_I_VERSION;
-   sb->s_iflags |= SB_I_CGROUPWB;
 
err = super_setup_bdi(sb);
if (err) {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 4dca6f3..3c5d822 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1372,7 +1372,7 @@ void ext2_set_inode_flags(struct inode *inode)
unsigned int flags = EXT2_I(inode)->i_flags;
 
inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME |
-   S_DIRSYNC | S_DAX);
+   S_DIRSYNC | S_DAX | S_CGROUPWB);
if (flags & EXT2_SYNC_FL)
inode->i_flags |= S_SYNC;
if (flags & EXT2_APPEND_FL)
@@ -1385,6 +1385,7 @@ void ext2_set_inode_flags(struct inode *inode)
inode->i_flags |= S_DIRSYNC;
if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode))
inode->i_flags |= S_DAX;
+   inode->i_flags |= S_CGROUPWB;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1458706..e6ba669e 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -922,7 +922,6 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
((EXT2_SB(sb)->s_mount_opt & EXT2_MOUNT_POSIX_ACL) ?
 MS_POSIXACL : 0);
-   sb->s_iflags |= SB_I_CGROUPWB;
 
if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV &&
(EXT2_HAS_COMPAT_FEATURE(sb, ~0U) ||
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31db875..344f12b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4591,8 +4591,11 @@ void ext4_set_inode_flags(struct inode *inode)
!ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
!ext4_encrypted_inode(inode))
new_fl |= S_DAX;
+   if (test_opt(inode->i_sb, DATA_FLAGS) != 

[PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup

2017-10-09 Thread Tejun Heo
Issuing metadata or otherwise shared IOs from a !root cgroup can lead to
priority inversion.  This patch ensures that those IOs are always
issued from the root cgroup.

This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
btree_inodes.  This isn't strictly necessary as those inodes don't
call the function during init; however, this serves as documentation
and prevents possible future mistakes.  If this isn't desirable,
please feel free to drop the section.

Signed-off-by: Tejun Heo 
Cc: Chris Mason 
Cc: Josef Bacik 
---
 fs/btrfs/check-integrity.c | 2 +-
 fs/btrfs/disk-io.c | 4 
 fs/btrfs/ioctl.c   | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7d5a9b5..058dea6 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
struct btrfsic_dev_state *dev_state;
 
if (!btrfsic_is_initialized)
-   return submit_bh(op, op_flags, bh);
+   return submit_bh_blkcg_css(op, op_flags, bh, blkcg_root_css);
 
mutex_lock(&btrfsic_mutex);
/* since btrfsic_submit_bh() might also be called before
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab84..fe8bbe1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
int async = check_async_write(bio_flags);
blk_status_t ret;
 
+   bio_associate_blkcg(bio, blkcg_root_css);
+
if (bio_op(bio) != REQ_OP_WRITE) {
/*
 * called for a read, do the setup so that checksum validation
@@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
return;
 
bio_reset(bio);
+   bio_associate_blkcg(bio, blkcg_root_css);
+
bio->bi_end_io = btrfs_end_empty_barrier;
bio_set_dev(bio, device->bdev);
bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 117cc63..8a7db6c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
new_fl |= S_NOATIME;
if (ip->flags & BTRFS_INODE_DIRSYNC)
new_fl |= S_DIRSYNC;
-   new_fl |= S_CGROUPWB;
+   /* btree_inodes are always in the root cgroup */
+   if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
+   new_fl |= S_CGROUPWB;
 
set_mask_bits(&inode->i_flags,
  S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
-- 
2.9.5



Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Josef Bacik
On Tue, Oct 10, 2017 at 08:09:20AM +1100, Dave Chinner wrote:
> On Mon, Oct 09, 2017 at 09:00:51AM -0400, Josef Bacik wrote:
> > On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote:
> > > On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > > > > Integrating into fstests means it will be immediately available to
> > > > > all fs developers, it'll run on everything that everyone already has
> > > > > setup for filesystem testing, and it will have familiar mkfs/mount
> > > > > option setup behaviour so there's no new hoops for everyone to jump
> > > > > through to run it...
> > > > > 
> > > > 
> > > > TBF I specifically made it as easy as possible because I know we all 
> > > > hate trying
> > > > to learn new shit.
> > > 
> > > Yeah, it's also hard to get people to change their workflows to add
> > > a whole new test harness into them. It's easy if it's just a new
> > > command to an existing workflow :P
> > > 
> > 
> > Agreed, so if you probably won't run this outside of fstests then I'll add 
> > it to
> > xfstests.  I envision this tool as being run by maintainers to verify their 
> > pull
> > requests haven't regressed since the last set of patches, as well as by 
> > anybody
> > trying to fix performance problems.  So it's way more important to me that 
> > you,
> > Ted, and all the various btrfs maintainers will run it than anybody else.
> > 
> > > > I figured this was different enough to warrant a separate
> > > > project, especially since I'm going to add block device jobs so Jens 
> > > > can test
> > > > block layer things.  If we all agree we'd rather see this in fstests 
> > > > then I'm
> > > > happy to do that too.  Thanks,
> > > 
> > > I'm not fussed either way - it's a good discussion to have, though.
> > > 
> > > If I want to add tests (e.g. my time-honoured fsmark tests), where
> > > should I send patches?
> > > 
> > 
> > I beat you to that!  I wanted to avoid adding fs_mark to the suite because 
> > it
> > means parsing another different set of outputs, so I added a new ioengine 
> > to fio
> > for this
> > 
> > http://www.spinics.net/lists/fio/msg06367.html
> > 
> > and added a fio job to do 500k files
> > 
> > https://github.com/josefbacik/fsperf/blob/master/tests/500kemptyfiles.fio
> > 
> > The test is disabled by default for now because obviously the fio support 
> > hasn't
> > landed yet.
> 
> That seems  misguided. fio is good, but it's not a universal
> solution.
> 
> > I'd _like_ to expand fio for cases we come up with that aren't possible, as
> > there's already a ton of measurements that are taken, especially around
> > latencies.
> 
> To be properly useful it needs to support more than just fio to run
> tests. Indeed, it's largely useless to me if that's all it can do or
> it's a major pain to add support for different tools like fsmark.
> 
> e.g. my typical perf regression test that you see the concurrent
> fsmark create workload is actually a lot more. It does:
> 
>   fsmark to create 50m zero length files
>   umount,
>   run parallel xfs_repair (excellent mmap_sem/page fault punisher)
>   mount
>   run parallel find -ctime (readdir + lookup traversal)
>   unmount, mount
>   run parallel ls -R (readdir + dtype traversal)
>   unmount, mount
>   parallel rm -rf of 50m files
> 
> I have variants that use small 4k files or large files rather than
> empty files, that use different fsync patterns to stress the
> log, use grep -R to traverse the data as well as
> the directory/inode structure instead of find, etc.
> 
> > That said I'm not opposed to throwing new stuff in there, it just
> > means we have to add stuff to parse the output and store it in the database 
> > in a
> > consistent way, which seems like more of a pain than just making fio do 
> > what we
> > need it to.  Thanks,
> 
> fio is not going to be able to replace the sort of perf tests I run
> from week to week. If that's all it's going to do then it's not
> directly useful to me...
> 

Agreed, I'm just going to add this stuff to fstests since I'd like to be able to
use the _require stuff to make sure we only run stuff we have support for.  I'll
wire that up this week and send patches along.  Thanks,

Josef


Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Dave Chinner
On Mon, Oct 09, 2017 at 09:00:51AM -0400, Josef Bacik wrote:
> On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote:
> > On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > > > Integrating into fstests means it will be immediately available to
> > > > all fs developers, it'll run on everything that everyone already has
> > > > setup for filesystem testing, and it will have familiar mkfs/mount
> > > > option setup behaviour so there's no new hoops for everyone to jump
> > > > through to run it...
> > > > 
> > > 
> > > TBF I specifically made it as easy as possible because I know we all hate 
> > > trying
> > > to learn new shit.
> > 
> > Yeah, it's also hard to get people to change their workflows to add
> > a whole new test harness into them. It's easy if it's just a new
> > command to an existing workflow :P
> > 
> 
> Agreed, so if you probably won't run this outside of fstests then I'll add it 
> to
> xfstests.  I envision this tool as being run by maintainers to verify their 
> pull
> requests haven't regressed since the last set of patches, as well as by 
> anybody
> trying to fix performance problems.  So it's way more important to me that 
> you,
> Ted, and all the various btrfs maintainers will run it than anybody else.
> 
> > > I figured this was different enough to warrant a separate
> > > project, especially since I'm going to add block device jobs so Jens can 
> > > test
> > > block layer things.  If we all agree we'd rather see this in fstests then 
> > > I'm
> > > happy to do that too.  Thanks,
> > 
> > I'm not fussed either way - it's a good discussion to have, though.
> > 
> > If I want to add tests (e.g. my time-honoured fsmark tests), where
> > should I send patches?
> > 
> 
> I beat you to that!  I wanted to avoid adding fs_mark to the suite because it
> means parsing another different set of outputs, so I added a new ioengine to 
> fio
> for this
> 
> http://www.spinics.net/lists/fio/msg06367.html
> 
> and added a fio job to do 500k files
> 
> https://github.com/josefbacik/fsperf/blob/master/tests/500kemptyfiles.fio
> 
> The test is disabled by default for now because obviously the fio support 
> hasn't
> landed yet.

That seems  misguided. fio is good, but it's not a universal
solution.

> I'd _like_ to expand fio for cases we come up with that aren't possible, as
> there's already a ton of measurements that are taken, especially around
> latencies.

To be properly useful it needs to support more than just fio to run
tests. Indeed, it's largely useless to me if that's all it can do or
it's a major pain to add support for different tools like fsmark.

e.g. my typical perf regression test that you see the concurrent
fsmark create workload is actually a lot more. It does:

fsmark to create 50m zero length files
umount,
run parallel xfs_repair (excellent mmap_sem/page fault punisher)
mount
run parallel find -ctime (readdir + lookup traversal)
unmount, mount
run parallel ls -R (readdir + dtype traversal)
unmount, mount
parallel rm -rf of 50m files

I have variants that use small 4k files or large files rather than
empty files, that use different fsync patterns to stress the
log, use grep -R to traverse the data as well as
the directory/inode structure instead of find, etc.
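For concreteness, that loop can be sketched as a tiny dry-run harness. The device/mount paths and the fs_mark flags below are placeholders, not Dave's actual scripts; with DRYRUN=1 every step is only echoed, so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch of the create/repair/traverse/remove regression loop.
DRYRUN=1
DEV=${SCRATCH_DEV:-/dev/vdb}
MNT=${SCRATCH_MNT:-/mnt/scratch}
run() { if [ "$DRYRUN" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

run fs_mark -d "$MNT" -s 0 -n 50000000   # 50m zero-length files (flags illustrative)
run umount "$MNT"
run xfs_repair "$DEV"                    # mmap_sem/page-fault punisher
run mount "$DEV" "$MNT"
run find "$MNT" -ctime -1                # readdir + lookup traversal
run umount "$MNT"; run mount "$DEV" "$MNT"
run ls -R "$MNT"                         # readdir + dtype traversal
run umount "$MNT"; run mount "$DEV" "$MNT"
run rm -rf "$MNT"/files                  # parallel removal phase
```

The unmount/mount pairs between phases are the point: they drop caches so each traversal hits cold metadata.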

> That said I'm not opposed to throwing new stuff in there, it just
> means we have to add stuff to parse the output and store it in the database 
> in a
> consistent way, which seems like more of a pain than just making fio do what 
> we
> need it to.  Thanks,

fio is not going to be able to replace the sort of perf tests I run
from week to week. If that's all it's going to do then it's not
directly useful to me...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH] Btrfs: avoid losing data raid profile when deleting a device

2017-10-09 Thread Liu Bo
We've already avoided losing the data raid profile when doing balance,
but it turns out that deleting a device can also result in the same
problem.

This fixes the problem by creating an empty data chunk before
relocating the data chunk.

Metadata/system chunks are supposed to have non-zero bytes all the time,
so their raid profile is persistent.

Reported-by: James Alandt 
Signed-off-by: Liu Bo 
---
 fs/btrfs/volumes.c | 87 ++
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4a72c45..3f48bcd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -144,6 +144,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 u64 logical, u64 *length,
 struct btrfs_bio **bbio_ret,
 int mirror_num, int need_raid_map);
+static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
+ u64 chunk_offset);
 
 DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
@@ -3476,7 +3478,6 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
u32 count_meta = 0;
u32 count_sys = 0;
int chunk_reserved = 0;
-   u64 bytes_used = 0;
 
/* step one make some room on all the devices */
devices = &fs_info->fs_devices->devices;
@@ -3635,28 +3636,22 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
goto loop;
}
 
-   ASSERT(fs_info->data_sinfo);
-   spin_lock(&fs_info->data_sinfo->lock);
-   bytes_used = fs_info->data_sinfo->bytes_used;
-   spin_unlock(&fs_info->data_sinfo->lock);
-
-   if ((chunk_type & BTRFS_BLOCK_GROUP_DATA) &&
-   !chunk_reserved && !bytes_used) {
-   trans = btrfs_start_transaction(chunk_root, 0);
-   if (IS_ERR(trans)) {
-   mutex_unlock(&fs_info->delete_unused_bgs_mutex);
-   ret = PTR_ERR(trans);
-   goto error;
-   }
-
-   ret = btrfs_force_chunk_alloc(trans, fs_info,
- BTRFS_BLOCK_GROUP_DATA);
-   btrfs_end_transaction(trans);
+   if (!chunk_reserved) {
+   /*
+* We may be relocating the only data chunk we have,
+* which could potentially end up with losing data's
+* raid profile, so lets allocate an empty one in
+* advance.
+*/
+   ret = btrfs_may_alloc_data_chunk(fs_info,
+found_key.offset);
if (ret < 0) {
mutex_unlock(&fs_info->delete_unused_bgs_mutex);
goto error;
+   } else if (ret == 1) {
+   chunk_reserved = 1;
}
-   chunk_reserved = 1;
}
 
ret = btrfs_relocate_chunk(fs_info, found_key.offset);
@@ -4327,6 +4322,48 @@ int btrfs_check_uuid_tree(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * return 1 : allocate a data chunk successfully,
+ * return <0: errors during allocating a data chunk,
+ * return 0 : no need to allocate a data chunk.
+ */
+static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
+ u64 chunk_offset)
+{
+   struct btrfs_block_group_cache *cache;
+   u64 bytes_used;
+   u64 chunk_type;
+
+   cache = btrfs_lookup_block_group(fs_info, chunk_offset);
+   ASSERT(cache);
+   chunk_type = cache->flags;
+   btrfs_put_block_group(cache);
+
+   if (chunk_type & BTRFS_BLOCK_GROUP_DATA) {
+   spin_lock(&fs_info->data_sinfo->lock);
+   bytes_used = fs_info->data_sinfo->bytes_used;
+   spin_unlock(&fs_info->data_sinfo->lock);
+
+   if (!bytes_used) {
+   struct btrfs_trans_handle *trans;
+   int ret;
+
+   trans = btrfs_join_transaction(fs_info->tree_root);
+   if (IS_ERR(trans))
+   return PTR_ERR(trans);
+
+   ret = btrfs_force_chunk_alloc(trans, fs_info,
+ BTRFS_BLOCK_GROUP_DATA);
+   btrfs_end_transaction(trans);
+   if (ret < 0)
+   return ret;
+
+   return 1;
+   }
+   }
+   return 0;
+}
+
+/*
  * shrinking a device means finding all of the 

[PATCH] Fstest: btrfs/151: test if device delete ends up with losing raid profile

2017-10-09 Thread Liu Bo
Currently running 'btrfs device delete' can end up with losing data raid
profile (if any), this test is to reproduce the problem.

The fix is
 "Btrfs: avoid losing data raid profile when deleting a device"

Signed-off-by: Liu Bo 
---
 tests/btrfs/151 | 73 +
 tests/btrfs/151.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/btrfs/151
 create mode 100644 tests/btrfs/151.out

diff --git a/tests/btrfs/151 b/tests/btrfs/151
new file mode 100755
index 000..1866cb6
--- /dev/null
+++ b/tests/btrfs/151
@@ -0,0 +1,73 @@
+#! /bin/bash
+# FS QA Test 151
+#
+# Test if it's losing data chunk's raid profile after 'btrfs device
+# remove'.
+#
+# The fix is
+#  Btrfs: avoid losing data raid profile when deleting a device
+#
+#---
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_scratch_dev_pool 3
+
+# create raid1 for data
+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
+
+# we need an empty data chunk, so nospace_cache is required.
+_scratch_mount -onospace_cache
+
+# if data chunk is empty, 'btrfs device remove' can change raid1 to
+# single.
+$BTRFS_UTIL_PROG device delete 2 $SCRATCH_MNT >> $seqres.full
+
+df_ret=`$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT`
+echo $df_ret | $AWK_PROG -F ':' '/Data,/ {print $1}'
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/151.out b/tests/btrfs/151.out
new file mode 100644
index 000..0a1de06
--- /dev/null
+++ b/tests/btrfs/151.out
@@ -0,0 +1,2 @@
+QA output created by 151
+Data, RAID1
diff --git a/tests/btrfs/group b/tests/btrfs/group
index e73bb1b..a7ff7b0 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -153,3 +153,4 @@
 148 auto quick rw
 149 auto quick send compress
 150 auto quick dangerous
+151 auto quick
-- 
2.5.0



Re: Why do full balance and deduplication reduce available free space?

2017-10-09 Thread Kai Krakow
On Mon, 02 Oct 2017 22:19:32 +0200, Niccolò Belli wrote:

> On 2017-10-02 21:35, Kai Krakow wrote:
> > Besides defragging removing the reflinks, duperemove will unshare
> > your snapshots when used in this way: If it sees duplicate blocks
> > within the subvolumes you give it, it will potentially unshare
> > blocks from the snapshots while rewriting extents.
> > 
> > BTW, you should be able to use duperemove with read-only snapshots
> > if used in read-only-open mode. But I'd rather suggest to use bees
> > instead: It works at whole-volume level, walking extents instead of
> > files. That way it is much faster, doesn't reprocess already
> > deduplicated extents, and it works with read-only snapshots.
> > 
> > Until my patch it didn't like mixed nodatasum/datasum workloads.
> > Currently this is fixed by just leaving nocow data alone as users
> > probably set nocow for exactly the reason to not fragment extents
> > and relocate blocks.  
> 
> Bad Btrfs Feature Interactions: btrfs read-only snapshots (never
> tested, probably wouldn't work well)
> 
> Unfortunately it seems that bees doesn't support read-only snapshots,
> so that's a no-go.
> 
> P.S.
> I tried duperemove with -A, but besides taking much longer it didn't 
> improve the situation.
> Are you sure that the culprit is duperemove? AFAIK it shouldn't
> unshare extents...

Unsharing of extents depends... If an extent is shared between a
r/o and a r/w snapshot, rewriting the extent for deduplication ends up in
a shared extent again, but it is no longer reflinked with the original
r/o snapshot. At least as long as btrfs doesn't allow changing extents
that are part of a r/o snapshot... which you all tell me is the case...

And then, there's unsharing of metadata by the deduplication process
itself.

Both effects should be minimal, though. But since chunks are allocated
in 1 GB sizes, allocation may jump by 1 GB just for a few extra MB
needed. A metadata rebalance may fix this.


-- 
Regards,
Kai

Replies to list-only preferred.




WARNING: ... at fs/btrfs/ctree.h:1559 btrfs_update_device+0x1be/0x1d0 [btrfs]

2017-10-09 Thread Cloud Admin
Hi,
I updated the kernel from 4.11.10 to 4.13.4 and since then I get the following
message in the kernel journal when calling 'scrub' or 'balance'. I use Fedora 26
with btrfs-progs v4.9.1.
What does this mean and (more important) what can I do about it?
Bye
Frank

BTRFS info (device dm-7): relocating block group 44050690342912 flags 
system|raid1
BTRFS info (device dm-7): found 117 extents
[ cut here ]
WARNING: CPU: 3 PID: 22095 at fs/btrfs/ctree.h:1559 
btrfs_update_device+0x1be/0x1d0 [btrfs]
Modules linked in: rpcsec_gss_krb5 veth xt_nat xt_addrtype br_netfilter 
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_sane xt_CT 
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink 
ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables 
ip6table_filter ip6_tables ftsteutates btrfs xor raid6_pq tda18212 cxd2841er 
tpm_crb intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
iTCO_wdt irqbypass iTCO_vendor_support intel_cstate intel_uncore mei_wdt 
intel_rapl_perf ppdev hci_uart ddbridge dvb_core
Okt 09 19:08:32 hypercloud.fritz.box kernel:  btbcm btqca btintel mei_me 
i2c_i801 shpchp bluetooth joydev mei intel_pch_thermal wmi intel_lpss_acpi 
intel_lpss pinctrl_sunrisepoint fujitsu_laptop parport_pc sparse_keymap 
ecdh_generic parport tpm_tis pinctrl_intel tpm_tis_core rfkill tpm acpi_pad 
nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt i915 crct10dif_pclmul 
i2c_algo_bit crc32_pclmul drm_kms_helper crc32c_intel e1000e drm 
ghash_clmulni_intel serio_raw ptp pps_core video i2c_hid
CPU: 3 PID: 22095 Comm: btrfs Tainted: GW   4.13.4-200.fc26.x86_64 
#1
Hardware name: FUJITSU D3417-B1/D3417-B1, BIOS V5.0.0.11 R1.12.0 for D3417-B1x  
  02/09/2016
task: 8ecb59b026c0 task.stack: b805cae9
RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
RSP: 0018:b805cae93ac8 EFLAGS: 00010206
RAX: 0fff RBX: 8ed094bb11c0 RCX: 074702251e00
RDX: 0004 RSI: 3efa RDI: 8ec9eb032c08
RBP: b805cae93b10 R08: 3efe R09: b805cae93a80
R10: 1000 R11: 0003 R12: 8ed0c7a3f000
R13:  R14: 3eda R15: 8ec9eb032c08
FS:  7f10b256a8c0() GS:8ed0ee4c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f6261c0d5f0 CR3: 0005cad4c000 CR4: 003406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 btrfs_remove_chunk+0x365/0x870 [btrfs]
 btrfs_relocate_chunk+0x7e/0xd0 [btrfs]
 btrfs_balance+0xc07/0x1390 [btrfs]
 btrfs_ioctl_balance+0x319/0x380 [btrfs]
 btrfs_ioctl+0x9d5/0x24a0 [btrfs]
 ? lru_cache_add+0x3a/0x80
 ? lru_cache_add_active_or_unevictable+0x4c/0xf0
 ? __handle_mm_fault+0x939/0x10b0
 do_vfs_ioctl+0xa5/0x600
 ? do_brk_flags+0x230/0x360
 ? do_vfs_ioctl+0xa5/0x600
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1a/0xa5
RIP: 0033:0x7f10b15e65e7
RSP: 002b:7ffc9402ebe8 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 8041 RCX: 7f10b15e65e7
RDX: 7ffc9402ec80 RSI: c4009420 RDI: 0003
RBP: 7f10b18abae0 R08: 55b07a3b3010 R09: 0078
R10: 7f10b18abb38 R11: 0246 R12: 
R13: 7f10b18abb38 R14: 8060 R15: 
Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa 20 ff ff 4c 89 ff e8 72 
ef fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9 d0 fe ff ff <0f> ff eb b6 e8 39 fd 
78 c6 66 0f 1f 84 00 00 00 00 00 0f 1f 44 
---[ end trace d1e1c8aff99bfeb8 ]---


Re: [PATCH] btrfs: prefix sysfs attribute struct names

2017-10-09 Thread David Sterba
On Sun, Oct 08, 2017 at 10:30:58PM +0200, Hans van Kranenburg wrote:
> Currently struct names for sysfs are generated only based on the
> attribute names. This means that attribute names cannot be reused in
> multiple places throughout the complete btrfs sysfs hierarchy.
> 
> E.g. allocation/data/total_bytes and allocation/data/single/total_bytes
> result in the same struct name btrfs_attr_total_bytes. A workaround for
> this case was made in the past by ad hoc creating an extra macro
> wrapper, BTRFS_RAID_ATTR, that inserts some extra text in the struct
> name.
> 
> Instead of polluting sysfs.h with such kind of extra macro definitions,
> and only doing so when there are collisions, use a prefix which gets
> inserted in the struct name, so we keep everything nicely grouped
> together by default.
> 
> Current collections of attributes are:
> * (the toplevel, empty prefix)
> * allocation
> * space_info
> * raid
> * features
> 
> Signed-off-by: Hans van Kranenburg 

Reviewed-by: David Sterba 

> @@ -277,7 +277,8 @@ static ssize_t raid_bytes_show(struct kobject *kobj,
>  
>   down_read(>groups_sem);
>   list_for_each_entry(block_group, >block_groups[index], list) {
> - if (>attr == BTRFS_RAID_ATTR_PTR(total_bytes))
> + if (>attr == \

the \ is not needed here, only in macro definitions that must be on one
logical line

> + BTRFS_ATTR_PTR(raid, total_bytes))
>   val += block_group->key.offset;
>   else
>   val += btrfs_block_group_used(_group->item);
> @@ -331,19 +332,20 @@ SPACE_INFO_ATTR(bytes_may_use);
>  SPACE_INFO_ATTR(bytes_readonly);
>  SPACE_INFO_ATTR(disk_used);
>  SPACE_INFO_ATTR(disk_total);
> -BTRFS_ATTR(total_bytes_pinned, btrfs_space_info_show_total_bytes_pinned);
> +BTRFS_ATTR(space_info, total_bytes_pinned, \

same here. will be fixed at commit time.

> +btrfs_space_info_show_total_bytes_pinned);
>  


Re: [PATCH] btrfs: Fix bool initialization/comparison

2017-10-09 Thread David Sterba
On Sat, Oct 07, 2017 at 04:02:21PM +0200, Thomas Meyer wrote:
> Bool initializations should use true and false. Bool tests don't need
> comparisons.
> 
> Signed-off-by: Thomas Meyer 

Reviewed-by: David Sterba 


Re: [PATCH v2] Btrfs: heuristic add shannon entropy calculation

2017-10-09 Thread David Sterba
On Sun, Oct 08, 2017 at 04:11:59PM +0300, Timofey Titovets wrote:
> The byte distribution check in the heuristic will filter edge data
> cases and sometimes fails to classify input data
> 
> Let's fix that by adding a Shannon entropy calculation,
> which will cover classification of most other data types.
> 
> As Shannon entropy needs log2 with some precision to work,
> let's use ilog2(N), and to increase precision,
> do ilog2(pow(N, 4))
> 
> The Shannon entropy is slightly changed to avoid signed numbers
> and divisions.
> 
> Changes:
>   v1 -> v2:
> - Replace log2_lshift4 with ilog2
> - To compensate accuracy errors of ilog2
>   @ENTROPY_LVL_ACEPTABLE 70 -> 65
>   @ENTROPY_LVL_HIGH  85 -> 80
> - Drop usage of u64 from shannon calculation
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/compression.c | 83 
> +-
>  1 file changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 0060bc4ae5f2..8efbce5633b5 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "ctree.h"
>  #include "disk-io.h"
>  #include "transaction.h"
> @@ -1224,6 +1225,60 @@ int btrfs_decompress_buf2page(const char *buf, 
> unsigned long buf_start,
>   return 1;
>  }
> 
> +/*
> + * Shannon Entropy calculation
> + *
> + * Pure byte distribution analysis fails to determine
> + * compressibility of data. Try calculating entropy to
> + * estimate the average minimum number of bits needed
> + * to encode the sampled data.
> + *
> + * For convenience, return the percentage of needed bits,
> + * instead of the bit amount directly.
> + *
> + * @ENTROPY_LVL_ACEPTABLE - below that threshold sample has low byte
> + * entropy and can be compressible with high probability
> + *
> + * @ENTROPY_LVL_HIGH - data are not compressible with high probability,
> + *
> + * Use of ilog2() decreases precision, so to get an approximately
> + * correct answer the LVLs are decreased by 5.
> + */
> +
> +#define ENTROPY_LVL_ACEPTABLE 65
> +#define ENTROPY_LVL_HIGH 80
> +
> +/*
> + * To increase precision in the shannon_entropy calculation,
> + * let's do pow(n, M) to save more digits after the comma
> + *
> + * Max int bit length 64
> + * ilog2(MAX_SAMPLE_SIZE) -> 13
> + * 13*4 = 52 < 64 -> M = 4
> + * So use pow(n, 4)
> + */
> +static inline u32 ilog2_w(u64 n)
> +{
> + return ilog2(n * n * n * n);
> +}
> +
> +static u32 shannon_entropy(struct heuristic_ws *ws)
> +{
> + const u32 entropy_max = 8*ilog2_w(2);
> + u32 entropy_sum = 0;
> + u32 p, p_base, sz_base;
> + u32 i;
> +
> + sz_base = ilog2_w(ws->sample_size);
> + for (i = 0; i < BUCKET_SIZE && ws->bucket[i].count > 0; i++) {
> + p = ws->bucket[i].count;
> + p_base = ilog2_w(p);
> + entropy_sum += p * (sz_base - p_base);
> + }
> +
> + entropy_sum /= ws->sample_size;
> + return entropy_sum * 100 / entropy_max;
> +}

That's much better :) I'll add the patch to the rest of the heuristics.


Re: [PATCH 1/1] Btrfs: heuristic add shannon entropy calculation

2017-10-09 Thread David Sterba
On Sat, Oct 07, 2017 at 03:08:26AM +0300, Timofey Titovets wrote:
> > I hear the compiler scream, trying to optimize all the ifs. And I think
> > the CPU branch prediction would have a hard time, there's practically no
> > expected pattern and the performance could be worse compared to the
> > formula calculation.
> >
> > What's the reason you opencode it like that? Could you please post how
> > it would be implemented in C? There are potentially optimized functions
> > to do integer log2 (ilog2).
> 
> That is just:
> log2_lshift4(n) { return ilog2(pow(n, 16)); }
> But that will cause an overflow.
> 
> log2(8192) -> 13, i.e. 13 bits long
> 13 * 4 = 52, 52 < 64
> 
> That also restricts the sample size to 15 bits
> 
> So we can get approximately the same behaviour
> by doing ilog2(pow(n, 4));
> 
> https://pastebin.com/VryvqkCy - some numbers
> TL;DR version:
> pow(n, 4)
>  - count100  728300
>  - error %0.0% 1.1%3.0%
> pow(n, 16)
>  - count 188431148314
>  - error % 0.0% 1.0%2.8%
> 
> (I sampled a Win10.iso image as just a big file with varied data.)
> I generate the error value by:
> if (shannon_e_f > 0)
> error = 100 - (shannon_e_i * 1.0 * 100 / shannon_e_f);
> 
> In fact that causes errors in the entropy level like:
> TRUE -> Integer
> 83 -> 80
> 87 -> 84
> 32 -> 28
> 
> In theory it's possible to just make the entropy level markers more
> aggressive, like 70 -> 65 and 85 -> 80
> 
> That will cover the precision errors.

With the v2 patch and direct calculation, I'm ok with leaving some
imprecision there. This is fine-tuning that can be done in parallel.

> or make a byte level BigInt in kernel (bigint_log2, bigint_Lshift & etc) %)
> (But this seems inexcusable to me + i'm unsure about performance + I'm
> afraid Big/Little.Endian)

I think we need to keep it fast and not anything too complex to
calculate, unless there's a significant win in the heuristic results.

> > Code like that could be microbenchmarked so we can see actual numbers,
> > so far I'm just guessing.
> 
> I'm mentioned in a cover letter https://github.com/Nefelim4ag/companalyze
> (implement the heuristic with no optimizations, all code run, to get
> RAW counters)
> Over in memory bad compressible data:
>  8188. BSize: 131072, RepPattern: 0, BSet: 256, BCSet: 220, ShanEi%:
> 99| 99 ~0.0%, RndDist: 0, out: -3
> 
> With -O0 - ~1100 MiB/s
> With -O2 - ~2500 MiB/s
> 
> With (i found in kernel and add pow(n, 4))
> static int ilog2(uint64_t v)
> {
> int l = 0;
> v = v*v*v*v;
> while ((1UL << l) < v)
> l++;

The while() implementation is generic and works on all architectures,
but x86_64 has an instruction for that and will be even faster:

ilog2 ->

http://elixir.free-electrons.com/linux/latest/source/include/linux/log2.h#L26

fls (find last set bit, i.e. the highest one) ->

http://elixir.free-electrons.com/linux/latest/source/arch/x86/include/asm/bitops.h#L450

bsrl

> return l;
> }
> 
> With -O0 - ~1150 MiB/s
> With -O2 - ~2500 MiB/s
> 
> So my solution looks worse in code than in performance
> (it may be that most requests for badly compressible data go
> directly to the lookup table)

Ok, thanks for collecting the numbers.


[PATCH v2 2/3] btrfs: rename page offset parameter in submit_extent_page

2017-10-09 Thread David Sterba
We're going to remove sector_t and will use 'offset', so this patch
frees the name.

Reviewed-by: Liu Bo 
Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 12ab19a4b93e..ea9ab934794b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2762,7 +2762,7 @@ static int merge_bio(struct extent_io_tree *tree, struct 
page *page,
 static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
  struct writeback_control *wbc,
  struct page *page, sector_t sector,
- size_t size, unsigned long offset,
+ size_t size, unsigned long pg_offset,
  struct block_device *bdev,
  struct bio **bio_ret,
  bio_end_io_t end_io_func,
@@ -2786,8 +2786,8 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
 
if (prev_bio_flags != bio_flags || !contig ||
force_bio_submit ||
-   merge_bio(tree, page, offset, page_size, bio, bio_flags) ||
-   bio_add_page(bio, page, page_size, offset) < page_size) {
+   merge_bio(tree, page, pg_offset, page_size, bio, bio_flags) 
||
+   bio_add_page(bio, page, page_size, pg_offset) < page_size) {
ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
if (ret < 0) {
*bio_ret = NULL;
@@ -2802,7 +2802,7 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
}
 
bio = btrfs_bio_alloc(bdev, sector << 9);
-   bio_add_page(bio, page, page_size, offset);
+   bio_add_page(bio, page, page_size, pg_offset);
bio->bi_end_io = end_io_func;
bio->bi_private = tree;
bio->bi_write_hint = page->mapping->host->i_write_hint;
-- 
2.14.0



[PATCH v2 1/3] btrfs: scrub: get rid of sector_t

2017-10-09 Thread David Sterba
The use of sector_t is not necessary, it's just for a warning.  Switch to
u64 and rename the variable and use byte units instead of 512b, ie.
dropping the >> 9 shifts.  The messages are adjusted as well.

Reviewed-by: Liu Bo 
Signed-off-by: David Sterba 
---
 fs/btrfs/scrub.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e3f6c49e5c4d..cd1b791d9706 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -231,7 +231,7 @@ struct scrub_warning {
struct btrfs_path   *path;
u64 extent_item_size;
const char  *errstr;
-   sector_tsector;
+   u64 physical;
u64 logical;
struct btrfs_device *dev;
 };
@@ -797,10 +797,10 @@ static int scrub_print_warning_inode(u64 inum, u64 
offset, u64 root,
 */
for (i = 0; i < ipath->fspath->elem_cnt; ++i)
btrfs_warn_in_rcu(fs_info,
- "%s at logical %llu on dev %s, sector %llu, 
root %llu, inode %llu, offset %llu, length %llu, links %u (path: %s)",
+"%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset 
%llu, length %llu, links %u (path: %s)",
  swarn->errstr, swarn->logical,
  rcu_str_deref(swarn->dev->name),
- (unsigned long long)swarn->sector,
+ swarn->physical,
  root, inum, offset,
  min(isize - offset, (u64)PAGE_SIZE), nlink,
  (char *)(unsigned long)ipath->fspath->val[i]);
@@ -810,10 +810,10 @@ static int scrub_print_warning_inode(u64 inum, u64 
offset, u64 root,
 
 err:
btrfs_warn_in_rcu(fs_info,
- "%s at logical %llu on dev %s, sector %llu, root 
%llu, inode %llu, offset %llu: path resolving failed with ret=%d",
+ "%s at logical %llu on dev %s, physical %llu, root 
%llu, inode %llu, offset %llu: path resolving failed with ret=%d",
  swarn->errstr, swarn->logical,
  rcu_str_deref(swarn->dev->name),
- (unsigned long long)swarn->sector,
+ swarn->physical,
  root, inum, offset, ret);
 
free_ipath(ipath);
@@ -845,7 +845,7 @@ static void scrub_print_warning(const char *errstr, struct 
scrub_block *sblock)
if (!path)
return;
 
-   swarn.sector = (sblock->pagev[0]->physical) >> 9;
+   swarn.physical = sblock->pagev[0]->physical;
swarn.logical = sblock->pagev[0]->logical;
swarn.errstr = errstr;
swarn.dev = NULL;
@@ -868,10 +868,10 @@ static void scrub_print_warning(const char *errstr, 
struct scrub_block *sblock)
  item_size, _root,
  _level);
btrfs_warn_in_rcu(fs_info,
-   "%s at logical %llu on dev %s, sector %llu: 
metadata %s (level %d) in tree %llu",
+"%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree 
%llu",
errstr, swarn.logical,
rcu_str_deref(dev->name),
-   (unsigned long long)swarn.sector,
+   swarn.physical,
ref_level ? "node" : "leaf",
ret < 0 ? -1 : ref_level,
ret < 0 ? -1 : ref_root);
-- 
2.14.0



[PATCH v2 3/3] btrfs: get rid of sector_t and use u64 offset in submit_extent_page

2017-10-09 Thread David Sterba
The use of sector_t in the callchain of submit_extent_page is not
necessary.  Switch to u64 and rename the variable and use byte units
instead of 512b, ie.  dropping the >> 9 shifts and avoiding the
con(tro)versions of sector_t.

Reviewed-by: Liu Bo 
Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ea9ab934794b..8d0834ee792b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2761,7 +2761,7 @@ static int merge_bio(struct extent_io_tree *tree, struct 
page *page,
  */
 static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
  struct writeback_control *wbc,
- struct page *page, sector_t sector,
+ struct page *page, u64 offset,
  size_t size, unsigned long pg_offset,
  struct block_device *bdev,
  struct bio **bio_ret,
@@ -2776,6 +2776,7 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
int contig = 0;
int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
size_t page_size = min_t(size_t, size, PAGE_SIZE);
+   sector_t sector = offset >> 9;
 
if (bio_ret && *bio_ret) {
bio = *bio_ret;
@@ -2801,7 +2802,7 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
}
}
 
-   bio = btrfs_bio_alloc(bdev, sector << 9);
+   bio = btrfs_bio_alloc(bdev, offset);
bio_add_page(bio, page, page_size, pg_offset);
bio->bi_end_io = end_io_func;
bio->bi_private = tree;
@@ -2892,7 +2893,6 @@ static int __do_readpage(struct extent_io_tree *tree,
u64 last_byte = i_size_read(inode);
u64 block_start;
u64 cur_end;
-   sector_t sector;
struct extent_map *em;
struct block_device *bdev;
int ret = 0;
@@ -2928,6 +2928,7 @@ static int __do_readpage(struct extent_io_tree *tree,
}
while (cur <= end) {
bool force_bio_submit = false;
+   u64 offset;
 
if (cur >= last_byte) {
char *userpage;
@@ -2967,9 +2968,9 @@ static int __do_readpage(struct extent_io_tree *tree,
iosize = ALIGN(iosize, blocksize);
if (this_bio_flag & EXTENT_BIO_COMPRESSED) {
disk_io_size = em->block_len;
-   sector = em->block_start >> 9;
+   offset = em->block_start;
} else {
-   sector = (em->block_start + extent_offset) >> 9;
+   offset = em->block_start + extent_offset;
disk_io_size = iosize;
}
bdev = em->bdev;
@@ -3062,8 +3063,8 @@ static int __do_readpage(struct extent_io_tree *tree,
}
 
ret = submit_extent_page(REQ_OP_READ | read_flags, tree, NULL,
-page, sector, disk_io_size, pg_offset,
-bdev, bio,
+page, offset, disk_io_size,
+pg_offset, bdev, bio,
 end_bio_extent_readpage, mirror_num,
 *bio_flags,
 this_bio_flag,
@@ -3324,7 +3325,6 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
u64 extent_offset;
u64 block_start;
u64 iosize;
-   sector_t sector;
struct extent_map *em;
struct block_device *bdev;
size_t pg_offset = 0;
@@ -3367,6 +3367,7 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
 
while (cur <= end) {
u64 em_end;
+   u64 offset;
 
if (cur >= i_size) {
if (tree->ops && tree->ops->writepage_end_io_hook)
@@ -3388,7 +3389,7 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
BUG_ON(end < cur);
iosize = min(em_end - cur, end - cur + 1);
iosize = ALIGN(iosize, blocksize);
-   sector = (em->block_start + extent_offset) >> 9;
+   offset = em->block_start + extent_offset;
bdev = em->bdev;
block_start = em->block_start;
compressed = test_bit(EXTENT_FLAG_COMPRESSED, >flags);
@@ -3431,7 +3432,7 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
}
 
ret = submit_extent_page(REQ_OP_WRITE | write_flags, tree, wbc,
-page, sector, iosize, pg_offset,
+  

[PATCH v2 0/3] Cleanup sector_t

2017-10-09 Thread David Sterba
v2:
- update missing message s/sector/physical/
- updated changelogs where the semantics changed
- added reviewed-by

David Sterba (3):
  btrfs: scrub: get rid of sector_t
  btrfs: rename page offset parameter in submit_extent_page
  btrfs: get rid of sector_t and use u64 offset in submit_extent_page

 fs/btrfs/extent_io.c | 31 ---
 fs/btrfs/scrub.c | 16 
 2 files changed, 24 insertions(+), 23 deletions(-)

-- 
2.14.0



Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Theodore Ts'o
On Mon, Oct 09, 2017 at 08:54:16AM -0400, Josef Bacik wrote:
> I purposefully used as little as possible, just json and sqlite, and I tried 
> to
> use as little python3 isms as possible.  Any rpm based systems should have 
> these
> libraries already installed, I agree that using any of the PyPI stuff is a 
> pain
> and is a non-starter for me as I want to make it as easy as possible to get
> running.
> 
> Where do you fall on the including it in xfstests question?  I expect that the
> perf stuff would be run more as maintainers put their pull requests together 
> to
> make sure things haven't regressed.  To that end I was going to wire up
> xfstests-bld to run this as well.  Whatever you and Dave prefer is what I'll 
> do,
> I'll use it wherever it ends up so you two are the ones that get to decide.

I'm currently using Python 2.7 mainly because the LTM subsystem in
gce-xfstests was implemented using that version of Python, and my
initial efforts to port it to Python 3 were... not smooth.  (Because
it was doing authentication, I got bit by the Python 2 vs Python 3
"bytes vs. strings vs. unicode" transition especially hard.)

So I'm going to be annoyed by needing to package Python 2.7 and Python
3.5 in my test VM's, but I can deal, and this will probably be the
forcing function for me to figure out how to make that jump.  To be
honest, the bigger issue I'm going to have to figure out is how to
manage the state in the sqlite database across disposable VM's running
in parallel.  And the assumption being made with having a static
sqlite database on a test machine is that the hardware capabilities
of the machine are static, and that's not true with a VM, whether it's running
via Qemu or in some cloud environment.

I'm not going to care that much about Python 3 not being available on
enterprise distro's, but I could see that being annoying for some
folks.  I'll let them worry about that.

The main thing I think we'll need to worry about once we let Python
into xfstests is to be pretty careful about specifying what version of
Python the scripts need to be portable against (Python 3.3?  3.4?
3.5?) and what versions of python packages get imported.

The bottom line is that I'm personally supportive of adding Python and
fsperf to xfstests.  We just need to be careful about the portability
concerns, not just now, but any time we check in new Python code.  And
having some documented Python style guidelines, and adding unit tests
so we can notice potential portability breakages would be a good idea.

 - Ted


[PATCH 02/16] btrfs: Use pagevec_lookup_range_tag()

2017-10-09 Thread Jan Kara
We want only pages from given range in btree_write_cache_pages() and
extent_write_cache_pages(). Use pagevec_lookup_range_tag() instead of
pagevec_lookup_tag() and remove unnecessary code.

CC: linux-btrfs@vger.kernel.org
CC: David Sterba 
Reviewed-by: David Sterba 
Reviewed-by: Daniel Jordan 
Signed-off-by: Jan Kara 
---
 fs/btrfs/extent_io.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 970190cd347e..a4eb6c988f27 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3818,8 +3818,8 @@ int btree_write_cache_pages(struct address_space *mapping,
if (wbc->sync_mode == WB_SYNC_ALL)
tag_pages_for_writeback(mapping, index, end);
while (!done && !nr_to_write_done && (index <= end) &&
-  (nr_pages = pagevec_lookup_tag(, mapping, , tag,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+  (nr_pages = pagevec_lookup_range_tag(, mapping, , end,
+   tag, PAGEVEC_SIZE))) {
unsigned i;
 
scanned = 1;
@@ -3829,11 +3829,6 @@ int btree_write_cache_pages(struct address_space 
*mapping,
if (!PagePrivate(page))
continue;
 
-   if (!wbc->range_cyclic && page->index > end) {
-   done = 1;
-   break;
-   }
-
spin_lock(>private_lock);
if (!PagePrivate(page)) {
spin_unlock(>private_lock);
@@ -3965,8 +3960,8 @@ static int extent_write_cache_pages(struct address_space 
*mapping,
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
-  (nr_pages = pagevec_lookup_tag(, mapping, , tag,
-   min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+  (nr_pages = pagevec_lookup_range_tag(, mapping, , end,
+   tag, PAGEVEC_SIZE))) {
unsigned i;
 
scanned = 1;
@@ -3991,12 +3986,6 @@ static int extent_write_cache_pages(struct address_space 
*mapping,
continue;
}
 
-   if (!wbc->range_cyclic && page->index > end) {
-   done = 1;
-   unlock_page(page);
-   continue;
-   }
-
if (wbc->sync_mode != WB_SYNC_NONE) {
if (PageWriteback(page))
flush_fn(data);
-- 
2.12.3



Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread David Sterba
On Fri, Oct 06, 2017 at 05:09:57PM -0400, Josef Bacik wrote:
> One thing that comes up a lot every LSF is the fact that we have no general 
> way
> that we do performance testing.  Every fs developer has a set of scripts or
> things that they run with varying degrees of consistency, but nothing central
> that we all use.  I for one am getting tired of finding regressions when we 
> are
> deploying new kernels internally, so I wired this thing up to try and address
> this need.
> 
> We all hate convoluted setups, the more brain power we have to put in to 
> setting
> something up the less likely we are to use it, so I took the xfstests approach
> of making it relatively simple to get running and relatively easy to add new
> tests.  For right now the only thing this framework does is run fio scripts.  
> I
> chose fio because it already gathers loads of performance data about its 
> runs.
> We have everything we need there, latency, bandwidth, cpu time, and all broken
> down by reads, writes, and trims.  I figure most of us are familiar enough 
> with
> fio and how it works to make it relatively easy to add new tests to the
> framework.
> 
> I've posted my code up on github, you can get it here
> 
> https://github.com/josefbacik/fsperf

Let me propose an existing framework that is capable of what is in
fsperf, and much more. It's Mel Gorman's mmtests
http://github.com/gormanm/mmtests .

I've been using it for a year or so, built a few scripts on top of that to
help me set up configs for specific machines or run tests in sequences.

What are the capabilities regarding filesystem tests:

* create and mount filesystems (based on configs)
* start various workloads, that are possibly adapted to the machine
  (cpu, memory), there are many types, we'd be interested in those
  touching filesystems
* gather system statistics - cpu, memory, IO, latency there are scripts
  that understand the output of various benchmarking tools (fio, dbench,
  ffsb, tiobench, bonnie, fs_mark, iozone, blogbench, ...)
* export the results into plain text or html, with tables and graphs
* it is already able to compare results of several runs, with
  statistical indicators

The testsuite is actively used and maintained, which means that the
efforts are mostly on the configuration side. From the users' perspective
this means to spend time with the setup and the rest will work as
expected. Ie. you don't have to start debugging the suite because there
are some version mismatches.

> All (well most) of the results from fio are stored in a local sqlite database.
> Right now the comparison stuff is very crude, it simply checks against the
> previous run and it only checks a few of the keys by default.  You can check
> latency if you want, but while writing this stuff up it seemed that latency 
> was
> too variable from run to run to be useful in a "did my thing regress or 
> improve"
> sort of way.
> 
> The configuration is brain dead simple, the README has examples.  All you need
> to do is make your local.cfg, run ./setup and then run ./fsperf and you are 
> good
> to go.
> 
> The plan is to add lots of workloads as we discover regressions and such.  We
> don't want anything that takes too long to run otherwise people won't run 
> this,
> so the existing tests don't take much longer than a few minutes each.

Sorry, this is IMO the wrong approach to benchmarking. Can you exercise
the filesystem enough in a few minutes? Can you write at least twice
the machine's memory size in data to the filesystem? Everything works
fine when it's served from caches and the filesystem is fresh. With
that you can simply start using phoronix-test-suite and be done, with
the same quality of results we all roll eyes about.

Targeted tests using fio are fine, and I understand the need to keep it
minimal. mmtests has support for fio and any jobfile can be used,
internally implemented with the 'fio --cmdline' option that transforms
it to a shell variable that's passed to fio in the end.
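
As an illustration of the kind of jobfile that could be fed through this
path (the directory and the SIZE variable below are made up, not taken
from mmtests): fio expands ${VAR} environment references in job files,
so the working set can be scaled per machine, e.g. to twice the RAM as
argued above.

```ini
# Hypothetical fio jobfile; directory and SIZE are placeholders.
# Export SIZE as e.g. twice the physical RAM so the run is not
# served purely from the page cache.
[global]
directory=/mnt/test
size=${SIZE}
direct=1
ioengine=libaio
runtime=300
time_based

[seq-write]
rw=write
bs=1M
numjobs=4
```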

As proposed in the thread, why not use xfstests? It could be suitable
for the configs, mkfs/mount and running, but I think it would need a lot
of work to enhance the result gathering and presentation, essentially
duplicating mmtests on that side.

I was positively surprised by various performance monitors that I was
not primarily interested in, like memory allocations or context
switches.  This gives deeper insights into the system and may help
analyzing the benchmark results.

Side note: you can run xfstests from mmtests, i.e. the machine/options
configuration is shared.

I'm willing to write more about the actual usage of mmtests, but at this
point I'm proposing the whole framework for consideration.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3.2 24/74] btrfs: Don't clear SGID when inheriting ACLs

2017-10-09 Thread Ben Hutchings
3.2.94-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Jan Kara 

commit b7f8a09f8097db776b8d160862540e4fc1f51296 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: linux-btrfs@vger.kernel.org
CC: David Sterba 
Signed-off-by: Jan Kara 
Signed-off-by: David Sterba 
[bwh: Backported to 3.2: move the call to posix_acl_update_mode() into
 btrfs_xattr_acl_set()]
Signed-off-by: Ben Hutchings 
---
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -117,12 +117,6 @@ static int btrfs_set_acl(struct btrfs_tr
switch (type) {
case ACL_TYPE_ACCESS:
name = POSIX_ACL_XATTR_ACCESS;
-   if (acl) {
-   ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
-   if (ret)
-   return ret;
-   }
-   ret = 0;
break;
case ACL_TYPE_DEFAULT:
if (!S_ISDIR(inode->i_mode))
@@ -161,11 +155,12 @@ static int btrfs_xattr_acl_set(struct de
 {
int ret;
struct posix_acl *acl = NULL;
+   struct inode *inode = dentry->d_inode;
 
-   if (!inode_owner_or_capable(dentry->d_inode))
+   if (!inode_owner_or_capable(inode))
return -EPERM;
 
-   if (!IS_POSIXACL(dentry->d_inode))
+   if (!IS_POSIXACL(inode))
return -EOPNOTSUPP;
 
if (value) {
@@ -180,7 +175,12 @@ static int btrfs_xattr_acl_set(struct de
}
}
 
-   ret = btrfs_set_acl(NULL, dentry->d_inode, acl, type);
+   if (type == ACL_TYPE_ACCESS && acl) {
+   ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+   if (ret)
+   goto out;
+   }
+   ret = btrfs_set_acl(NULL, inode, acl, type);
 out:
posix_acl_release(acl);
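
The SGID inheritance that the commit message describes can be observed
from userspace. A minimal sketch (no ACLs involved, just the mode bit
the fix preserves; the directory names are placeholders):

```python
import os
import shutil
import stat
import tempfile

# Create a parent directory with the SGID bit set, then a subdirectory
# inside it.  On Linux a directory created under an SGID directory
# inherits the SGID bit (and the parent's owning group); the fix above
# keeps that bit intact when default ACLs are inherited as well.
parent = tempfile.mkdtemp()
os.chmod(parent, 0o2775)          # rwxrwsr-x: SGID set on "DIR0"

child = os.path.join(parent, "DIR1")
os.mkdir(child, 0o775)

mode = os.stat(child).st_mode
print(bool(mode & stat.S_ISGID))  # the bit was inherited from the parent

shutil.rmtree(parent)
```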
 



[PATCH 3.16 075/192] btrfs: Don't clear SGID when inheriting ACLs

2017-10-09 Thread Ben Hutchings
3.16.49-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Jan Kara 

commit b7f8a09f8097db776b8d160862540e4fc1f51296 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__btrfs_set_acl() into btrfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b49d9458aa351605b43a7e34598caef
CC: linux-btrfs@vger.kernel.org
CC: David Sterba 
Signed-off-by: Jan Kara 
Signed-off-by: David Sterba 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings 
---
 fs/btrfs/acl.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -82,12 +82,6 @@ static int __btrfs_set_acl(struct btrfs_
switch (type) {
case ACL_TYPE_ACCESS:
name = POSIX_ACL_XATTR_ACCESS;
-   if (acl) {
-   ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
-   if (ret)
-   return ret;
-   }
-   ret = 0;
break;
case ACL_TYPE_DEFAULT:
if (!S_ISDIR(inode->i_mode))
@@ -123,6 +117,13 @@ out:
 
 int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
+   int ret;
+
+   if (type == ACL_TYPE_ACCESS && acl) {
+   ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+   if (ret)
+   return ret;
+   }
return __btrfs_set_acl(NULL, inode, acl, type);
 }
 



Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Josef Bacik
On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote:
> On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > On Mon, Oct 09, 2017 at 11:51:37AM +1100, Dave Chinner wrote:
> > > On Fri, Oct 06, 2017 at 05:09:57PM -0400, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > One thing that comes up a lot every LSF is the fact that we have no 
> > > > general way
> > > > that we do performance testing.  Every fs developer has a set of 
> > > > scripts or
> > > > things that they run with varying degrees of consistency, but nothing 
> > > > central
> > > > that we all use.  I for one am getting tired of finding regressions 
> > > > when we are
> > > > deploying new kernels internally, so I wired this thing up to try and 
> > > > address
> > > > this need.
> > > > 
> > > > We all hate convoluted setups, the more brain power we have to put in 
> > > > to setting
> > > > something up the less likely we are to use it, so I took the xfstests 
> > > > approach
> > > > of making it relatively simple to get running and relatively easy to 
> > > > add new
> > > > tests.  For right now the only thing this framework does is run fio 
> > > > scripts.  I
> > > > chose fio because it already gathers loads of performance data about 
> > > > it's runs.
> > > > We have everything we need there, latency, bandwidth, cpu time, and all 
> > > > broken
> > > > down by reads, writes, and trims.  I figure most of us are familiar 
> > > > enough with
> > > > fio and how it works to make it relatively easy to add new tests to the
> > > > framework.
> > > > 
> > > > I've posted my code up on github, you can get it here
> > > > 
> > > > https://github.com/josefbacik/fsperf
> > > > 
> > > > All (well most) of the results from fio are stored in a local sqlite 
> > > > database.
> > > > Right now the comparison stuff is very crude, it simply checks against 
> > > > the
> > > > previous run and it only checks a few of the keys by default.  You can 
> > > > check
> > > > latency if you want, but while writing this stuff up it seemed that 
> > > > latency was
> > > > too variable from run to run to be useful in a "did my thing regress or 
> > > > improve"
> > > > sort of way.
> > > > 
> > > > The configuration is brain dead simple, the README has examples.  All 
> > > > you need
> > > > to do is make your local.cfg, run ./setup and then run ./fsperf and you 
> > > > are good
> > > > to go.
> > > 
> > > Why re-invent the test infrastructure? Why not just make it a
> > > tests/perf subdir in fstests?
> > > 
> > 
> > Probably should have led with that shouldn't I have?  There's nothing 
> > keeping me
> > from doing it, but I didn't want to try and shoehorn in a python thing into
> > fstests.  I need python to do the sqlite and the json parsing to dump into 
> > the
> > sqlite database.
> > 
> > Now if you (and others) are not opposed to this being dropped into 
> > tests/perf
> > then I'll work that up.  But it's definitely going to need to be done in 
> > python.
> > I know you yourself have said you aren't opposed to using python in the 
> > past, so
> > if that's still the case then I can definitely wire it all up.
> 
> I have no problems with people using python for stuff like this but,
> OTOH, I'm not the fstests maintainer anymore :P
> 
> > > > The plan is to add lots of workloads as we discover regressions and 
> > > > such.  We
> > > > don't want anything that takes too long to run otherwise people won't 
> > > > run this,
> > > > so the existing tests don't take much longer than a few minutes each.  
> > > > I will be
> > > > adding some more comparison options so you can compare against averages 
> > > > of all
> > > > previous runs and such.
> > > 
> > > Yup, that fits exactly into what fstests is for... :P
> > > 
> > > Integrating into fstests means it will be immediately available to
> > > all fs developers, it'll run on everything that everyone already has
> > > setup for filesystem testing, and it will have familiar mkfs/mount
> > > option setup behaviour so there's no new hoops for everyone to jump
> > > through to run it...
> > > 
> > 
> > TBF I specifically made it as easy as possible because I know we all hate 
> > trying
> > to learn new shit.
> 
> Yeah, it's also hard to get people to change their workflows to add
> a whole new test harness into them. It's easy if it's just a new
> command to an existing workflow :P
> 

Agreed, so if you probably won't run this outside of fstests then I'll add it to
xfstests.  I envision this tool as being run by maintainers to verify their pull
requests haven't regressed since the last set of patches, as well as by anybody
trying to fix performance problems.  So it's way more important to me that you,
Ted, and all the various btrfs maintainers will run it than anybody else.

> > I figured this was different enough to warrant a separate
> > project, especially since I'm going to add block device jobs so Jens can 
> > test
> > block layer things.  If we all agree we'd rather 

Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Josef Bacik
On Sun, Oct 08, 2017 at 11:43:35PM -0400, Theodore Ts'o wrote:
> On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > 
> > Probably should have led with that shouldn't I have?  There's nothing 
> > keeping me
> > from doing it, but I didn't want to try and shoehorn in a python thing into
> > fstests.  I need python to do the sqlite and the json parsing to dump into 
> > the
> > sqlite database.
> 
> What version of python are you using?  From inspection it looks like
> some variant of python 3.x (you're using print as a function w/o using
> "from __future import print_function") but it's not immediately
> obvious from the top-level fsperf shell script what version of python
> your scripts are dependant upon.
> 
> This could potentially be a bit of a portability issue across various
> distributions --- RHEL doesn't ship with Python 3.x at all, and on
> Debian you need to use python3 to get python 3.x, since
> /usr/bin/python still points at Python 2.7 by default.  So I could see
> this as a potential issue for xfstests.
> 
> I'm currently using Python 2.7 in my wrapper scripts for, among other
> things, to parse xUnit XML format and create nice summaries like this:
> 
> ext4/4k: 337 tests, 6 failures, 21 skipped, 3814 seconds
>   Failures: generic/232 generic/233 generic/361 generic/388
> generic/451 generic/459
> 
> So I'm not opposed to python, but I will note that if you are using
> modules from the Python Package Index, and they are modules which are
> not packaged by your distribution (so you're using pip or easy_install
> to pull them off the network), it does make doing hermetic builds from
> trusted sources to be a bit trickier.
> 
> If you have a secops team who wants to know the provenance of software
> which get thrown in production data centers (and automated downloading
> from random external sources w/o any code review makes them break out
> in hives), use of PyPI adds a new wrinkle.  It's not impossible to
> solve, by any means, but it's something to consider.
>

I purposefully used as little as possible, just json and sqlite, and I tried to
use as few python3-isms as possible.  Any rpm-based system should have these
libraries already installed. I agree that using any of the PyPI stuff is a pain
and is a non-starter for me, as I want to make it as easy as possible to get
running.
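
The stdlib-only approach described here can be sketched as follows.
The JSON fragment and table layout below are made up for illustration;
fsperf's real schema and fio's full output are richer than this:

```python
import json
import sqlite3

# Hypothetical fragment of fio's JSON output; fsperf's actual
# database schema is not reproduced here.
fio_json = '''
{"jobs": [{"jobname": "randwrite",
           "write": {"io_bytes": 1048576, "bw": 20480, "iops": 5120.0}}]}
'''

# Parse the fio results and store the interesting keys in sqlite,
# using only the json and sqlite3 modules from the standard library.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (job TEXT, io_bytes INTEGER,"
           " bw INTEGER, iops REAL)")

for job in json.loads(fio_json)["jobs"]:
    w = job["write"]
    db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
               (job["jobname"], w["io_bytes"], w["bw"], w["iops"]))
db.commit()

row = db.execute("SELECT job, iops FROM results").fetchone()
print(row)
```

A comparison step would then just query this table for the previous
run and diff the chosen keys.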

Where do you fall on the including it in xfstests question?  I expect that the
perf stuff would be run more as maintainers put their pull requests together to
make sure things haven't regressed.  To that end I was going to wire up
xfstests-bld to run this as well.  Whatever you and Dave prefer is what I'll do,
I'll use it wherever it ends up so you two are the ones that get to decide.
Thanks,

Josef


Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Theodore Ts'o
On Mon, Oct 09, 2017 at 02:54:34PM +0800, Eryu Guan wrote:
> I have no problem either if python is really needed, after all this is a
> very useful infrastructure improvement. But the python version problem
> brought up by Ted made me a bit nervous; we need to work around that
> carefully.
> 
> OTOH, I'm just curious, what is the specific reason that python is a
> hard requirement? If we can use perl, that'll be much easier for
> fstests.

Note that perl has its own versioning issues (but it's been easier
given that Perl has been frozen so long waiting for Godot^H^H^H^H^H
Perl6).  It's for similar reasons that Python 2.7 is nice and stable.
Python 2 is frozen while Python 3 caught up.  The only downside is
that Python 2.7 is now deprecated, and will stop getting security
updates from Python Core in 2020.

I'll note that Python 3.6 has some nice features that aren't in Python
3.5 --- but Debian Stable only has Python 3.5, and Debian Unstable has
Python 3.6.  Which is why I said, "it looks like you're using some
variant of Python 3.x", but it wasn't obvious what version exactly was
required by fsperf.  This version instability in Python and Perl is
why Larry McVoy ended up using Tcl for Bitkeeper, by the way.  That
was the only thing that was guaranteed to work everywhere, exactly the
same, without random changes added by Perl/Python innovations...

It's a little easier for me with gce/kvm-xfstests, since I'm using a
Debian Stable images/chroots, and I don't even going to _pretend_ that
I care about cross-distro portability for the Test Appliance VM's that
xfstests-bld creates.  But I suspect this will be more of an issue
with xfstests.

Also, the same issues around versioning / "DLL hell" and provenance of
various high-level package/modules exists with Perl just as much with
Python.  Just substitute CPAN with PyPI.  And again, some of the
popular Perl packages have been packaged by the distro's to solve the
versioning / provenance problem, but exactly _which_ packages /
modules are packaged varies from distro to distro.  (Hopefully the
most popular ones will be packaged by both Red Hat, SuSE and Debian
derivitives, but you'll have to check for each package / module you
want to use.)

One way of solving the problem is just including those Perl / Python
package modules in the sources of xfstests itself; that way you're not
depending on which version of a particular module / package is
available on a distro, and you're also not randomly downloading
software over the network and hoping it works / hasn't been taken over
by some hostile state power.  (I'd be much happier if PyPI or CPAN
used SHA checksums of what you expect to be downloaded; even if you
use Easy_Install's requirements.txt, you're still trusting PyPI to
give you what it thinks is version 1.0.0 of Junitparser.)
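
(For what it's worth, pip's requirements files can carry a content hash,
and running pip with --require-hashes makes it reject anything that does
not match. The hash below is a placeholder, not Junitparser's real
digest; this only shows the format:)

```
# requirements.txt -- placeholder hash, shown only for the syntax
junitparser==1.0.0 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000
```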

This is why I've dropped my own copy of junitparser into the git repo
of xfstests-bld.  It's the "ship your own version of the DLL" solution
to the "DLL hell" problem, but it was the best I could come up with,
especially since Debian hadn't packaged the Junitparser python module.
I also figured it was much better at lowering the blood pressure of
the friendly local secops types.  :-)

  - Ted


Re: [PATCH v8 2/2] btrfs: check device for critical errors and mark failed

2017-10-09 Thread Austin S. Hemmelgarn

On 2017-10-06 19:33, Liu Bo wrote:

On Thu, Oct 05, 2017 at 07:07:44AM -0400, Austin S. Hemmelgarn wrote:

On 2017-10-04 16:11, Liu Bo wrote:

On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote:

From: Anand Jain 

Write and flush errors are critical errors, upon which the device fd
must be closed and marked as failed.



Can we defer the job of closing device to umount?

We can go mark the device failed and skip it while doing read/write,
and umount can do the cleanup work.

That way we don't need a dedicated thread looping around to detect a
rare situation.

If BTRFS doesn't close the device, then it's 100% guaranteed if it
reconnects that it will show up under a different device node.  It would
also mean that the device node stays visible when there is in fact no device
connected to it, which is a pain from a monitoring perspective.


I see, you're assuming that these errors are due to disconnection of
disks, but it could also be bad sectors (although almost impossible on
enterprise hard disks) or some other errors across the stack.
And we need to handle any of these cases.  If BTRFS is going to claim 
handling of device failures, it needs to handle all types of device 
failure it reasonably can without making assumptions about what type of 
failure occurred.


I do agree that cleanup needs to be done if disk got disconnected, but
not doing cleanup here, a udev rule is needed to handle such an event.
And why exactly does it need to involve userspace beyond notifying that 
something bad happened?  We know the failure happened, why should we 
tell userspace about it and require userspace to tell us to handle it? 
It's not like this is a performance critical path since it won't happen 
very often, and the primary concern is data safety, not performance, so 
why exactly do we need to push something that isn't a policy decision 
(and arguably isn't a decision, if we're not using a device, we 
shouldn't be holding it open, period) out to userspace?


Looking at it a bit differently though, every time I've seen any write 
or flush errors, either the connection to the device was unstable, or 
the storage device was failing, and neither case is something that 
should require userspace to make a decision about how to handle things.


To be entirely honest though, from a system administrator's perspective, 
I would actually expect a device getting disconnected (or otherwise 
disappearing off the bus) to end up in a 'Missing' state, not a 'Failed' 
state.  I see no reason to treat it any differently than if it 
disappeared before the volume was mounted, other than the requirement to 
actually handle it.  That type of handling _will_ require userspace 
involvement (because I seriously doubt there will ever be proper 
notification from the block layer to holders that a device is gone), but 
that's also largely orthogonal to this patch set (there's no reason that 
we need to handle write or flush errors the same way).



Re: "BTRFS error (device vda1): couldn't get super buffer head for bytenr x"

2017-10-09 Thread Chris Murphy
On Mon, Oct 9, 2017 at 1:22 AM, Nick Gilmour  wrote:
>> I don't see a 'btrfs filesystem resize' command in your sequence. Did
>> you actually resize the file system before you resized the underlying
>> (virtual) block device?
>
>
> OK. I guess, this is it. I didn't do any 'btrfs filesystem resize' . The
> guides I was following didn't mention something like that. I was assuming
> that if it works for other FS like this it should also work for BTRFS.

Literally all other file systems require the fs be resized first when
shrinking, before the underlying partition or virtual block device is
reduced. There is nothing unique about Btrfs in this respect.

Some tools will use fsadm in the sequence to make sure the file system
is properly resized. So the question is whether any of the resize
tools you used expects to use fsadm, and if so, whether and why there
wasn't an error before the block device was resized, given that fsadm
may not support Btrfs. I don't know offhand whether Btrfs is supported
by fsadm; the man page does not list it as one of the supported file
systems.



> # btrfs filesystem resize -350g /home ?
>
> Should it be before or after I shrink the .img?

When shrinking, the fs resize must happen first. Then the block device
can be reduced to match.

When growing, the block device is extended, then the fs is resized.
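
The shrink ordering above can be expressed as a sketch (the mount
point, image path and size delta are placeholders, and the function is
deliberately only defined, not run, here):

```shell
# Hypothetical helper showing the shrink ordering only; mnt, img and
# delta are placeholders, and the function is not invoked below.
shrink_btrfs_img() {
    mnt="$1"; img="$2"; delta="$3"
    # 1. Shrink the filesystem first, while it is still mounted.
    btrfs filesystem resize "-${delta}" "$mnt" || return 1
    # 2. Only then unmount and shrink the backing image to match.
    umount "$mnt" || return 1
    truncate -s "-${delta}" "$img"
}
echo "order: filesystem first, then the image"
```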




-- 
Chris Murphy


Re: [ANNOUNCE] fsperf: a simple fs/block performance testing framework

2017-10-09 Thread Eryu Guan
On Mon, Oct 09, 2017 at 04:17:31PM +1100, Dave Chinner wrote:
> On Sun, Oct 08, 2017 at 10:25:10PM -0400, Josef Bacik wrote:
> > On Mon, Oct 09, 2017 at 11:51:37AM +1100, Dave Chinner wrote:
> > > On Fri, Oct 06, 2017 at 05:09:57PM -0400, Josef Bacik wrote:
> > > > Hello,
> > > > 
> > > > One thing that comes up a lot every LSF is the fact that we have no 
> > > > general way
> > > > that we do performance testing.  Every fs developer has a set of 
> > > > scripts or
> > > > things that they run with varying degrees of consistency, but nothing 
> > > > central
> > > > that we all use.  I for one am getting tired of finding regressions 
> > > > when we are
> > > > deploying new kernels internally, so I wired this thing up to try and 
> > > > address
> > > > this need.
> > > > 
> > > > We all hate convoluted setups, the more brain power we have to put in 
> > > > to setting
> > > > something up the less likely we are to use it, so I took the xfstests 
> > > > approach
> > > > of making it relatively simple to get running and relatively easy to 
> > > > add new
> > > > tests.  For right now the only thing this framework does is run fio 
> > > > scripts.  I
> > > > chose fio because it already gathers loads of performance data about 
> > > > it's runs.
> > > > We have everything we need there, latency, bandwidth, cpu time, and all 
> > > > broken
> > > > down by reads, writes, and trims.  I figure most of us are familiar 
> > > > enough with
> > > > fio and how it works to make it relatively easy to add new tests to the
> > > > framework.
> > > > 
> > > > I've posted my code up on github, you can get it here
> > > > 
> > > > https://github.com/josefbacik/fsperf
> > > > 
> > > > All (well most) of the results from fio are stored in a local sqlite 
> > > > database.
> > > > Right now the comparison stuff is very crude, it simply checks against 
> > > > the
> > > > previous run and it only checks a few of the keys by default.  You can 
> > > > check
> > > > latency if you want, but while writing this stuff up it seemed that 
> > > > latency was
> > > > too variable from run to run to be useful in a "did my thing regress or 
> > > > improve"
> > > > sort of way.
> > > > 
> > > > The configuration is brain dead simple, the README has examples.  All 
> > > > you need
> > > > to do is make your local.cfg, run ./setup and then run ./fsperf and you 
> > > > are good
> > > > to go.
> > > 
> > > Why re-invent the test infrastructure? Why not just make it a
> > > tests/perf subdir in fstests?
> > > 
> > 
> > Probably should have led with that shouldn't I have?  There's nothing 
> > keeping me
> > from doing it, but I didn't want to try and shoehorn in a python thing into
> > fstests.  I need python to do the sqlite and the json parsing to dump into 
> > the
> > sqlite database.
> > 
> > Now if you (and others) are not opposed to this being dropped into 
> > tests/perf
> > then I'll work that up.  But it's definitely going to need to be done in 
> > python.
> > I know you yourself have said you aren't opposed to using python in the 
> > past, so
> > if that's still the case then I can definitely wire it all up.
> 
> I have no problems with people using python for stuff like this but,
> OTOH, I'm not the fstests maintainer anymore :P

I have no problem either if python is really needed; after all this is a
very useful infrastructure improvement. But the python version problem
brought up by Ted made me a bit nervous; we need to work around that
carefully.

OTOH, I'm just curious, what is the specific reason that python is a
hard requirement? If we can use perl, that'll be much easier for
fstests.

BTW, opinions from key fs developers/fstests users, like you, are also
very important and welcomed :)

Thanks,
Eryu

> 
> > > > The plan is to add lots of workloads as we discover regressions and 
> > > > such.  We
> > > > don't want anything that takes too long to run otherwise people won't 
> > > > run this,
> > > > so the existing tests don't take much longer than a few minutes each.  
> > > > I will be
> > > > adding some more comparison options so you can compare against averages 
> > > > of all
> > > > previous runs and such.
> > > 
> > > Yup, that fits exactly into what fstests is for... :P
> > > 
> > > Integrating into fstests means it will be immediately available to
> > > all fs developers, it'll run on everything that everyone already has
> > > setup for filesystem testing, and it will have familiar mkfs/mount
> > > option setup behaviour so there's no new hoops for everyone to jump
> > > through to run it...
> > > 
> > 
> > TBF I specifically made it as easy as possible because I know we all hate 
> > trying
> > to learn new shit.
> 
> Yeah, it's also hard to get people to change their workflows to add
> a whole new test harness into them. It's easy if it's just a new
> command to an existing workflow :P
> 
> > I figured this was different enough to warrant a separate
> > project, especially since I'm going to add block