[PATCH RFC v2 1/5] btrfs: Fix the bug that fs_info-pending_changes is never cleared.
Fs_info-pending_changes is never cleared since the original code uses cmpxchg(fs_info-pending_changes, 0, 0), which will only clear it if pending_changes is already 0. This will cause a lot of problem when mount it with inode_cache mount option. If the btrfs is mounted as inode_cache, pending_changes will always be 1, even when the fs is frozen. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/transaction.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index a605d4e..e88b59d 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -2118,7 +2118,7 @@ void btrfs_apply_pending_changes(struct btrfs_fs_info *fs_info) unsigned long prev; unsigned long bit; - prev = cmpxchg(fs_info-pending_changes, 0, 0); + prev = xchg(fs_info-pending_changes, 0); if (!prev) return; -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 0/5] Fix bugs introduced by the new pending changes
Commit 572d9ab7845ea0e0 (btrfs: add support for processing pending changes) introduced several bugs which will eventually cause a deadlock. The deadlock can be triggered by fstests/generic/068 with inode_cache mount option. The deadlock happens in the following flow: moutn with inode_cache - freeze fs - sync - unfreeze. Sync_fs stack: Thaw_fs stack: (Holding s_umount) (Waiting s_umount) |- btrfs_sync_fs() |- start_transaction() (Waiting thaw_fs to unfreeze the fs) The problem has several causes: 1) Cmpxchg in btrfs_apply_pending_changes() doesn't work Cmpxchg in btrfs_apply_pending_changes() will only clear fs_info-pending_changes if it is already 0. So fs_info-pending_changes will never cleared, and every sync_fs() on frozen btrfs will try to start a new transaction and deadlock. So patch 1 fixes it by using xchg() other than cmpxchg(). 2) btrfs_freeze() doesn't handle and clear pending_changes If btrfs start a transaction if there are pending changes but no running transaction, sync should never start a transaction on frozen fs. (Except the following sysfs case) So patch 2~3 fixes it by adding pending changes handler in btrfs_freeze(). 3) Changes through sysfs interface can create pending changes without waiting for unfreezing. Since sysfs changes doesn't go through normal open routine, which will initiate sb_start_*write() and waiting for unfreezing, changes through sysfs like changing label will set fs_info-pending_changes to non-zero, and later sync_fs will deadlock again. So patch 4~5 reverts the commits relating patches to fix such problem. Qu Wenruo (5): btrfs: Fix the bug that fs_info-pending_changes is never cleared. btrfs: Add btrfs_start_transaction_freeze() to start transaction in btrfs_freeze() btrfs: Handle pending changes in btrfs_freeze(). Revert btrfs: move commit out of sysfs when changing label Revert btrfs: move commit out of sysfs when changing features fs/btrfs/super.c | 52 -- fs/btrfs/sysfs.c | 34 - fs/btrfs/transaction.c | 10 +- fs/btrfs/transaction.h | 2 ++ 4 files changed, 57 insertions(+), 41 deletions(-) -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 2/5] btrfs: Add btrfs_start_transaction_freeze() to start transaction in btrfs_freeze()
Add btrfs_start_transaction_freeze() function. This is needed for btrfs_freeze() to handle the pending changes. If there is no running transaction when btrfs_freeze() is called, btrfs_freeze() needs to start a transaction to handle the pending changes, and it can't initial a sb_start_intwrite() call. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/transaction.c | 8 fs/btrfs/transaction.h | 2 ++ 2 files changed, 10 insertions(+) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index e88b59d..13a2b67 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -570,6 +570,14 @@ struct btrfs_trans_handle *btrfs_start_transaction_lflush( BTRFS_RESERVE_FLUSH_LIMIT); } +/* Only used in btrfs_freeze() */ +struct btrfs_trans_handle *btrfs_start_transaction_freeze( + struct btrfs_root *root, int num_items) +{ + return start_transaction(root, num_items, __TRANS_START, +BTRFS_RESERVE_FLUSH_ALL); +} + struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root) { return start_transaction(root, 0, TRANS_JOIN, 0); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 00ed29c..49bd7eb 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -140,6 +140,8 @@ struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root, int num_items); struct btrfs_trans_handle *btrfs_start_transaction_lflush( struct btrfs_root *root, int num_items); +struct btrfs_trans_handle *btrfs_start_transaction_freeze( + struct btrfs_root *root, int num_items); struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root); struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root *root); struct btrfs_trans_handle *btrfs_attach_transaction(struct btrfs_root *root); -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC v2 4/5] Revert btrfs: move commit out of sysfs when changing label
This reverts commit a6f69dc8018d (btrfs: move commit out of sysfs when changing label). Unlike most btrfs file operations, sysfs don't go through the normal open/read/write routine which will initiate a sb_start_*write/read. This is very dangerous on frozen btrfs, if using the pending changes method, sysfs change can make fs_info-pending_changes to 1, and later sync_fs() call will cause deadlock. So revert the patch to deal the deadlock caused by the following workload: freeze() - change lable - sync() - unfreeze() Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- fs/btrfs/sysfs.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 92db3f6..226f726 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -369,6 +369,9 @@ static ssize_t btrfs_label_store(struct kobject *kobj, const char *buf, size_t len) { struct btrfs_fs_info *fs_info = to_fs_info(kobj); + struct btrfs_trans_handle *trans; + struct btrfs_root *root = fs_info-fs_root; + int ret; size_t p_len; if (fs_info-sb-s_flags MS_RDONLY) @@ -383,18 +386,20 @@ static ssize_t btrfs_label_store(struct kobject *kobj, if (p_len = BTRFS_LABEL_SIZE) return -EINVAL; - spin_lock(fs_info-super_lock); + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) + return PTR_ERR(trans); + + spin_lock(root-fs_info-super_lock); memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE); memcpy(fs_info-super_copy-label, buf, p_len); - spin_unlock(fs_info-super_lock); + spin_unlock(root-fs_info-super_lock); + ret = btrfs_commit_transaction(trans, root); - /* -* We don't want to do full transaction commit from inside sysfs -*/ - btrfs_set_pending(fs_info, COMMIT); - wake_up_process(fs_info-transaction_kthread); + if (!ret) + return len; - return len; + return ret; } BTRFS_ATTR_RW(label, btrfs_label_show, btrfs_label_store); -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recovery options for FS forced readonly due to 3.17 snapshot bug
On Tue, Jan 20, 2015 at 11:15:22PM +1100, brett.k...@commandict.com.au wrote: Hi, My FS has been forced readonly by the early 3.17 snapshot bug. After much reading, I'm looking for validation of some recovery scenarios: 1) btrfsck --repair under a later kernel. Kernel won't make a difference here, only the version of the btrfs progs you're using, which should be as recent as possible, but at least 3.17 (best to try one of the 3.18.x releases). It's entirely possible that you've got a different issue as well which is preventing the btrfs check from repairing the FS. 2) replacing the devices one by one under a later kernel, effectively removing the corruption. This won't work -- the FS would still try to migrate the broken metadata. 3) Just copying data across from the readonly source to a new FS created under a later kernel. That would work. Option 1 with check only (not repair) segfaults and coredumps on 3.18 (see below) .. are there some other options needed here or perhaps patches pending to rectify this ? Option 2 would be better than nothing, as I get to keep my snapshots and don't need any more surplus drives than 2x units of the largest size (data is RAID1). Option 3 is a fallback and basically forfeits the source FS including snapshots, is guaranteed to work (I can read the data fine), however requires 1:1 capacity in new disks to achieve. So being unable to complete a btrfsck check and generally scared off by a btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a readonly FS ? And are there any other options ? Note the FS is very nearly full of data - I did also get continuous disk activity due to this high utilisation triggering the endless reclaim unused extent routine, for a few days prior to it being forced readonly. [root@array ~]# uname -r 3.18.2-200.fc21.x86_64 (upgraded since getting the issue) [root@array ~]# btrfsck /dev/sdm snip parent transid verify failed on 40198674350080 wanted 1623837 found 1622986 Ignoring transid failure Segmentation fault (core dumped) Even at its worst, this should not be segfaulting. This should be considered a bug in btrfs check at the very least. Hugo. [root@array ~]# btrfs fi sh /export/archive/ Label: 'archive' uuid: 22c7663a-93ca-40a6-9491-26abaa62b924 Total devices 8 FS bytes used 18.07TiB devid1 size 5.46TiB used 5.46TiB path /dev/sdm devid2 size 3.64TiB used 3.64TiB path /dev/sdc devid3 size 3.64TiB used 3.64TiB path /dev/sdd devid4 size 5.46TiB used 5.46TiB path /dev/sdl devid5 size 4.55TiB used 4.55TiB path /dev/sdb devid8 size 4.55TiB used 4.55TiB path /dev/sdj devid9 size 4.55TiB used 4.55TiB path /dev/sdk devid 10 size 4.55TiB used 4.55TiB path /dev/sde Btrfs v3.18.1 [root@array ~]# df -h /export/archive/ Filesystem Size Used Avail Use% Mounted on /dev/sdm 19T 19T 123G 100% /export/archive [root@array ~]# btrfs fi df /export/archive/ Data, RAID1: total=18.17TiB, used=18.05TiB System, RAID1: total=32.00MiB, used=3.00MiB Metadata, RAID1: total=23.00GiB, used=22.49GiB GlobalReserve, single: total=512.00MiB, used=0.00B [root@array ~]# dmesg |grep -i btrfs |less [2.044134] Btrfs loaded [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4 [2.472321] BTRFS info (device sda4): disk space caching is enabled [2.513784] BTRFS: detected SSD devices, enabling SSD mode [2.985877] BTRFS info (device sda4): disk space caching is enabled [3.520737] BTRFS: device label archive devid 5 transid 1623944 /dev/sdf [3.544394] BTRFS: device label archive devid 3 transid 1623944 /dev/sdh [3.544553] BTRFS: device label archive devid 10 transid 1623944 /dev/sdi [3.544957] BTRFS: device label archive devid 2 transid 1623944 /dev/sdg [3.614636] BTRFS: device label archive devid 9 transid 1623944 /dev/sdc [3.663006] BTRFS: device label archive devid 8 transid 1623944 /dev/sdb [3.663092] BTRFS: device label archive devid 4 transid 1623944 /dev/sdd [3.685397] BTRFS: device label archive devid 1 transid 1623944 /dev/sde [3.701732] BTRFS info (device sde): disk space caching is enabled [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, corrupt 0, gen 0 [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 0, gen 0 [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, corrupt 0, gen 0 [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 0, gen 0 [ 1432.231219] BTRFS: space cache generation (1623006) does not match inode (1623943) [ 1432.231283] BTRFS warning (device sde): failed to load free space cache for block group 57021632675840, rebuild it now [ 1434.603652] BTRFS (device sde): parent transid verify failed on 57022998740992 wanted 1623896
Re: Recovery options for FS forced readonly due to 3.17 snapshot bug
On Tue, Jan 20, 2015 at 12:15 PM, brett.k...@commandict.com.au wrote: Hi, My FS has been forced readonly by the early 3.17 snapshot bug. After much reading, I'm looking for validation of some recovery scenarios: 1) btrfsck --repair under a later kernel. 2) replacing the devices one by one under a later kernel, effectively removing the corruption. 3) Just copying data across from the readonly source to a new FS created under a later kernel. Option 1 with check only (not repair) segfaults and coredumps on 3.18 (see below) .. are there some other options needed here or perhaps patches pending to rectify this ? Option 2 would be better than nothing, as I get to keep my snapshots and don't need any more surplus drives than 2x units of the largest size (data is RAID1). Option 3 is a fallback and basically forfeits the source FS including snapshots, is guaranteed to work (I can read the data fine), however requires 1:1 capacity in new disks to achieve. So being unable to complete a btrfsck check and generally scared off by a btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a readonly FS ? And are there any other options ? Note the FS is very nearly full of data - I did also get continuous disk activity due to this high utilisation triggering the endless reclaim unused extent routine, for a few days prior to it being forced readonly. [root@array ~]# uname -r 3.18.2-200.fc21.x86_64 (upgraded since getting the issue) [root@array ~]# btrfsck /dev/sdm snip parent transid verify failed on 40198674350080 wanted 1623837 found 1622986 Ignoring transid failure Segmentation fault (core dumped) [root@array ~]# btrfs fi sh /export/archive/ Label: 'archive' uuid: 22c7663a-93ca-40a6-9491-26abaa62b924 Total devices 8 FS bytes used 18.07TiB devid1 size 5.46TiB used 5.46TiB path /dev/sdm devid2 size 3.64TiB used 3.64TiB path /dev/sdc devid3 size 3.64TiB used 3.64TiB path /dev/sdd devid4 size 5.46TiB used 5.46TiB path /dev/sdl devid5 size 4.55TiB used 4.55TiB path /dev/sdb devid8 size 4.55TiB used 4.55TiB path /dev/sdj devid9 size 4.55TiB used 4.55TiB path /dev/sdk devid 10 size 4.55TiB used 4.55TiB path /dev/sde Btrfs v3.18.1 [root@array ~]# df -h /export/archive/ Filesystem Size Used Avail Use% Mounted on /dev/sdm 19T 19T 123G 100% /export/archive [root@array ~]# btrfs fi df /export/archive/ Data, RAID1: total=18.17TiB, used=18.05TiB System, RAID1: total=32.00MiB, used=3.00MiB Metadata, RAID1: total=23.00GiB, used=22.49GiB GlobalReserve, single: total=512.00MiB, used=0.00B [root@array ~]# dmesg |grep -i btrfs |less [2.044134] Btrfs loaded [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4 [2.472321] BTRFS info (device sda4): disk space caching is enabled [2.513784] BTRFS: detected SSD devices, enabling SSD mode [2.985877] BTRFS info (device sda4): disk space caching is enabled [3.520737] BTRFS: device label archive devid 5 transid 1623944 /dev/sdf [3.544394] BTRFS: device label archive devid 3 transid 1623944 /dev/sdh [3.544553] BTRFS: device label archive devid 10 transid 1623944 /dev/sdi [3.544957] BTRFS: device label archive devid 2 transid 1623944 /dev/sdg [3.614636] BTRFS: device label archive devid 9 transid 1623944 /dev/sdc [3.663006] BTRFS: device label archive devid 8 transid 1623944 /dev/sdb [3.663092] BTRFS: device label archive devid 4 transid 1623944 /dev/sdd [3.685397] BTRFS: device label archive devid 1 transid 1623944 /dev/sde [3.701732] BTRFS info (device sde): disk space caching is enabled [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, corrupt 0, gen 0 [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 0, gen 0 [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, corrupt 0, gen 0 [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 0, gen 0 You have device write and read errors here - the snapshots issue with 3.17 wouldn't bump these device error stats. Even if you were affected by the snapshots issue, you clearly have here another problem, very likely hardware related due to those stats. If you are affected by the snapshots issue, btrfs check/btrfsck as of btrfs-progs v3.17+ should be able to fix it and if you are affected by it, it reports at a very early stage (without --repair) one or more messages with the following format: root %llu has a root item with a more recent gen (%llu) compared to the found root node (%llu)\n If the snapshots issue from 3.17.0/1 was the only issue you had, those messages would be reported by fsck before it segfaults (at least for all cases I've seen). Also when one is affected by the snapshots issue, it should only cause problems when reading from the affected readonly
[PATCH] Btrfs: fix setup_leaf_for_split() to avoid leaf corruption
We were incorrectly detecting when the target key didn't exist anymore after releasing the path and re-searching the tree. This could make us split or duplicate (btrfs_split_item() and btrfs_duplicate_item() are its only callers at the moment) an item when we should not. For the case of duplicating an item, we currently only duplicate checksum items (csum tree) and file extent items (fs/subvol trees). For the checksum items we end up overriding the item completely, but for file extent items we update only some of their fields in the copy (done in __btrfs_drop_extents), which means we can end up having a logical corruption for some values. Also for the case where we duplicate a file extent item it will make us produce a leaf with a wrong key order, as btrfs_duplicate_item() advances us to the next slot and then its caller sets a smaller key on the new item at that slot (like in __btrfs_drop_extents() e.g.). Alternatively if the tree search in setup_leaf_for_split() leaves with path-slots[0] == btrfs_header_nritems(path-nodes[0]), we end up accessing beyond the leaf's end (when we check if the item's size has changed) and make our caller insert an item at the invalid slot btrfs_header_nritems(path-nodes[0]) + 1, causing an invalid memory access if the leaf is full or nearly full. This issue has been present since the introduction of this function in 2009: Btrfs: Add btrfs_duplicate_item commit ad48fd754676bfae4139be1a897b1ea58f9aaf21 Signed-off-by: Filipe Manana fdman...@suse.com --- fs/btrfs/ctree.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 14a72ed..78e8993c1 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -4356,13 +4356,15 @@ static noinline int setup_leaf_for_split(struct btrfs_trans_handle *trans, path-search_for_split = 1; ret = btrfs_search_slot(trans, root, key, path, 0, 1); path-search_for_split = 0; + if (ret 0) + ret = -EAGAIN; if (ret 0) goto err; ret = -EAGAIN; leaf = path-nodes[0]; - /* if our item isn't there or got smaller, return now */ - if (ret 0 || item_size != btrfs_item_size_nr(leaf, path-slots[0])) + /* if our item isn't there, return now */ + if (item_size != btrfs_item_size_nr(leaf, path-slots[0])) goto err; /* the leaf has changed, it now has room. return now */ -- 2.1.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; The more I look into that the more I think that the first fix is the right one. Has been pointed out in this thread, it is ok to skip processing the pending changes if the filesystem is frozen. The pending changes have to flushed from sync (by design), we cannot use mnt_want_write or the sb_start* protections that. The btrfs_freeze callback can safely do the last commit, that's under s_umount held by vfs::freeze_super. Then any other new transaction would block. Any other call to btrfs_sync_fs will not find any active transaction and with this patch will not start one. Sounds safe to me. I think the right level to check is SB_FREEZE_WRITE though, to stop any potential writes as soon as possible and when the s_umount lock is still held in vfs::freeze_super. I'll collect the relevant patches and will send it for review. trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Snapshot cannot be deleted
+linux-btrfs,-linux-ext4 On Tue, Jan 20, 2015 at 01:28:04PM +0100, Andreas Philipp wrote: Due to the known (and fixed) bug in kernel 3.17.0 one of my btrfs volume suffers from unreadable and - even worse - uneraseable snapshots. Whenever such a snapshot is accessed there is parent transid verify failed in dmesg. Is there any way to delete these snapshots? You sent this to the wrong mailing list, so I'm reforwarding this to the linux-btrfs list. Cheers, - Ted -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs convert running out of space
On Tue, Jan 20, 2015 at 4:04 PM, Gareth Pye gar...@cerberos.id.au wrote: Yeah, we don't have that much space spare :( File system has been going strong from when it was created with early RAID5 code, then converted to RAID10 with kernel 3.12. There aren't any nocow files to my knowledge but there are plenty of files larger than a gig on the file system. The first few results from logical-resolve have been for files in the 1G~2G range, so that could be some sticky spaghetti. Are any of those big files in a snapshot? The snapshotting may be pinning a bunch of large extents, so even if it seems like the volume has enough space, it might actually be running out of space. All I can think of is progressively removing the files that are implicated in the conversion failure. That could mean just deleting older snapshots that you probably don't need, progressively getting to the point where you migrate those files off this fs to another one, and then delete them (all instances in all subvol/snapshots) and just keep trying. Is a btrfs check happy? Or does it complain about anything? I've had quite good luck just adding a drive (two drives for raid1/10 volumes) to an existing btrfs volume, they don't have to be drdb, they can be local block devices, either physical drives or LV's. I've even done this with flash drives (kinda scary and slow but it worked). I'd still suggest contingency planning in case this volume becomes temperamental and you have no choice but to migrate it elsewhere. Better to do it on your timetable than the filesystem's. -- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to available space check.
On Mon, Dec 29, 2014 at 03:17:18PM +0800, Qu Wenruo wrote: Before the patch, we passed wrong value to _require_fs_space, which should be in unit of 1024, but passed in unit of GB. Yes, that needs fixing. Fix it and add better prompt for falloc failure. That doesn't Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- tests/btrfs/079 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/btrfs/079 b/tests/btrfs/079 index 202d3e6..99d0187 100755 --- a/tests/btrfs/079 +++ b/tests/btrfs/079 @@ -73,8 +73,9 @@ rm -f $seqres.full _scratch_mkfs $seqres.full 21 _scratch_mount -_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024)) -$XFS_IO_PROG -f -c falloc 0 $filesize $testfile +_require_fs_space $SCRATCH_MNT $(($filesize / 1024)) +$XFS_IO_PROG -f -c falloc 0 $filesize $testfile || \ + _fail falloc failed If the falloc fails, then the golden output match will fail. Let the test run, regardless, because the first thing it does is try to overwrite the fallocated region where the success or failure of the writes are completely ignored. Hence a falloc failure should also be ignored... Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. Thanks, Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote: On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstest: btrfs/006: Add extra check on return value and 'fi show' by device
Original Message Subject: Re: [PATCH] fstest: btrfs/006: Add extra check on return value and 'fi show' by device From: Dave Chinner da...@fromorbit.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 12:03 On Fri, Jan 16, 2015 at 09:57:10AM +0800, Qu Wenruo wrote: Reported in Red Hat BZ#1181627, 'btrfs fi show' on unmounted device will return 1 even no error happens. Please describe the bug in the commit message, don't make me have to go look at some web page to work out what you are trying to fix. OK, I'll make the description more detailed. Introduced by: commit 2513077f btrfs-progs: fix device missing of btrfs fi show with seed devices Patch fixing it: https://patchwork.kernel.org/patch/5626001/ btrfs-progs: Fix wrong return value when executing 'fi show' on umounted device. What btrfs progs commit introduced and fixed is not actually relevant to fstests. This just saves some time for some users who failed the test and what to check if the bug is fixed or not. If I have to go read other stuff to understand the change you are making, then the description needs more detail and less external references I understand that I need to make the description more detailed. Although external reference does not help much for fstests, but it will really saves time for some tests. Like Fillipe Manana's test, which embedded such info into the test script, saved me a lot of time to find the regression commit. Reported-by: Vratislav Podzimek vpodz...@redhat.com Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- tests/btrfs/006 | 51 --- tests/btrfs/006.out | 10 ++ 2 files changed, 54 insertions(+), 7 deletions(-) diff --git a/tests/btrfs/006 b/tests/btrfs/006 index 715fd80..2d8c1c0 100755 --- a/tests/btrfs/006 +++ b/tests/btrfs/006 @@ -62,33 +62,70 @@ _scratch_pool_mkfs $seqres.full 21 || _fail mkfs failed # These have to be done unmounted...? echo == Set filesystem label to $LABEL -$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV $LABEL +$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV $LABEL || \ + _fail set lable failed Apart from the typo, why not _run_btrfs_util_prog? Oh thanks a lot for the existing infrastructure. With this one, the PIPESTATUS[] is no longer needed. However, this will make the test fail on older btrfs progs installations that we previously passing the test just fine, right? No, old tests will still pass, since it returns 0 as expected. Add the return values test just in case. IOWs, if you want to test that `btrfs fi show` always returns the correct value, don't add that checking to an existing test: write a new test that tests the validity of the return value OK, creating a new one seems reasonable enough for me. echo == Get filesystem label -$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV +$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV || \ + _fail get lable failed + +echo == Show filesystem by device(offline) +$BTRFS_UTIL_PROG filesystem show $FIRST_POOL_DEV | \ + _filter_btrfs_filesystem_show $TOTAL_DEVS $UUID +[ ${PIPESTATUS[0]} -ne 0 ] \ + _fail show filesystem by device(offline) return value wrong That's a new function being tested. Again, this belongs in a new test, not an existing regression test FWIW, Now that I've seen it used a couple of times, I don't really like the mess that checking errors via PIPESTATUS results in, especially as _run_btrfs_util_prog filesystem show $FIRST_POOL_DEV | \ _filter_btrfs_filesystem_show $TOTAL_DEVS $UUID Gives *exactly* the same failure detection Thanks for the _run_btrfs_util_prog macro hint. That's great and makes codes look much nicer. BTW, do I need to cleanup all the $BTRFS_UTIL_PROG in the original test? Thanks, Qu Cheers, Dave. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] xfstests: btrfs: fix up 001.out
On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote: The subvol delete output has changed with btrfs-progs -Delete subvolume 'SCRATCH_MNT/snap' +Delete subvolume (no-commit): 'SCRATCH_MNT/snap' so fix 001 failing. Signed-off-by: Anand Jain anand.j...@oracle.com v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and commit update --- tests/btrfs/001 | 2 +- tests/btrfs/001.out | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/btrfs/001 b/tests/btrfs/001 index 8258d06..a7747c8 100755 --- a/tests/btrfs/001 +++ b/tests/btrfs/001 @@ -99,7 +99,7 @@ echo Listing subvolumes $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }' # Delete the snapshot -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap This is also the wrong way to fix the problem. We have output filters for a reason, people: _filter_btrfs_subvol_delete() { _filter_scratch | _filter_transcation_commit_default } Simply becomes: _filter_btrfs_subvol_delete() { _filter_scratch | _filter_transcation_commit_default | \ sed -e 's/^Delete subvolume.*:/Delete subvolume/' } The golden output does not change - the filter simply removes the part of the message that changed between versions. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to available space check.
Original Message Subject: Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to available space check. From: Dave Chinner da...@fromorbit.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 12:19 On Mon, Dec 29, 2014 at 03:17:18PM +0800, Qu Wenruo wrote: Before the patch, we passed wrong value to _require_fs_space, which should be in unit of 1024, but passed in unit of GB. Yes, that needs fixing. Fix it and add better prompt for falloc failure. That doesn't Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- tests/btrfs/079 | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/btrfs/079 b/tests/btrfs/079 index 202d3e6..99d0187 100755 --- a/tests/btrfs/079 +++ b/tests/btrfs/079 @@ -73,8 +73,9 @@ rm -f $seqres.full _scratch_mkfs $seqres.full 21 _scratch_mount -_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024)) -$XFS_IO_PROG -f -c falloc 0 $filesize $testfile +_require_fs_space $SCRATCH_MNT $(($filesize / 1024)) +$XFS_IO_PROG -f -c falloc 0 $filesize $testfile || \ + _fail falloc failed If the falloc fails, then the golden output match will fail. Let the test run, regardless, because the first thing it does is try to overwrite the fallocated region where the success or failure of the writes are completely ignored. Hence a falloc failure should also be ignored... Cheers, Dave. OK, I'll send a V2 one for this. Thanks for reviewing. Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstest: btrfs/006: Add extra check on return value and 'fi show' by device
On Wed, Jan 21, 2015 at 12:19:32PM +0800, Qu Wenruo wrote: From: Dave Chinner da...@fromorbit.com On Fri, Jan 16, 2015 at 09:57:10AM +0800, Qu Wenruo wrote: Reported in Red Hat BZ#1181627, 'btrfs fi show' on unmounted device will return 1 even no error happens. Please describe the bug in the commit message, don't make me have to go look at some web page to work out what you are trying to fix. OK, I'll make the description more detailed. Introduced by: commit 2513077f btrfs-progs: fix device missing of btrfs fi show with seed devices Patch fixing it: https://patchwork.kernel.org/patch/5626001/ btrfs-progs: Fix wrong return value when executing 'fi show' on umounted device. What btrfs progs commit introduced and fixed is not actually relevant to fstests. This just saves some time for some users who failed the test and what to check if the bug is fixed or not. Doesn't save me time - it takes me much longer as I have to use 3 different tools to follow those three references to try to understand why you are making the change. If I have to go read other stuff to understand the change you are making, then the description needs more detail and less external references I understand that I need to make the description more detailed. Although external reference does not help much for fstests, but it will really saves time for some tests. Like Fillipe Manana's test, which embedded such info into the test script, saved me a lot of time to find the regression commit. What it helps is people running a new fstests tree on an older distro kernels. It doesn't save time for reviewers, yet reviewer resources are in much scarcer supply than distro QA resources. If you want your changes reviewed faster, then commit messages need to be reviewer friendly. Further, if you review other people's changes, then they are much more likely to review your changes quickly. If the review burden is left to me all the time then, like Andrew Morton, I get grumpy when people say but it helps me. BTW, do I need to cleanup all the $BTRFS_UTIL_PROG in the original test? Not if it causes older btrfs progs installations to fail. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] xfstests: btrfs: fix up 001.out
Dave, On 01/21/2015 12:26 PM, Dave Chinner wrote: On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote: The subvol delete output has changed with btrfs-progs -Delete subvolume 'SCRATCH_MNT/snap' +Delete subvolume (no-commit): 'SCRATCH_MNT/snap' so fix 001 failing. Signed-off-by: Anand Jain anand.j...@oracle.com v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and commit update --- tests/btrfs/001 | 2 +- tests/btrfs/001.out | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/btrfs/001 b/tests/btrfs/001 index 8258d06..a7747c8 100755 --- a/tests/btrfs/001 +++ b/tests/btrfs/001 @@ -99,7 +99,7 @@ echo Listing subvolumes $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }' # Delete the snapshot -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap This is also the wrong way to fix the problem. We have output filters for a reason, people: actually I purposely discouraged using the output filters (instead use command exit code), mainly because output filters are unnecessary hindrance to the good changes. And UIs using btrfs-progs anyway has to depend on the exit code to check the command status, so we should rightfully check that in our test scripts, this may apply lightly for commands like show, but would fit well for commands like delete as in here. just my point of view. Thanks, _filter_btrfs_subvol_delete() { _filter_scratch | _filter_transcation_commit_default } Simply becomes: _filter_btrfs_subvol_delete() { _filter_scratch | _filter_transcation_commit_default | \ sed -e 's/^Delete subvolume.*:/Delete subvolume/' } The golden output does not change - the filter simply removes the part of the message that changed between versions. Cheers, Dave. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; + /* +* Test if the fs is frozen, or start_trasaction +* will deadlock on itself. +*/ + if (__sb_start_write(sb, SB_FREEZE_FS, false)) + __sb_end_write(sb, SB_FREEZE_FS); + else + return 0; The more I look into that the more I think that the first fix is the right one. Has been pointed out in this thread, it is ok to skip processing the pending changes if the filesystem is frozen. That's good, for me, either this patch or the patch 2~5 in the patchset will solve the sync_fs() problem on frozen fs. Just different timing to start the new transaction. But the patchset one has the problem, which needs to deal with the sysfs interface changes, or sync_fs() will still cause deadlock. So I tried to revert the sysfs related patches, but it seems overkilled, needing extra btrfs_start_transaction* things. As you already picked this one, I'm completely OK with this. The pending changes have to flushed from sync (by design), we cannot use mnt_want_write or the sb_start* protections that. The btrfs_freeze callback can safely do the last commit, that's under s_umount held by vfs::freeze_super. Then any other new transaction would block. Any other call to btrfs_sync_fs will not find any active transaction and with this patch will not start one. Sounds safe to me. I think the right level to check is SB_FREEZE_WRITE though, to stop any potential writes as soon as possible and when the s_umount lock is still held in vfs::freeze_super. SB_FREEZE_WRITE seems good for me. But I didn't catch the difference between SB_FREEZE_FS(WRITE/PAGEFAULT/COMPLETE), since freeze() conflicts with sync_fs(), when we comes to btrfs_sync_fs(), the fs is either totally frozen or unfrozen and frozen level won't change during the protection of s_umount. Although SB_FREEZE_WRITE seems better in its meaning and makes it more readable. I'll collect the relevant patches and will send it for review. Thanks for collecting them and sending them out. Thanks, Qu trans = btrfs_start_transaction(root, 0); } else { return PTR_ERR(trans); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com Date: 2015年01月21日 11:26 On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 11:10 On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote: On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Chris Mason c...@fb.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 09:05 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015年01月21日 01:13 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote: --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait) */ if (fs_info-pending_changes == 0) return 0; +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. Oh, I missed this problem. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: +/* + * Test if the fs is frozen, or start_trasaction + * will deadlock on itself. + */ +if (__sb_start_write(sb, SB_FREEZE_FS, false)) +__sb_end_write(sb, SB_FREEZE_FS); +else +return 0; But what if someone freezes the FS after __sb_end_write() and before btrfs_start_transaction()? I don't see what keeps new freezers from coming in. -chris Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so freeze will not happen during sync. You're right. I was worried about the sync ioctl, but the mutex won't be held there to deadlock against. We'll be fine. There is another problem which is introduced by pending change. That is we will start and commmit a transaction by changing pending mount option after we set the fs to be R/O. Oh, I missed this problem. I think it is better that we don't start a new transaction for pending changes which are set after the transaction is committed, just make them be handled by the next transaction, This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Original Message Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. From: Miao Xie miao...@huawei.com To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com Date: 2015年01月21日 15:04 On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: [snipped] This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Your method will only fix the deadlock, but will introduce the risk like pending inode_cache will never be written to disk as I already explained. (If still using the fs_info-pending_changes mechanism) To ensure pending changes written to disk sync_fs() should start a transaction if needed, or there will be chance that no transaction can handle it. But I don't see the necessity to pending current work(inode_cache, feature/label changes) to next transaction. To David: I'm a little curious about why inode_cache needs to be delayed to next transaction. In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem already, so there should be no running transaction and we can just set any mount option into fs_info. Or even in worst case, there is a racing window, we can still start a transaction and do the commit, a little overhead in such minor case won't impact the overall performance. For sysfs change, I prefer attach or start transaction method, and for mount option change, and such sysfs tuning is also minor case for a filesystem. What do you think about reverting the whole patchset and rework the sysfs interface? Thanks, Qu Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] fstests: btrfs/079: Fix wrong value passed to available space check.
Wrong value is passed to _require_fs_space, which should be in unit of kilobyte(1024), but passed in unit of gigabyte(1024^3). Fix it. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- tests/btrfs/079 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/btrfs/079 b/tests/btrfs/079 index 202d3e6..b4a4189 100755 --- a/tests/btrfs/079 +++ b/tests/btrfs/079 @@ -73,7 +73,7 @@ rm -f $seqres.full _scratch_mkfs $seqres.full 21 _scratch_mount -_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024)) +_require_fs_space $SCRATCH_MNT $(($filesize / 1024)) $XFS_IO_PROG -f -c falloc 0 $filesize $testfile dd_work() { -- 2.2.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fstests: btrfs/082: Check return value of btrfs filesystem show command executed on umounted device.
The return value should always be 0 if no problem happens, but btrfs filesystem show executed on umounted device will always return 1 even there is no problem. This testcase just checks it. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- tests/btrfs/082 | 71 + tests/btrfs/082.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 74 insertions(+) create mode 100755 tests/btrfs/082 create mode 100644 tests/btrfs/082.out diff --git a/tests/btrfs/082 b/tests/btrfs/082 new file mode 100755 index 000..d33e836 --- /dev/null +++ b/tests/btrfs/082 @@ -0,0 +1,71 @@ +#! /bin/bash +# FS QA Test No. btrfs/082 +# +# Check return value of btrfs filesystem show command executed on +# umounted device. +# It should return 0 if nothing wrong happens. +# +# Regression in v3.18 btrfs-progs and fixed by the following patch: +# +#btrfs-progs: Fix wrong return value when executing 'fi show' on +#umounted device. +# +#--- +# Copyright (c) 2015 Fujitsu, Inc. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + +_cleanup() +{ +cd / +rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter.btrfs + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_scratch_dev_pool + +rm -f $seqres.full + +FIRST_POOL_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'` +TOTAL_DEVS=`echo $SCRATCH_DEV_POOL | wc -w` + +_scratch_pool_mkfs $seqres.full 21 || _fail mkfs failed + +_run_btrfs_util_prog filesystem show $FIRST_POOL_DEV | \ + _filter_btrfs_filesystem_show $TOTAL_DEVS + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/btrfs/082.out b/tests/btrfs/082.out new file mode 100644 index 000..2977f14 --- /dev/null +++ b/tests/btrfs/082.out @@ -0,0 +1,2 @@ +QA output created by 082 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index e2b..709664a 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -84,3 +84,4 @@ 079 auto rw metadata 080 auto snapshot 081 auto quick clone +082 auto quick -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html