Re: [PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.
Hello, When removing a subvol, if the related qgroup has no child qgroup, we should destroy it at the same time. Also remove the TODO entry in qgroup.c. My two cents here: Take a quick look at this, i am not sure whether this is right way to do it. Actually, i think we can only remove a subvolume qgroup safely when both refer and excl are 0, because for subvolume removal case, to make qgroup accounting right, we need walk tree. At this time, qgroup walking might have not finished, if we remove this qgroup directly, accounting for this qgroup will be skipped. Maybe we could remove such qgroup when at qgroup accounting, we find an qgroup’s refer and excl are 0, we could remove it, or we do it at the background.. Also, there is another risk, that is to say, if qgroup accounting not work right, So we might need gurantee that subvolume it going to be deleted or have been deleted at the same time, then we could remove qgroup safely etc. Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 4 fs/btrfs/qgroup.c | 1 - 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e687bb0..31de211 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -59,6 +59,7 @@ #include backref.h #include hash.h #include props.h +#include qgroup.h struct btrfs_iget_args { struct btrfs_key *location; @@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, ret = btrfs_update_inode_fallback(trans, root, dir); if (ret) btrfs_abort_transaction(trans, root, ret); + ret = btrfs_remove_qgroup(trans, root-fs_info, objectid); + if (ret == -EBUSY) + ret = 0; out: btrfs_free_path(path); return ret; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index c3b1e4f..303c078 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -35,7 +35,6 @@ #include qgroup.h /* TODO XXX FIXME - * - subvol delete - delete when ref goes to 0? delete limits also? * - reorganize keys * - compressed * - sync -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.
When removing a subvol, if the related qgroup has no child qgroup, we should destroy it at the same time. Also remove the TODO entry in qgroup.c. Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 4 fs/btrfs/qgroup.c | 1 - 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e687bb0..31de211 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -59,6 +59,7 @@ #include backref.h #include hash.h #include props.h +#include qgroup.h struct btrfs_iget_args { struct btrfs_key *location; @@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, ret = btrfs_update_inode_fallback(trans, root, dir); if (ret) btrfs_abort_transaction(trans, root, ret); + ret = btrfs_remove_qgroup(trans, root-fs_info, objectid); + if (ret == -EBUSY) + ret = 0; out: btrfs_free_path(path); return ret; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index c3b1e4f..303c078 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -35,7 +35,6 @@ #include qgroup.h /* TODO XXX FIXME - * - subvol delete - delete when ref goes to 0? delete limits also? * - reorganize keys * - compressed * - sync -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs: qgroup: allow user to remove qgroup which has parent but no child.
When a qgroup has parents but no child, it should be removable in Theory. But currently, we can not remove it when it has either parent or child. Example: # btrfs quota enable /mnt # btrfs qgroup create 1/0 /mnt # btrfs qgroup create 2/0 /mnt # btrfs qgroup assign 1/0 2/0 /mnt # btrfs qgroup show -pcre /mnt qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16384 16384 00--- --- 1/0 0 0 002/0 --- 2/0 0 0 00--- 1/0 At this time, there is no subvol or qgroup depending on qgroup 1/0. Just a qgroup 2/0 is its parent, but 2/0 can work well without 1/0. So I think 1/0 should be removalbe. But: # btrfs qgroup destroy 1/0 /mnt ERROR: unable to destroy quota group: Device or resource busy This patch remove the check of qgroup-parent in removing it. Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com --- fs/btrfs/qgroup.c | 30 +- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index c2983a4..c3b1e4f 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1048,7 +1048,7 @@ out: return ret; } -int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, +int __del_qgroup_relation(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 src, u64 dst) { struct btrfs_root *quota_root; @@ -1058,7 +1058,6 @@ int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, int ret = 0; int err; - mutex_lock(fs_info-qgroup_ioctl_lock); quota_root = fs_info-quota_root; if (!quota_root) { ret = -EINVAL; @@ -1089,7 +1088,18 @@ exist: del_relation_rb(fs_info, src, dst); spin_unlock(fs_info-qgroup_lock); out: + return ret; +} + +int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, + struct btrfs_fs_info *fs_info, u64 src, u64 dst) +{ + int ret = 0; + + mutex_lock(fs_info-qgroup_ioctl_lock); + ret = __del_qgroup_relation(trans, fs_info, src, dst); mutex_unlock(fs_info-qgroup_ioctl_lock); + return ret; } @@ -1132,6 +1142,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, { struct btrfs_root *quota_root; struct btrfs_qgroup *qgroup; + struct btrfs_qgroup_list *list; int ret = 0; mutex_lock(fs_info-qgroup_ioctl_lock); @@ -1146,15 +1157,24 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, ret = -ENOENT; goto out; } else { - /* check if there are no relations to this qgroup */ - if (!list_empty(qgroup-groups) || - !list_empty(qgroup-members)) { + /* check if there are no children of this qgroup */ + if (!list_empty(qgroup-members)) { ret = -EBUSY; goto out; } } ret = del_qgroup_item(trans, quota_root, qgroupid); + while (!list_empty(qgroup-groups)) { + list = list_first_entry(qgroup-groups, + struct btrfs_qgroup_list, next_group); + ret = __del_qgroup_relation(trans, fs_info, + qgroupid, + list-group-qgroupid); + if (ret) + goto out; + } + spin_lock(fs_info-qgroup_lock); del_qgroup_rb(quota_root-fs_info, qgroupid); spin_unlock(fs_info-qgroup_lock); -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
On Wed, 21 Jan 2015 15:47:54 +0800, Qu Wenruo wrote: On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote: [snipped] This will cause another problem, nobody can ensure there will be next transaction and the change may never to written into disk. First, the pending changes is mount option, that is in-memory data. Second, the same problem would happen after you freeze fs. Pending changes are *not* only mount options. Feature change and label change are also pending changes if using sysfs. My miss, I don't notice feature and label change by sysfs. But the implementation of feature and label change by sysfs is wrong, we can not change them without write permission. Normal ioctl label changing is not affected. For freeze, it's not the same problem since the fs will be unfreeze sooner or later and transaction will be initiated. You can not assume the operations of the users, they might freeze the fs and then shutdown the machine. For example, if we change the features/label through sysfs, and then umount the fs, It is different from pending change. No, now features/label changing using sysfs both use pending changes to do the commit. See BTRFS_PENDING_COMMIT bit. So freeze - change features/label - sync will still cause the deadlock in the same way, and you can try it yourself. As I said above, the implementation of sysfs feature and label change is wrong, it is better to separate them from the pending mount option change, make the sysfs feature and label change be done in the context of transaction after getting the write permission. If so, we needn't do anything special when sync the fs. In short, changing the sysfs feature and label change implementation and removing the unnecessary btrfs_start_transaction in sync_fs can fix the deadlock. Your method will only fix the deadlock, but will introduce the risk like pending inode_cache will never be written to disk as I already explained. (If still using the fs_info-pending_changes mechanism) To ensure pending changes written to disk sync_fs() should start a transaction if needed, or there will be chance that no transaction can handle it. We are sure that writting down the inode cache need transaction. But INODE_CACHE is not a forcible flag. Sometimes though you set it, it is very likely that the inode cache files are not created and the data is not written down because the fs might still be reading inode usage information, and this operation might span several transactions. So I think what you worried is not a problem. Thanks Miao But I don't see the necessity to pending current work(inode_cache, feature/label changes) to next transaction. To David: I'm a little curious about why inode_cache needs to be delayed to next transaction. In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem already, so there should be no running transaction and we can just set any mount option into fs_info. Or even in worst case, there is a racing window, we can still start a transaction and do the commit, a little overhead in such minor case won't impact the overall performance. For sysfs change, I prefer attach or start transaction method, and for mount option change, and such sysfs tuning is also minor case for a filesystem. What do you think about reverting the whole patchset and rework the sysfs interface? Thanks, Qu Thanks Miao Thanks, Qu If you want to change features/label, you should get write permission and make sure the fs is not be freezed because those are on-disk data. So the problem doesn't exist, or there is a bug. Thanks Miao since there is no write, there is no running transaction and if we don't start a new transaction, it won't be flushed to disk. Thanks, Qu the reason is: - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs) - Data on the disk is right and integrated Thanks Miao . . . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Recovery options for FS forced readonly due to 3.17 snapshot bug
-Original message- From: Brett King brett.k...@commandict.com.au Sent: Wed 01-21-2015 09:26 am Subject:RE: Recovery options for FS forced readonly due to 3.17 snapshot bug CC: linux-btrfs@vger.kernel.org; To: fdman...@gmail.com; From: Filipe David Manana fdman...@gmail.com Sent: Tue 01-20-2015 11:40 pm Subject: Re: Recovery options for FS forced readonly due to 3.17 snapshot bug To: brett.k...@commandict.com.au; CC: linux-btrfs@vger.kernel.org; On Tue, Jan 20, 2015 at 12:15 PM, brett.k...@commandict.com.au wrote: Hi, My FS has been forced readonly by the early 3.17 snapshot bug. After much reading, I'm looking for validation of some recovery scenarios: 1) btrfsck --repair under a later kernel. 2) replacing the devices one by one under a later kernel, effectively removing the corruption. 3) Just copying data across from the readonly source to a new FS created under a later kernel. Option 1 with check only (not repair) segfaults and coredumps on 3.18 (see below) .. are there some other options needed here or perhaps patches pending to rectify this ? Option 2 would be better than nothing, as I get to keep my snapshots and don't need any more surplus drives than 2x units of the largest size (data is RAID1). Option 3 is a fallback and basically forfeits the source FS including snapshots, is guaranteed to work (I can read the data fine), however requires 1:1 capacity in new disks to achieve. So being unable to complete a btrfsck check and generally scared off by a btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a readonly FS ? And are there any other options ? Note the FS is very nearly full of data - I did also get continuous disk activity due to this high utilisation triggering the endless reclaim unused extent routine, for a few days prior to it being forced readonly. [root@array ~]# uname -r 3.18.2-200.fc21.x86_64 (upgraded since getting the issue) [root@array ~]# btrfsck /dev/sdm snip parent transid verify failed on 40198674350080 wanted 1623837 found 1622986 Ignoring transid failure Segmentation fault (core dumped) [root@array ~]# btrfs fi sh /export/archive/ Label: 'archive' uuid: 22c7663a-93ca-40a6-9491-26abaa62b924 Total devices 8 FS bytes used 18.07TiB devid1 size 5.46TiB used 5.46TiB path /dev/sdm devid2 size 3.64TiB used 3.64TiB path /dev/sdc devid3 size 3.64TiB used 3.64TiB path /dev/sdd devid4 size 5.46TiB used 5.46TiB path /dev/sdl devid5 size 4.55TiB used 4.55TiB path /dev/sdb devid8 size 4.55TiB used 4.55TiB path /dev/sdj devid9 size 4.55TiB used 4.55TiB path /dev/sdk devid 10 size 4.55TiB used 4.55TiB path /dev/sde Btrfs v3.18.1 [root@array ~]# df -h /export/archive/ Filesystem Size Used Avail Use% Mounted on /dev/sdm 19T 19T 123G 100% /export/archive [root@array ~]# btrfs fi df /export/archive/ Data, RAID1: total=18.17TiB, used=18.05TiB System, RAID1: total=32.00MiB, used=3.00MiB Metadata, RAID1: total=23.00GiB, used=22.49GiB GlobalReserve, single: total=512.00MiB, used=0.00B [root@array ~]# dmesg |grep -i btrfs |less [2.044134] Btrfs loaded [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4 [2.472321] BTRFS info (device sda4): disk space caching is enabled [2.513784] BTRFS: detected SSD devices, enabling SSD mode [2.985877] BTRFS info (device sda4): disk space caching is enabled [3.520737] BTRFS: device label archive devid 5 transid 1623944 /dev/sdf [3.544394] BTRFS: device label archive devid 3 transid 1623944 /dev/sdh [3.544553] BTRFS: device label archive devid 10 transid 1623944 /dev/sdi [3.544957] BTRFS: device label archive devid 2 transid 1623944 /dev/sdg [3.614636] BTRFS: device label archive devid 9 transid 1623944 /dev/sdc [3.663006] BTRFS: device label archive devid 8 transid 1623944 /dev/sdb [3.663092] BTRFS: device label archive devid 4 transid 1623944 /dev/sdd [3.685397] BTRFS: device label archive devid 1 transid 1623944 /dev/sde [3.701732] BTRFS info (device sde): disk space caching is enabled [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, corrupt 0, gen 0 [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 0, gen 0 [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, corrupt 0, gen 0 [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 0, gen 0 You have device write and read errors here - the snapshots issue with 3.17 wouldn't bump these device error stats. Even if you were affected by the