Re: [PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.

2015-01-21 Thread Wang Shilong
Hello,

 
 When removing a subvol, if the related qgroup has no child qgroup, we should 
 destroy
 it at the same time. Also remove the TODO entry in qgroup.c.

My two cents here:

Take a quick look at this, i am not sure whether this is right way to do it.

Actually, i think we can only remove a subvolume qgroup safely when both refer 
and excl are 0,
because for subvolume removal case, to make qgroup accounting right, we need 
walk tree.

At this time, qgroup walking might have not finished, if we remove this qgroup 
directly,
accounting for this qgroup will be skipped.

Maybe we could remove such qgroup when at qgroup accounting, we find an 
qgroup’s refer and excl
are 0, we could remove it, or we do it at the background..

Also, there is another risk, that is to say, if qgroup accounting not work 
right, So
we might need gurantee that subvolume it going to be deleted or have been 
deleted at the same time,
then we could remove qgroup safely etc.


 Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
 ---
 fs/btrfs/inode.c  | 4 
 fs/btrfs/qgroup.c | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index e687bb0..31de211 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -59,6 +59,7 @@
 #include backref.h
 #include hash.h
 #include props.h
 +#include qgroup.h
 
 struct btrfs_iget_args {
   struct btrfs_key *location;
 @@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle 
 *trans,
   ret = btrfs_update_inode_fallback(trans, root, dir);
   if (ret)
   btrfs_abort_transaction(trans, root, ret);
 + ret = btrfs_remove_qgroup(trans, root-fs_info, objectid);
 + if (ret == -EBUSY)
 + ret = 0;
 out:
   btrfs_free_path(path);
   return ret;
 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index c3b1e4f..303c078 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -35,7 +35,6 @@
 #include qgroup.h
 
 /* TODO XXX FIXME
 - *  - subvol delete - delete when ref goes to 0? delete limits also?
  *  - reorganize keys
  *  - compressed
  *  - sync
 -- 
 1.8.4.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Best Regards,
Wang Shilong

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.

2015-01-21 Thread Dongsheng Yang
When removing a subvol, if the related qgroup has no child qgroup, we should 
destroy
it at the same time. Also remove the TODO entry in qgroup.c.

Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
---
 fs/btrfs/inode.c  | 4 
 fs/btrfs/qgroup.c | 1 -
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e687bb0..31de211 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -59,6 +59,7 @@
 #include backref.h
 #include hash.h
 #include props.h
+#include qgroup.h
 
 struct btrfs_iget_args {
struct btrfs_key *location;
@@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
ret = btrfs_update_inode_fallback(trans, root, dir);
if (ret)
btrfs_abort_transaction(trans, root, ret);
+   ret = btrfs_remove_qgroup(trans, root-fs_info, objectid);
+   if (ret == -EBUSY)
+   ret = 0;
 out:
btrfs_free_path(path);
return ret;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c3b1e4f..303c078 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -35,7 +35,6 @@
 #include qgroup.h
 
 /* TODO XXX FIXME
- *  - subvol delete - delete when ref goes to 0? delete limits also?
  *  - reorganize keys
  *  - compressed
  *  - sync
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: qgroup: allow user to remove qgroup which has parent but no child.

2015-01-21 Thread Dongsheng Yang
When a qgroup has parents but no child, it should be removable in
Theory. But currently, we can not remove it when it has
either parent or child.

Example:
# btrfs quota enable /mnt
# btrfs qgroup create 1/0 /mnt
# btrfs qgroup create 2/0 /mnt
# btrfs qgroup assign 1/0 2/0 /mnt
# btrfs qgroup show -pcre /mnt
qgroupid rfer  excl  max_rfer max_excl parent  child
       --  -
0/5  16384 16384 00--- ---
1/0  0 0 002/0 ---
2/0  0 0 00--- 1/0

At this time, there is no subvol or qgroup depending on qgroup 1/0.
Just a qgroup 2/0 is its parent, but 2/0 can work well without
1/0. So I think 1/0 should be removalbe. But:
# btrfs qgroup destroy 1/0 /mnt
ERROR: unable to destroy quota group: Device or resource busy

This patch remove the check of qgroup-parent in removing it.

Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
---
 fs/btrfs/qgroup.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c2983a4..c3b1e4f 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1048,7 +1048,7 @@ out:
return ret;
 }
 
-int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+int __del_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst)
 {
struct btrfs_root *quota_root;
@@ -1058,7 +1058,6 @@ int btrfs_del_qgroup_relation(struct btrfs_trans_handle 
*trans,
int ret = 0;
int err;
 
-   mutex_lock(fs_info-qgroup_ioctl_lock);
quota_root = fs_info-quota_root;
if (!quota_root) {
ret = -EINVAL;
@@ -1089,7 +1088,18 @@ exist:
del_relation_rb(fs_info, src, dst);
spin_unlock(fs_info-qgroup_lock);
 out:
+   return ret;
+}
+
+int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 src, u64 dst)
+{
+   int ret = 0;
+
+   mutex_lock(fs_info-qgroup_ioctl_lock);
+   ret = __del_qgroup_relation(trans, fs_info, src, dst);
mutex_unlock(fs_info-qgroup_ioctl_lock);
+
return ret;
 }
 
@@ -1132,6 +1142,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
 {
struct btrfs_root *quota_root;
struct btrfs_qgroup *qgroup;
+   struct btrfs_qgroup_list *list;
int ret = 0;
 
mutex_lock(fs_info-qgroup_ioctl_lock);
@@ -1146,15 +1157,24 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle 
*trans,
ret = -ENOENT;
goto out;
} else {
-   /* check if there are no relations to this qgroup */
-   if (!list_empty(qgroup-groups) ||
-   !list_empty(qgroup-members)) {
+   /* check if there are no children of this qgroup */
+   if (!list_empty(qgroup-members)) {
ret = -EBUSY;
goto out;
}
}
ret = del_qgroup_item(trans, quota_root, qgroupid);
 
+   while (!list_empty(qgroup-groups)) {
+   list = list_first_entry(qgroup-groups,
+   struct btrfs_qgroup_list, next_group);
+   ret = __del_qgroup_relation(trans, fs_info,
+  qgroupid,
+  list-group-qgroupid);
+   if (ret)
+   goto out;
+   }
+
spin_lock(fs_info-qgroup_lock);
del_qgroup_rb(quota_root-fs_info, qgroupid);
spin_unlock(fs_info-qgroup_lock);
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-21 Thread Miao Xie
On Wed, 21 Jan 2015 15:47:54 +0800, Qu Wenruo wrote:
 On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
 [snipped]
 This will cause another problem, nobody can ensure there will be next
 transaction and the change may
 never to written into disk.
 First, the pending changes is mount option, that is in-memory data.
 Second, the same problem would happen after you freeze fs.
 Pending changes are *not* only mount options. Feature change and label 
 change
 are also pending changes if using sysfs.
 My miss, I don't notice feature and label change by sysfs.

 But the implementation of feature and label change by sysfs is wrong, we can
 not change them without write permission.

 Normal ioctl label changing is not affected.

 For freeze, it's not the same problem since the fs will be unfreeze sooner 
 or
 later and transaction will be initiated.
 You can not assume the operations of the users, they might freeze the fs and
 then shutdown the machine.

 For example, if we change the features/label through sysfs, and then 
 umount
 the fs,
 It is different from pending change.
 No, now features/label changing using sysfs both use pending changes to do 
 the
 commit.
 See BTRFS_PENDING_COMMIT bit.
 So freeze - change features/label - sync will still cause the deadlock in 
 the
 same way,
 and you can try it yourself.
 As I said above, the implementation of sysfs feature and label change is 
 wrong,
 it is better to separate them from the pending mount option change, make the
 sysfs feature and label change be done in the context of transaction after
 getting the write permission. If so, we needn't do anything special when sync
 the fs.

 In short, changing the sysfs feature and label change implementation and
 removing the unnecessary btrfs_start_transaction in sync_fs can fix the
 deadlock.
 Your method will only fix the deadlock, but will introduce the risk like 
 pending
 inode_cache will never
 be written to disk as I already explained. (If still using the
 fs_info-pending_changes mechanism)
 To ensure pending changes written to disk sync_fs() should start a transaction
 if needed,
 or there will be chance that no transaction can handle it.

We are sure that writting down the inode cache need transaction. But INODE_CACHE
is not a forcible flag. Sometimes though you set it, it is very likely that the
inode cache files are not created and the data is not written down because the
fs might still be reading inode usage information, and this operation might span
several transactions. So I think what you worried is not a problem.

Thanks
Miao

 
 But I don't see the necessity to pending current work(inode_cache, 
 feature/label
 changes) to next transaction.
 
 To David:
 I'm a little curious about why inode_cache needs to be delayed to next 
 transaction.
 In btrfs_remount() we have s_umount mutex, and we synced the whole filesystem
 already,
 so there should be no running transaction and we can just set any mount option
 into fs_info.
 
 Or even in worst case, there is a racing window, we can still start a
 transaction and do the commit,
 a little overhead in such minor case won't impact the overall performance.
 
 For sysfs change, I prefer attach or start transaction method, and for mount
 option change, and
 such sysfs tuning is also minor case for a filesystem.
 
 What do you think about reverting the whole patchset and rework the sysfs
 interface?
 
 Thanks,
 Qu

 Thanks
 Miao

 Thanks,
 Qu

 If you want to change features/label,  you should get write permission and 
 make
 sure the fs is not be freezed because those are on-disk data. So the 
 problem
 doesn't exist, or there is a bug.

 Thanks
 Miao

 since there is no write, there is no running transaction and if we don't
 start a
 new transaction,
 it won't be flushed to disk.

 Thanks,
 Qu
 the reason is:
 - Make the behavior of the fs be consistent(both freezed fs and 
 unfreezed fs)
 - Data on the disk is right and integrated


 Thanks
 Miao
 .

 .

 
 .
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Recovery options for FS forced readonly due to 3.17 snapshot bug

2015-01-21 Thread Brett King
-Original message-
From:   Brett King brett.k...@commandict.com.au
Sent:   Wed 01-21-2015 09:26 am
Subject:RE: Recovery options for FS forced readonly due to 3.17 
snapshot bug
CC: linux-btrfs@vger.kernel.org; 
To: fdman...@gmail.com; 
 From: Filipe David Manana fdman...@gmail.com
 Sent: Tue 01-20-2015 11:40 pm
 Subject:  Re: Recovery options for FS forced readonly due to 3.17 
 snapshot bug
 To:   brett.k...@commandict.com.au; 
 CC:   linux-btrfs@vger.kernel.org; 
  On Tue, Jan 20, 2015 at 12:15 PM,  brett.k...@commandict.com.au wrote:
   Hi,
   My FS has been forced readonly by the early 3.17 snapshot bug. After much 
  reading, I'm looking for validation of some recovery scenarios:
  
   1) btrfsck --repair under a later kernel.
  
   2) replacing the devices one by one under a later kernel, effectively 
  removing the corruption.
  
   3) Just copying data across from the readonly source to a new FS created 
  under a later kernel.
  
   Option 1 with check only (not repair) segfaults and coredumps on 3.18 
   (see 
  below) .. are there some other options needed here or perhaps patches 
  pending 
  to rectify this ?
  
   Option 2 would be better than nothing, as I get to keep my snapshots and 
  don't need any more surplus drives than 2x units of the largest size (data 
  is 
  RAID1).
  
   Option 3 is a fallback and basically forfeits the source FS including 
  snapshots, is guaranteed to work (I can read the data fine), however 
  requires 
  1:1 capacity in new disks to achieve.
  
   So being unable to complete a btrfsck check and generally scared off by a 
  btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a 
  readonly FS ?
  
   And are there any other options ?
  
   Note the FS is very nearly full of data - I did also get continuous disk 
  activity due to this high utilisation triggering the endless reclaim unused 
  extent routine, for a few days prior to it being forced readonly.
  
   [root@array ~]# uname -r
   3.18.2-200.fc21.x86_64
  
   (upgraded since getting the issue)
  
   [root@array ~]# btrfsck /dev/sdm
   snip
   parent transid verify failed on 40198674350080 wanted 1623837 found 
   1622986
   Ignoring transid failure
   Segmentation fault (core dumped)
  
   [root@array ~]# btrfs fi sh /export/archive/
   Label: 'archive'  uuid: 22c7663a-93ca-40a6-9491-26abaa62b924
   Total devices 8 FS bytes used 18.07TiB
   devid1 size 5.46TiB used 5.46TiB path /dev/sdm
   devid2 size 3.64TiB used 3.64TiB path /dev/sdc
   devid3 size 3.64TiB used 3.64TiB path /dev/sdd
   devid4 size 5.46TiB used 5.46TiB path /dev/sdl
   devid5 size 4.55TiB used 4.55TiB path /dev/sdb
   devid8 size 4.55TiB used 4.55TiB path /dev/sdj
   devid9 size 4.55TiB used 4.55TiB path /dev/sdk
   devid   10 size 4.55TiB used 4.55TiB path /dev/sde
  
   Btrfs v3.18.1
  
   [root@array ~]# df -h /export/archive/
   Filesystem  Size  Used Avail Use% Mounted on
   /dev/sdm 19T   19T  123G 100% /export/archive
  
   [root@array ~]# btrfs fi df /export/archive/
   Data, RAID1: total=18.17TiB, used=18.05TiB
   System, RAID1: total=32.00MiB, used=3.00MiB
   Metadata, RAID1: total=23.00GiB, used=22.49GiB
   GlobalReserve, single: total=512.00MiB, used=0.00B
  
   [root@array ~]# dmesg |grep -i btrfs |less
   [2.044134] Btrfs loaded
   [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4
   [2.472321] BTRFS info (device sda4): disk space caching is enabled
   [2.513784] BTRFS: detected SSD devices, enabling SSD mode
   [2.985877] BTRFS info (device sda4): disk space caching is enabled
   [3.520737] BTRFS: device label archive devid 5 transid 1623944 
   /dev/sdf
   [3.544394] BTRFS: device label archive devid 3 transid 1623944 
   /dev/sdh
   [3.544553] BTRFS: device label archive devid 10 transid 1623944 
   /dev/sdi
   [3.544957] BTRFS: device label archive devid 2 transid 1623944 
   /dev/sdg
   [3.614636] BTRFS: device label archive devid 9 transid 1623944 
   /dev/sdc
   [3.663006] BTRFS: device label archive devid 8 transid 1623944 
   /dev/sdb
   [3.663092] BTRFS: device label archive devid 4 transid 1623944 
   /dev/sdd
   [3.685397] BTRFS: device label archive devid 1 transid 1623944 
   /dev/sde
   [3.701732] BTRFS info (device sde): disk space caching is enabled
   [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, 
 corrupt 
  0, gen 0
   [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 
 0, 
  gen 0
   [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, 
   corrupt 
 0, 
  gen 0
   [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 
 0, 
  gen 0
  
  You have device write and read errors here - the snapshots issue with
  3.17 wouldn't bump these device error stats.
  Even if you were affected by the