[PATCH RFC v2 1/5] btrfs: Fix the bug that fs_info-pending_changes is never cleared.

2015-01-20 Thread Qu Wenruo
Fs_info-pending_changes is never cleared since the original code uses
cmpxchg(fs_info-pending_changes, 0, 0), which will only clear it if
pending_changes is already 0.

This will cause a lot of problem when mount it with inode_cache mount
option.
If the btrfs is mounted as inode_cache, pending_changes will always be
1, even when the fs is frozen.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a605d4e..e88b59d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2118,7 +2118,7 @@ void btrfs_apply_pending_changes(struct btrfs_fs_info 
*fs_info)
unsigned long prev;
unsigned long bit;
 
-   prev = cmpxchg(fs_info-pending_changes, 0, 0);
+   prev = xchg(fs_info-pending_changes, 0);
if (!prev)
return;
 
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 0/5] Fix bugs introduced by the new pending changes

2015-01-20 Thread Qu Wenruo
Commit 572d9ab7845ea0e0 (btrfs: add support for processing pending
changes) introduced several bugs which will eventually cause a
deadlock.

The deadlock can be triggered by fstests/generic/068 with inode_cache
mount option.

The deadlock happens in the following flow:
moutn with inode_cache - freeze fs - sync - unfreeze.

Sync_fs stack:  Thaw_fs stack:
(Holding s_umount)  (Waiting s_umount)
|- btrfs_sync_fs()
   |- start_transaction()
  (Waiting thaw_fs to unfreeze the fs)

The problem has several causes:
1) Cmpxchg in btrfs_apply_pending_changes() doesn't work
Cmpxchg in btrfs_apply_pending_changes() will only clear
fs_info-pending_changes if it is already 0.
So fs_info-pending_changes will never cleared, and every sync_fs() on
frozen btrfs will try to start a new transaction and deadlock.

So patch 1 fixes it by using xchg() other than cmpxchg().

2) btrfs_freeze() doesn't handle and clear pending_changes
If btrfs start a transaction if there are pending changes but no running
transaction, sync should never start a transaction on frozen fs.
(Except the following sysfs case)

So patch 2~3 fixes it by adding pending changes handler in
btrfs_freeze().

3) Changes through sysfs interface can create pending changes without
   waiting for unfreezing.
Since sysfs changes doesn't go through normal open routine,
which will initiate sb_start_*write() and waiting for unfreezing,
changes through sysfs like changing label will set
fs_info-pending_changes to non-zero, and later sync_fs will deadlock
again.

So patch 4~5 reverts the commits relating patches to fix such problem.

Qu Wenruo (5):
  btrfs: Fix the bug that fs_info-pending_changes is never cleared.
  btrfs: Add btrfs_start_transaction_freeze() to start transaction in   
 btrfs_freeze()
  btrfs: Handle pending changes in btrfs_freeze().
  Revert btrfs: move commit out of sysfs when changing label
  Revert btrfs: move commit out of sysfs when changing features

 fs/btrfs/super.c   | 52 --
 fs/btrfs/sysfs.c   | 34 -
 fs/btrfs/transaction.c | 10 +-
 fs/btrfs/transaction.h |  2 ++
 4 files changed, 57 insertions(+), 41 deletions(-)

-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 2/5] btrfs: Add btrfs_start_transaction_freeze() to start transaction in btrfs_freeze()

2015-01-20 Thread Qu Wenruo
Add btrfs_start_transaction_freeze() function.

This is needed for btrfs_freeze() to handle the pending changes.
If there is no running transaction when btrfs_freeze() is called,
btrfs_freeze() needs to start a transaction to handle the pending
changes, and it can't initial a sb_start_intwrite() call.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 8 
 fs/btrfs/transaction.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e88b59d..13a2b67 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -570,6 +570,14 @@ struct btrfs_trans_handle *btrfs_start_transaction_lflush(
 BTRFS_RESERVE_FLUSH_LIMIT);
 }
 
+/* Only used in btrfs_freeze() */
+struct btrfs_trans_handle *btrfs_start_transaction_freeze(
+   struct btrfs_root *root, int num_items)
+{
+   return start_transaction(root, num_items, __TRANS_START,
+BTRFS_RESERVE_FLUSH_ALL);
+}
+
 struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root)
 {
return start_transaction(root, 0, TRANS_JOIN, 0);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 00ed29c..49bd7eb 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -140,6 +140,8 @@ struct btrfs_trans_handle *btrfs_start_transaction(struct 
btrfs_root *root,
   int num_items);
 struct btrfs_trans_handle *btrfs_start_transaction_lflush(
struct btrfs_root *root, int num_items);
+struct btrfs_trans_handle *btrfs_start_transaction_freeze(
+   struct btrfs_root *root, int num_items);
 struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root);
 struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root 
*root);
 struct btrfs_trans_handle *btrfs_attach_transaction(struct btrfs_root *root);
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC v2 4/5] Revert btrfs: move commit out of sysfs when changing label

2015-01-20 Thread Qu Wenruo
This reverts commit a6f69dc8018d (btrfs: move commit out of sysfs when
changing label).
Unlike most btrfs file operations, sysfs don't go through the normal
open/read/write routine which will initiate a sb_start_*write/read.

This is very dangerous on frozen btrfs, if using the pending changes
method, sysfs change can make fs_info-pending_changes to 1, and later
sync_fs() call will cause deadlock.

So revert the patch to deal the deadlock caused by the following
workload:
freeze() - change lable - sync() - unfreeze()

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/sysfs.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 92db3f6..226f726 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -369,6 +369,9 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
 const char *buf, size_t len)
 {
struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+   struct btrfs_trans_handle *trans;
+   struct btrfs_root *root = fs_info-fs_root;
+   int ret;
size_t p_len;
 
if (fs_info-sb-s_flags  MS_RDONLY)
@@ -383,18 +386,20 @@ static ssize_t btrfs_label_store(struct kobject *kobj,
if (p_len = BTRFS_LABEL_SIZE)
return -EINVAL;
 
-   spin_lock(fs_info-super_lock);
+   trans = btrfs_start_transaction(root, 0);
+   if (IS_ERR(trans))
+   return PTR_ERR(trans);
+
+   spin_lock(root-fs_info-super_lock);
memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE);
memcpy(fs_info-super_copy-label, buf, p_len);
-   spin_unlock(fs_info-super_lock);
+   spin_unlock(root-fs_info-super_lock);
+   ret = btrfs_commit_transaction(trans, root);
 
-   /*
-* We don't want to do full transaction commit from inside sysfs
-*/
-   btrfs_set_pending(fs_info, COMMIT);
-   wake_up_process(fs_info-transaction_kthread);
+   if (!ret)
+   return len;
 
-   return len;
+   return ret;
 }
 BTRFS_ATTR_RW(label, btrfs_label_show, btrfs_label_store);
 
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recovery options for FS forced readonly due to 3.17 snapshot bug

2015-01-20 Thread Hugo Mills
On Tue, Jan 20, 2015 at 11:15:22PM +1100, brett.k...@commandict.com.au wrote:
 Hi,
 My FS has been forced readonly by the early 3.17 snapshot bug. After much 
 reading, I'm looking for validation of some recovery scenarios:
 
 1) btrfsck --repair under a later kernel.

   Kernel won't make a difference here, only the version of the btrfs
progs you're using, which should be as recent as possible, but at
least 3.17 (best to try one of the 3.18.x releases).

   It's entirely possible that you've got a different issue as well
which is preventing the btrfs check from repairing the FS.

 2) replacing the devices one by one under a later kernel, effectively 
 removing the corruption.

   This won't work -- the FS would still try to migrate the broken
metadata.

 3) Just copying data across from the readonly source to a new FS created 
 under a later kernel.

   That would work.

 Option 1 with check only (not repair) segfaults and coredumps on 3.18 (see 
 below) .. are there some other options needed here or perhaps patches pending 
 to rectify this ?
 
 Option 2 would be better than nothing, as I get to keep my snapshots and 
 don't need any more surplus drives than 2x units of the largest size (data is 
 RAID1).
 
 Option 3 is a fallback and basically forfeits the source FS including 
 snapshots, is guaranteed to work (I can read the data fine), however requires 
 1:1 capacity in new disks to achieve.
 
 So being unable to complete a btrfsck check and generally scared off by a 
 btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a 
 readonly FS ?
 
 And are there any other options ?
 
 Note the FS is very nearly full of data - I did also get continuous disk 
 activity due to this high utilisation triggering the endless reclaim unused 
 extent routine, for a few days prior to it being forced readonly.
 
 [root@array ~]# uname -r
 3.18.2-200.fc21.x86_64
 
 (upgraded since getting the issue)
 
 [root@array ~]# btrfsck /dev/sdm
 snip
 parent transid verify failed on 40198674350080 wanted 1623837 found 1622986
 Ignoring transid failure
 Segmentation fault (core dumped)

   Even at its worst, this should not be segfaulting. This should be
considered a bug in btrfs check at the very least.

   Hugo.

 [root@array ~]# btrfs fi sh /export/archive/
 Label: 'archive'  uuid: 22c7663a-93ca-40a6-9491-26abaa62b924
 Total devices 8 FS bytes used 18.07TiB
 devid1 size 5.46TiB used 5.46TiB path /dev/sdm
 devid2 size 3.64TiB used 3.64TiB path /dev/sdc
 devid3 size 3.64TiB used 3.64TiB path /dev/sdd
 devid4 size 5.46TiB used 5.46TiB path /dev/sdl
 devid5 size 4.55TiB used 4.55TiB path /dev/sdb
 devid8 size 4.55TiB used 4.55TiB path /dev/sdj
 devid9 size 4.55TiB used 4.55TiB path /dev/sdk
 devid   10 size 4.55TiB used 4.55TiB path /dev/sde
 
 Btrfs v3.18.1
 
 [root@array ~]# df -h /export/archive/
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sdm 19T   19T  123G 100% /export/archive
 
 [root@array ~]# btrfs fi df /export/archive/
 Data, RAID1: total=18.17TiB, used=18.05TiB
 System, RAID1: total=32.00MiB, used=3.00MiB
 Metadata, RAID1: total=23.00GiB, used=22.49GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B
 
 [root@array ~]# dmesg |grep -i btrfs |less
 [2.044134] Btrfs loaded
 [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4
 [2.472321] BTRFS info (device sda4): disk space caching is enabled
 [2.513784] BTRFS: detected SSD devices, enabling SSD mode
 [2.985877] BTRFS info (device sda4): disk space caching is enabled
 [3.520737] BTRFS: device label archive devid 5 transid 1623944 /dev/sdf
 [3.544394] BTRFS: device label archive devid 3 transid 1623944 /dev/sdh
 [3.544553] BTRFS: device label archive devid 10 transid 1623944 /dev/sdi
 [3.544957] BTRFS: device label archive devid 2 transid 1623944 /dev/sdg
 [3.614636] BTRFS: device label archive devid 9 transid 1623944 /dev/sdc
 [3.663006] BTRFS: device label archive devid 8 transid 1623944 /dev/sdb
 [3.663092] BTRFS: device label archive devid 4 transid 1623944 /dev/sdd
 [3.685397] BTRFS: device label archive devid 1 transid 1623944 /dev/sde
 [3.701732] BTRFS info (device sde): disk space caching is enabled
 [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, corrupt 
 0, gen 0
 [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 0, 
 gen 0
 [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, corrupt 0, 
 gen 0
 [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 0, 
 gen 0
 [ 1432.231219] BTRFS: space cache generation (1623006) does not match inode 
 (1623943)
 [ 1432.231283] BTRFS warning (device sde): failed to load free space cache 
 for block group 57021632675840, rebuild it now
 [ 1434.603652] BTRFS (device sde): parent transid verify failed on 
 57022998740992 wanted 1623896 

Re: Recovery options for FS forced readonly due to 3.17 snapshot bug

2015-01-20 Thread Filipe David Manana
On Tue, Jan 20, 2015 at 12:15 PM,  brett.k...@commandict.com.au wrote:
 Hi,
 My FS has been forced readonly by the early 3.17 snapshot bug. After much 
 reading, I'm looking for validation of some recovery scenarios:

 1) btrfsck --repair under a later kernel.

 2) replacing the devices one by one under a later kernel, effectively 
 removing the corruption.

 3) Just copying data across from the readonly source to a new FS created 
 under a later kernel.

 Option 1 with check only (not repair) segfaults and coredumps on 3.18 (see 
 below) .. are there some other options needed here or perhaps patches pending 
 to rectify this ?

 Option 2 would be better than nothing, as I get to keep my snapshots and 
 don't need any more surplus drives than 2x units of the largest size (data is 
 RAID1).

 Option 3 is a fallback and basically forfeits the source FS including 
 snapshots, is guaranteed to work (I can read the data fine), however requires 
 1:1 capacity in new disks to achieve.

 So being unable to complete a btrfsck check and generally scared off by a 
 btrfsck -repair, will option 2 work ? I.E. can I still replace devices on a 
 readonly FS ?

 And are there any other options ?

 Note the FS is very nearly full of data - I did also get continuous disk 
 activity due to this high utilisation triggering the endless reclaim unused 
 extent routine, for a few days prior to it being forced readonly.

 [root@array ~]# uname -r
 3.18.2-200.fc21.x86_64

 (upgraded since getting the issue)

 [root@array ~]# btrfsck /dev/sdm
 snip
 parent transid verify failed on 40198674350080 wanted 1623837 found 1622986
 Ignoring transid failure
 Segmentation fault (core dumped)

 [root@array ~]# btrfs fi sh /export/archive/
 Label: 'archive'  uuid: 22c7663a-93ca-40a6-9491-26abaa62b924
 Total devices 8 FS bytes used 18.07TiB
 devid1 size 5.46TiB used 5.46TiB path /dev/sdm
 devid2 size 3.64TiB used 3.64TiB path /dev/sdc
 devid3 size 3.64TiB used 3.64TiB path /dev/sdd
 devid4 size 5.46TiB used 5.46TiB path /dev/sdl
 devid5 size 4.55TiB used 4.55TiB path /dev/sdb
 devid8 size 4.55TiB used 4.55TiB path /dev/sdj
 devid9 size 4.55TiB used 4.55TiB path /dev/sdk
 devid   10 size 4.55TiB used 4.55TiB path /dev/sde

 Btrfs v3.18.1

 [root@array ~]# df -h /export/archive/
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/sdm 19T   19T  123G 100% /export/archive

 [root@array ~]# btrfs fi df /export/archive/
 Data, RAID1: total=18.17TiB, used=18.05TiB
 System, RAID1: total=32.00MiB, used=3.00MiB
 Metadata, RAID1: total=23.00GiB, used=22.49GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B

 [root@array ~]# dmesg |grep -i btrfs |less
 [2.044134] Btrfs loaded
 [2.046327] BTRFS: device label array devid 1 transid 289965 /dev/sda4
 [2.472321] BTRFS info (device sda4): disk space caching is enabled
 [2.513784] BTRFS: detected SSD devices, enabling SSD mode
 [2.985877] BTRFS info (device sda4): disk space caching is enabled
 [3.520737] BTRFS: device label archive devid 5 transid 1623944 /dev/sdf
 [3.544394] BTRFS: device label archive devid 3 transid 1623944 /dev/sdh
 [3.544553] BTRFS: device label archive devid 10 transid 1623944 /dev/sdi
 [3.544957] BTRFS: device label archive devid 2 transid 1623944 /dev/sdg
 [3.614636] BTRFS: device label archive devid 9 transid 1623944 /dev/sdc
 [3.663006] BTRFS: device label archive devid 8 transid 1623944 /dev/sdb
 [3.663092] BTRFS: device label archive devid 4 transid 1623944 /dev/sdd
 [3.685397] BTRFS: device label archive devid 1 transid 1623944 /dev/sde
 [3.701732] BTRFS info (device sde): disk space caching is enabled
 [3.903413] BTRFS: bdev /dev/sde errs: wr 2805, rd 1548, flush 0, corrupt 
 0, gen 0
 [3.903421] BTRFS: bdev /dev/sdd errs: wr 2805, rd 0, flush 0, corrupt 0, 
 gen 0
 [3.903426] BTRFS: bdev /dev/sdb errs: wr 2805, rd 33, flush 0, corrupt 0, 
 gen 0
 [3.903430] BTRFS: bdev /dev/sdc errs: wr 2806, rd 0, flush 0, corrupt 0, 
 gen 0

You have device write and read errors here - the snapshots issue with
3.17 wouldn't bump these device error stats.
Even if you were affected by the snapshots issue, you clearly have
here another problem, very likely hardware related due to those stats.

If you are affected by the snapshots issue, btrfs check/btrfsck as of
btrfs-progs v3.17+ should be able to fix it and if you are affected by
it, it reports at a very early stage (without --repair) one or more
messages with the following format:

root %llu has a root item with a more recent gen (%llu) compared to
the found root node (%llu)\n

If the snapshots issue from 3.17.0/1 was the only issue you had, those
messages would be reported by fsck before it segfaults (at least for
all cases I've seen).

Also when one is affected by the snapshots issue, it should only cause
problems when reading from the affected readonly 

[PATCH] Btrfs: fix setup_leaf_for_split() to avoid leaf corruption

2015-01-20 Thread Filipe Manana
We were incorrectly detecting when the target key didn't exist anymore
after releasing the path and re-searching the tree. This could make
us split or duplicate (btrfs_split_item() and btrfs_duplicate_item()
are its only callers at the moment) an item when we should not.

For the case of duplicating an item, we currently only duplicate
checksum items (csum tree) and file extent items (fs/subvol trees).
For the checksum items we end up overriding the item completely,
but for file extent items we update only some of their fields in
the copy (done in __btrfs_drop_extents), which means we can end up
having a logical corruption for some values.

Also for the case where we duplicate a file extent item it will make
us produce a leaf with a wrong key order, as btrfs_duplicate_item()
advances us to the next slot and then its caller sets a smaller key
on the new item at that slot (like in __btrfs_drop_extents() e.g.).
Alternatively if the tree search in setup_leaf_for_split() leaves
with path-slots[0] == btrfs_header_nritems(path-nodes[0]), we end
up accessing beyond the leaf's end (when we check if the item's size
has changed) and make our caller insert an item at the invalid slot
btrfs_header_nritems(path-nodes[0]) + 1, causing an invalid memory
access if the leaf is full or nearly full.

This issue has been present since the introduction of this function
in 2009:

Btrfs: Add btrfs_duplicate_item
commit ad48fd754676bfae4139be1a897b1ea58f9aaf21

Signed-off-by: Filipe Manana fdman...@suse.com
---
 fs/btrfs/ctree.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 14a72ed..78e8993c1 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -4356,13 +4356,15 @@ static noinline int setup_leaf_for_split(struct 
btrfs_trans_handle *trans,
path-search_for_split = 1;
ret = btrfs_search_slot(trans, root, key, path, 0, 1);
path-search_for_split = 0;
+   if (ret  0)
+   ret = -EAGAIN;
if (ret  0)
goto err;
 
ret = -EAGAIN;
leaf = path-nodes[0];
-   /* if our item isn't there or got smaller, return now */
-   if (ret  0 || item_size != btrfs_item_size_nr(leaf, path-slots[0]))
+   /* if our item isn't there, return now */
+   if (item_size != btrfs_item_size_nr(leaf, path-slots[0]))
goto err;
 
/* the leaf has  changed, it now has room.  return now */
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread David Sterba
On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
*/
   if (fs_info-pending_changes == 0)
   return 0;
 + /*
 +  * Test if the fs is frozen, or start_trasaction
 +  * will deadlock on itself.
 +  */
 + if (__sb_start_write(sb, SB_FREEZE_FS, false))
 + __sb_end_write(sb, SB_FREEZE_FS);
 + else
 + return 0;

The more I look into that the more I think that the first fix is the
right one.

Has been pointed out in this thread, it is ok to skip processing the
pending changes if the filesystem is frozen.

The pending changes have to flushed from sync (by design), we cannot use
mnt_want_write or the sb_start* protections that.

The btrfs_freeze callback can safely do the last commit, that's under
s_umount held by vfs::freeze_super. Then any other new transaction would
block. Any other call to btrfs_sync_fs will not find any active
transaction and with this patch will not start one. Sounds safe to me.

I think the right level to check is SB_FREEZE_WRITE though, to stop any
potential writes as soon as possible and when the s_umount lock is still
held in vfs::freeze_super.

I'll collect the relevant patches and will send it for review.


   trans = btrfs_start_transaction(root, 0);
   } else {
   return PTR_ERR(trans);
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Snapshot cannot be deleted

2015-01-20 Thread Theodore Ts'o
+linux-btrfs,-linux-ext4

On Tue, Jan 20, 2015 at 01:28:04PM +0100, Andreas Philipp wrote:
 
 Due to the known (and fixed) bug in kernel 3.17.0 one of my btrfs
 volume suffers from unreadable and - even worse - uneraseable
 snapshots. Whenever such a snapshot is accessed there is parent
 transid verify failed in dmesg.
 Is there any way to delete these snapshots?

You sent this to the wrong mailing list, so I'm reforwarding this to
the linux-btrfs list.

Cheers,

- Ted
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Chris Mason



On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com 
wrote:


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 01:13

On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
int wait)

 */
if (fs_info-pending_changes == 0)
return 0;
+   /*
+* Test if the fs is frozen, or start_trasaction
+* will deadlock on itself.
+*/
+   if (__sb_start_write(sb, SB_FREEZE_FS, false))
+   __sb_end_write(sb, SB_FREEZE_FS);
+   else
+   return 0;


But what if someone freezes the FS after __sb_end_write() and before 
btrfs_start_transaction()?   I don't see what keeps new freezers from 
coming in.


-chris


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs convert running out of space

2015-01-20 Thread Chris Murphy
On Tue, Jan 20, 2015 at 4:04 PM, Gareth Pye gar...@cerberos.id.au wrote:
 Yeah, we don't have that much space spare :(

 File system has been going strong from when it was created with early
 RAID5 code, then converted to RAID10 with kernel 3.12.

 There aren't any nocow files to my knowledge but there are plenty of
 files larger than a gig on the file system. The first few results from
 logical-resolve have been for files in the 1G~2G range, so that could
 be some sticky spaghetti.

Are any of those big files in a snapshot? The snapshotting may be
pinning a bunch of large extents, so even if it seems like the volume
has enough space, it might actually be running out of space. All I can
think of is progressively removing the files that are implicated in
the conversion failure. That could mean just deleting older snapshots
that you probably don't need, progressively getting to the point where
you migrate those files off this fs to another one, and then delete
them (all instances in all subvol/snapshots) and just keep trying.

Is a btrfs check happy? Or does it complain about anything?

I've had quite good luck just adding a drive (two drives for raid1/10
volumes) to an existing btrfs volume, they don't have to be drdb, they
can be local block devices, either physical drives or LV's. I've even
done this with flash drives (kinda scary and slow but it worked).

I'd still suggest contingency planning in case this volume becomes
temperamental and you have no choice but to migrate it elsewhere.
Better to do it on your timetable than the filesystem's.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to available space check.

2015-01-20 Thread Dave Chinner
On Mon, Dec 29, 2014 at 03:17:18PM +0800, Qu Wenruo wrote:
 Before the patch, we passed wrong value to _require_fs_space, which
 should be in unit of 1024, but passed in unit of GB.

Yes, that needs fixing.

 Fix it and add better prompt for falloc failure.

That doesn't

 
 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
 ---
  tests/btrfs/079 | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/tests/btrfs/079 b/tests/btrfs/079
 index 202d3e6..99d0187 100755
 --- a/tests/btrfs/079
 +++ b/tests/btrfs/079
 @@ -73,8 +73,9 @@ rm -f $seqres.full
  
  _scratch_mkfs $seqres.full 21
  _scratch_mount
 -_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024))
 -$XFS_IO_PROG -f -c falloc 0 $filesize $testfile
 +_require_fs_space $SCRATCH_MNT $(($filesize / 1024))
 +$XFS_IO_PROG -f -c falloc 0 $filesize $testfile || \
 + _fail falloc failed

If the falloc fails, then the golden output match will fail. Let the
test run, regardless, because the first thing it does is try to
overwrite the fallocated region where the success or failure of the
writes are completely ignored. Hence a falloc failure should also be
ignored...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: Chris Mason c...@fb.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 09:05



On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com 
wrote:


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 01:13

On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
int wait)

   */
  if (fs_info-pending_changes == 0)
  return 0;
+/*
+ * Test if the fs is frozen, or start_trasaction
+ * will deadlock on itself.
+ */
+if (__sb_start_write(sb, SB_FREEZE_FS, false))
+__sb_end_write(sb, SB_FREEZE_FS);
+else
+return 0;


But what if someone freezes the FS after __sb_end_write() and before 
btrfs_start_transaction()?   I don't see what keeps new freezers from 
coming in.


-chris
Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount 
mutex, so freeze will not happen

during sync.

Thanks,
Qu





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Miao Xie
On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:
 On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

  Original Message 
 Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
 to avoid deadlock.
 From: Chris Mason c...@fb.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015年01月21日 09:05


 On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

  Original Message 
 Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
 fs to avoid deadlock.
 From: David Sterba dste...@suse.cz
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015年01月21日 01:13
 On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int 
 wait)
*/
   if (fs_info-pending_changes == 0)
   return 0;
 +/*
 + * Test if the fs is frozen, or start_trasaction
 + * will deadlock on itself.
 + */
 +if (__sb_start_write(sb, SB_FREEZE_FS, false))
 +__sb_end_write(sb, SB_FREEZE_FS);
 +else
 +return 0;

 But what if someone freezes the FS after __sb_end_write() and before
 btrfs_start_transaction()?   I don't see what keeps new freezers from 
 coming in.

 -chris
 Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
 freeze will not happen
 during sync.
 
 You're right.  I was worried about the sync ioctl, but the mutex won't be held
 there to deadlock against.  We'll be fine.

There is another problem which is introduced by pending change. That is we will
start and commmit a transaction by changing pending mount option after we set
the fs to be R/O.

I think it is better that we don't start a new transaction for pending changes
which are set after the transaction is committed, just make them be handled by
the next transaction, the reason is:
- Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
- Data on the disk is right and integrated


Thanks
Miao
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs/006: Add extra check on return value and 'fi show' by device

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] fstest: btrfs/006: Add extra check on return value 
and 'fi show' by device

From: Dave Chinner da...@fromorbit.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 12:03

On Fri, Jan 16, 2015 at 09:57:10AM +0800, Qu Wenruo wrote:

Reported in Red Hat BZ#1181627, 'btrfs fi show' on unmounted device will
return 1 even no error happens.

Please describe the bug in the commit message, don't make me have to
go look at some web page to work out what you are trying to fix.

OK, I'll make the description more detailed.



Introduced by: commit 2513077f
btrfs-progs: fix device missing of btrfs fi show with seed devices

Patch fixing it:
https://patchwork.kernel.org/patch/5626001/
btrfs-progs: Fix wrong return value when executing 'fi show' on
umounted device.

What btrfs progs commit introduced and fixed is not actually
relevant to fstests.
This just saves some time for some users who failed the test and what to 
check if the bug is fixed or not.

If I have to go read other stuff to understand
the change you are making, then the description needs more detail
and less external references

I understand that I need to make the description more detailed.
Although external reference does not help much for fstests, but it will 
really saves time

for some tests.
Like Fillipe Manana's test, which embedded such info into the test 
script, saved me a lot of

time to find the regression commit.




Reported-by: Vratislav Podzimek vpodz...@redhat.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  tests/btrfs/006 | 51 ---
  tests/btrfs/006.out | 10 ++
  2 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/tests/btrfs/006 b/tests/btrfs/006
index 715fd80..2d8c1c0 100755
--- a/tests/btrfs/006
+++ b/tests/btrfs/006
@@ -62,33 +62,70 @@ _scratch_pool_mkfs  $seqres.full 21 || _fail mkfs 
failed
  
  # These have to be done unmounted...?

  echo == Set filesystem label to $LABEL
-$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV $LABEL
+$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV $LABEL || \
+   _fail set lable failed

Apart from the typo, why not _run_btrfs_util_prog?

Oh thanks a lot for the existing infrastructure.

With this one, the PIPESTATUS[] is no longer needed.


However, this will make the test fail on older btrfs progs
installations that we previously passing the test just fine, right?

No, old tests will still pass, since it returns 0 as expected.
Add the return values test just in case.


IOWs, if you want to test that `btrfs fi show` always returns the
correct value, don't add that checking to an existing test: write a
new test that tests the validity of the return value

OK, creating a new one seems reasonable enough for me.




  echo == Get filesystem label
-$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV
+$BTRFS_UTIL_PROG filesystem label $SCRATCH_DEV || \
+   _fail get lable failed
+
+echo == Show filesystem by device(offline)
+$BTRFS_UTIL_PROG filesystem show $FIRST_POOL_DEV | \
+   _filter_btrfs_filesystem_show $TOTAL_DEVS $UUID
+[ ${PIPESTATUS[0]} -ne 0 ]  \
+   _fail show filesystem by device(offline) return value wrong

That's a new function being tested. Again, this belongs in a new
test, not an existing regression test

FWIW, Now that I've seen it used a couple of times, I don't really
like the mess that checking errors via PIPESTATUS results in,
especially as

_run_btrfs_util_prog filesystem show $FIRST_POOL_DEV | \
_filter_btrfs_filesystem_show $TOTAL_DEVS $UUID

Gives *exactly* the same failure detection
Thanks for the _run_btrfs_util_prog macro hint. That's great and makes 
codes look much nicer.


BTW, do I need to cleanup all the $BTRFS_UTIL_PROG in the original test?

Thanks,
Qu


Cheers,

Dave.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] xfstests: btrfs: fix up 001.out

2015-01-20 Thread Dave Chinner
On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:
 The subvol delete output has changed with btrfs-progs
 -Delete subvolume 'SCRATCH_MNT/snap'
 +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'
 
 so fix 001 failing.
 
 Signed-off-by: Anand Jain anand.j...@oracle.com
 
 v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and
 commit update
 ---
  tests/btrfs/001 | 2 +-
  tests/btrfs/001.out | 1 -
  2 files changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/tests/btrfs/001 b/tests/btrfs/001
 index 8258d06..a7747c8 100755
 --- a/tests/btrfs/001
 +++ b/tests/btrfs/001
 @@ -99,7 +99,7 @@ echo Listing subvolumes
  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'
  
  # Delete the snapshot
 -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | 
 _filter_btrfs_subvol_delete
 +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap

This is also the wrong way to fix the problem.

We have output filters for a reason, people:

_filter_btrfs_subvol_delete()
{
_filter_scratch | _filter_transcation_commit_default
}

Simply becomes:

_filter_btrfs_subvol_delete()
{
_filter_scratch | _filter_transcation_commit_default | \
sed -e 's/^Delete subvolume.*:/Delete subvolume/'
}

The golden output does not change - the filter simply removes the
part of the message that changed between versions.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Chris Mason
On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com 
wrote:


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: Chris Mason c...@fb.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 09:05



On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com 
wrote:


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 01:13

On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, 
int wait)

   */
  if (fs_info-pending_changes == 0)
  return 0;
+/*
+ * Test if the fs is frozen, or start_trasaction
+ * will deadlock on itself.
+ */
+if (__sb_start_write(sb, SB_FREEZE_FS, false))
+__sb_end_write(sb, SB_FREEZE_FS);
+else
+return 0;


But what if someone freezes the FS after __sb_end_write() and before 
btrfs_start_transaction()?   I don't see what keeps new freezers 
from coming in.


-chris
Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount 
mutex, so freeze will not happen

during sync.


You're right.  I was worried about the sync ioctl, but the mutex won't 
be held there to deadlock against.  We'll be fine.


-chris



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to available space check.

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] fstests: btrfs/079: Fix wrong value passed to 
available space check.

From: Dave Chinner da...@fromorbit.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 12:19

On Mon, Dec 29, 2014 at 03:17:18PM +0800, Qu Wenruo wrote:

Before the patch, we passed wrong value to _require_fs_space, which
should be in unit of 1024, but passed in unit of GB.

Yes, that needs fixing.


Fix it and add better prompt for falloc failure.

That doesn't


Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  tests/btrfs/079 | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/btrfs/079 b/tests/btrfs/079
index 202d3e6..99d0187 100755
--- a/tests/btrfs/079
+++ b/tests/btrfs/079
@@ -73,8 +73,9 @@ rm -f $seqres.full
  
  _scratch_mkfs $seqres.full 21

  _scratch_mount
-_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024))
-$XFS_IO_PROG -f -c falloc 0 $filesize $testfile
+_require_fs_space $SCRATCH_MNT $(($filesize / 1024))
+$XFS_IO_PROG -f -c falloc 0 $filesize $testfile || \
+   _fail falloc failed

If the falloc fails, then the golden output match will fail. Let the
test run, regardless, because the first thing it does is try to
overwrite the fallocated region where the success or failure of the
writes are completely ignored. Hence a falloc failure should also be
ignored...

Cheers,

Dave.

OK, I'll send a V2 one for this.

Thanks for reviewing.
Qu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstest: btrfs/006: Add extra check on return value and 'fi show' by device

2015-01-20 Thread Dave Chinner
On Wed, Jan 21, 2015 at 12:19:32PM +0800, Qu Wenruo wrote:
 From: Dave Chinner da...@fromorbit.com
 On Fri, Jan 16, 2015 at 09:57:10AM +0800, Qu Wenruo wrote:
 Reported in Red Hat BZ#1181627, 'btrfs fi show' on unmounted device will
 return 1 even no error happens.
 Please describe the bug in the commit message, don't make me have to
 go look at some web page to work out what you are trying to fix.
 OK, I'll make the description more detailed.
 
 Introduced by: commit 2513077f
 btrfs-progs: fix device missing of btrfs fi show with seed devices
 
 Patch fixing it:
 https://patchwork.kernel.org/patch/5626001/
 btrfs-progs: Fix wrong return value when executing 'fi show' on
 umounted device.
 What btrfs progs commit introduced and fixed is not actually
 relevant to fstests.
 This just saves some time for some users who failed the test and
 what to check if the bug is fixed or not.

Doesn't save me time - it takes me much longer as I have to use 3
different tools to follow those three references to try to
understand why you are making the change.

 If I have to go read other stuff to understand
 the change you are making, then the description needs more detail
 and less external references

 I understand that I need to make the description more detailed.
 Although external reference does not help much for fstests, but it
 will really saves time for some tests.  Like Fillipe Manana's
 test, which embedded such info into the test script, saved me a
 lot of time to find the regression commit.

What it helps is people running a new fstests tree on an older
distro kernels. It doesn't save time for reviewers, yet reviewer
resources are in much scarcer supply than distro QA resources.
If you want your changes reviewed faster, then commit messages need
to be reviewer friendly. Further, if you review other people's
changes, then they are much more likely to review your changes
quickly.

If the review burden is left to me all the time then, like Andrew
Morton, I get grumpy when people say but it helps me.

 BTW, do I need to cleanup all the $BTRFS_UTIL_PROG in the original test?

Not if it causes older btrfs progs installations to fail.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] xfstests: btrfs: fix up 001.out

2015-01-20 Thread Anand Jain


Dave,

On 01/21/2015 12:26 PM, Dave Chinner wrote:

On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:

The subvol delete output has changed with btrfs-progs
 -Delete subvolume 'SCRATCH_MNT/snap'
 +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'

so fix 001 failing.

Signed-off-by: Anand Jain anand.j...@oracle.com

v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and
 commit update
---
  tests/btrfs/001 | 2 +-
  tests/btrfs/001.out | 1 -
  2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/btrfs/001 b/tests/btrfs/001
index 8258d06..a7747c8 100755
--- a/tests/btrfs/001
+++ b/tests/btrfs/001
@@ -99,7 +99,7 @@ echo Listing subvolumes
  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'

  # Delete the snapshot
-$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | 
_filter_btrfs_subvol_delete
+_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap


This is also the wrong way to fix the problem.

We have output filters for a reason, people:


 actually I purposely discouraged using the output filters (instead use
 command exit code), mainly because output filters are
 unnecessary hindrance to the good changes. And UIs using btrfs-progs
 anyway has to depend on the exit code to check the command status,
 so we should rightfully check that in our test scripts, this may
 apply lightly for commands like show, but would fit well for commands
 like delete as in here. just my point of view.

Thanks,


_filter_btrfs_subvol_delete()
{
_filter_scratch | _filter_transcation_commit_default
}

Simply becomes:

_filter_btrfs_subvol_delete()
{
_filter_scratch | _filter_transcation_commit_default | \
sed -e 's/^Delete subvolume.*:/Delete subvolume/'
}

The golden output does not change - the filter simply removes the
part of the message that changed between versions.

Cheers,

Dave.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 01:13

On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 */
if (fs_info-pending_changes == 0)
return 0;
+   /*
+* Test if the fs is frozen, or start_trasaction
+* will deadlock on itself.
+*/
+   if (__sb_start_write(sb, SB_FREEZE_FS, false))
+   __sb_end_write(sb, SB_FREEZE_FS);
+   else
+   return 0;

The more I look into that the more I think that the first fix is the
right one.

Has been pointed out in this thread, it is ok to skip processing the
pending changes if the filesystem is frozen.
That's good, for me, either this patch or the patch 2~5 in the patchset 
will solve the sync_fs() problem

on frozen fs. Just different timing to start the new transaction.

But the patchset one has the problem, which needs to deal with the sysfs 
interface changes, or sync_fs()

will still cause deadlock.
So I tried to revert the sysfs related patches, but it seems overkilled, 
needing extra btrfs_start_transaction*

things.

As you already picked this one, I'm completely OK with this.


The pending changes have to flushed from sync (by design), we cannot use
mnt_want_write or the sb_start* protections that.

The btrfs_freeze callback can safely do the last commit, that's under
s_umount held by vfs::freeze_super. Then any other new transaction would
block. Any other call to btrfs_sync_fs will not find any active
transaction and with this patch will not start one. Sounds safe to me.

I think the right level to check is SB_FREEZE_WRITE though, to stop any
potential writes as soon as possible and when the s_umount lock is still
held in vfs::freeze_super.

SB_FREEZE_WRITE seems good for me.
But I didn't catch the difference between 
SB_FREEZE_FS(WRITE/PAGEFAULT/COMPLETE),

since freeze() conflicts with sync_fs(), when we comes to btrfs_sync_fs(),
the fs is either totally frozen or unfrozen and frozen level won't 
change during the protection of s_umount.


Although SB_FREEZE_WRITE seems better in its meaning and makes it more 
readable.


I'll collect the relevant patches and will send it for review.

Thanks for collecting them and sending them out.

Thanks,
Qu




trans = btrfs_start_transaction(root, 0);
} else {
return PTR_ERR(trans);


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: Miao Xie miao...@huawei.com
To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com
Date: 2015年01月21日 11:26

On Wed, 21 Jan 2015 11:15:41 +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to
avoid deadlock.
From: Miao Xie miao...@huawei.com
To: Chris Mason c...@fb.com, Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 11:10

On Tue, 20 Jan 2015 20:10:56 -0500, Chris Mason wrote:

On Tue, Jan 20, 2015 at 8:09 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs
to avoid deadlock.
From: Chris Mason c...@fb.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 09:05

On Tue, Jan 20, 2015 at 7:58 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen
fs to avoid deadlock.
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015年01月21日 01:13

On Mon, Jan 19, 2015 at 03:42:41PM +0800, Qu Wenruo wrote:

--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1000,6 +1000,14 @@ int btrfs_sync_fs(struct super_block *sb, int wait)
 */
if (fs_info-pending_changes == 0)
return 0;
+/*
+ * Test if the fs is frozen, or start_trasaction
+ * will deadlock on itself.
+ */
+if (__sb_start_write(sb, SB_FREEZE_FS, false))
+__sb_end_write(sb, SB_FREEZE_FS);
+else
+return 0;

But what if someone freezes the FS after __sb_end_write() and before
btrfs_start_transaction()?   I don't see what keeps new freezers from
coming in.

-chris

Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount mutex, so
freeze will not happen
during sync.

You're right.  I was worried about the sync ioctl, but the mutex won't be held
there to deadlock against.  We'll be fine.

There is another problem which is introduced by pending change. That is we will
start and commmit a transaction by changing pending mount option after we set
the fs to be R/O.

Oh, I missed this problem.

I think it is better that we don't start a new transaction for pending changes
which are set after the transaction is committed, just make them be handled by
the next transaction,

This will cause another problem, nobody can ensure there will be next
transaction and the change may
never to written into disk.

First, the pending changes is mount option, that is in-memory data.
Second, the same problem would happen after you freeze fs.
Pending changes are *not* only mount options. Feature change and label 
change are also pending changes if using sysfs.

Normal ioctl label changing is not affected.

For freeze, it's not the same problem since the fs will be unfreeze 
sooner or later and transaction will be initiated.



For example, if we change the features/label through sysfs, and then umount the 
fs,

It is different from pending change.
No, now features/label changing using sysfs both use pending changes to 
do the commit.

See BTRFS_PENDING_COMMIT bit.
So freeze - change features/label - sync will still cause the deadlock 
in the same way,

and you can try it yourself.

Thanks,
Qu


If you want to change features/label,  you should get write permission and make
sure the fs is not be freezed because those are on-disk data. So the problem
doesn't exist, or there is a bug.

Thanks
Miao


since there is no write, there is no running transaction and if we don't start a
new transaction,
it won't be flushed to disk.

Thanks,
Qu

the reason is:
- Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
- Data on the disk is right and integrated


Thanks
Miao

.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Miao Xie
On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
 +/*
 + * Test if the fs is frozen, or start_trasaction
 + * will deadlock on itself.
 + */
 +if (__sb_start_write(sb, SB_FREEZE_FS, false))
 +__sb_end_write(sb, SB_FREEZE_FS);
 +else
 +return 0;
 But what if someone freezes the FS after __sb_end_write() and before
 btrfs_start_transaction()?   I don't see what keeps new freezers from
 coming in.

 -chris
 Either VFS::freeze_super() and VFS::syncfs() will hold the s_umount 
 mutex, so
 freeze will not happen
 during sync.
 You're right.  I was worried about the sync ioctl, but the mutex won't be 
 held
 there to deadlock against.  We'll be fine.
 There is another problem which is introduced by pending change. That is we 
 will
 start and commmit a transaction by changing pending mount option after we 
 set
 the fs to be R/O.
 Oh, I missed this problem.
 I think it is better that we don't start a new transaction for pending 
 changes
 which are set after the transaction is committed, just make them be 
 handled by
 the next transaction,
 This will cause another problem, nobody can ensure there will be next
 transaction and the change may
 never to written into disk.
 First, the pending changes is mount option, that is in-memory data.
 Second, the same problem would happen after you freeze fs.
 Pending changes are *not* only mount options. Feature change and label change
 are also pending changes if using sysfs.

My miss, I don't notice feature and label change by sysfs.

But the implementation of feature and label change by sysfs is wrong, we can
not change them without write permission.

 Normal ioctl label changing is not affected.
 
 For freeze, it's not the same problem since the fs will be unfreeze sooner or
 later and transaction will be initiated.

You can not assume the operations of the users, they might freeze the fs and
then shutdown the machine.


 For example, if we change the features/label through sysfs, and then umount
 the fs,
 It is different from pending change.
 No, now features/label changing using sysfs both use pending changes to do the
 commit.
 See BTRFS_PENDING_COMMIT bit.
 So freeze - change features/label - sync will still cause the deadlock in 
 the
 same way,
 and you can try it yourself.

As I said above, the implementation of sysfs feature and label change is wrong,
it is better to separate them from the pending mount option change, make the
sysfs feature and label change be done in the context of transaction after
getting the write permission. If so, we needn't do anything special when sync
the fs.

In short, changing the sysfs feature and label change implementation and
removing the unnecessary btrfs_start_transaction in sync_fs can fix the
deadlock.

Thanks
Miao

 
 Thanks,
 Qu
 
 If you want to change features/label,  you should get write permission and 
 make
 sure the fs is not be freezed because those are on-disk data. So the problem
 doesn't exist, or there is a bug.

 Thanks
 Miao

 since there is no write, there is no running transaction and if we don't 
 start a
 new transaction,
 it won't be flushed to disk.

 Thanks,
 Qu
 the reason is:
 - Make the behavior of the fs be consistent(both freezed fs and unfreezed 
 fs)
 - Data on the disk is right and integrated


 Thanks
 Miao
 .

 
 .
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

2015-01-20 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on 
frozen fs to avoid deadlock.

From: Miao Xie miao...@huawei.com
To: Qu Wenruo quwen...@cn.fujitsu.com, Chris Mason c...@fb.com
Date: 2015年01月21日 15:04

On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:

[snipped]
This will cause another problem, nobody can ensure there will be next
transaction and the change may
never to written into disk.

First, the pending changes is mount option, that is in-memory data.
Second, the same problem would happen after you freeze fs.

Pending changes are *not* only mount options. Feature change and label change
are also pending changes if using sysfs.

My miss, I don't notice feature and label change by sysfs.

But the implementation of feature and label change by sysfs is wrong, we can
not change them without write permission.


Normal ioctl label changing is not affected.

For freeze, it's not the same problem since the fs will be unfreeze sooner or
later and transaction will be initiated.

You can not assume the operations of the users, they might freeze the fs and
then shutdown the machine.


For example, if we change the features/label through sysfs, and then umount
the fs,

It is different from pending change.

No, now features/label changing using sysfs both use pending changes to do the
commit.
See BTRFS_PENDING_COMMIT bit.
So freeze - change features/label - sync will still cause the deadlock in the
same way,
and you can try it yourself.

As I said above, the implementation of sysfs feature and label change is wrong,
it is better to separate them from the pending mount option change, make the
sysfs feature and label change be done in the context of transaction after
getting the write permission. If so, we needn't do anything special when sync
the fs.

In short, changing the sysfs feature and label change implementation and
removing the unnecessary btrfs_start_transaction in sync_fs can fix the
deadlock.
Your method will only fix the deadlock, but will introduce the risk like 
pending inode_cache will never
be written to disk as I already explained. (If still using the 
fs_info-pending_changes mechanism)
To ensure pending changes written to disk sync_fs() should start a 
transaction if needed,

or there will be chance that no transaction can handle it.

But I don't see the necessity to pending current work(inode_cache, 
feature/label changes) to next transaction.


To David:
I'm a little curious about why inode_cache needs to be delayed to next 
transaction.
In btrfs_remount() we have s_umount mutex, and we synced the whole 
filesystem already,
so there should be no running transaction and we can just set any mount 
option into fs_info.


Or even in worst case, there is a racing window, we can still start a 
transaction and do the commit,

a little overhead in such minor case won't impact the overall performance.

For sysfs change, I prefer attach or start transaction method, and for 
mount option change, and

such sysfs tuning is also minor case for a filesystem.

What do you think about reverting the whole patchset and rework the 
sysfs interface?


Thanks,
Qu


Thanks
Miao


Thanks,
Qu


If you want to change features/label,  you should get write permission and make
sure the fs is not be freezed because those are on-disk data. So the problem
doesn't exist, or there is a bug.

Thanks
Miao


since there is no write, there is no running transaction and if we don't start a
new transaction,
it won't be flushed to disk.

Thanks,
Qu

the reason is:
- Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
- Data on the disk is right and integrated


Thanks
Miao

.


.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] fstests: btrfs/079: Fix wrong value passed to available space check.

2015-01-20 Thread Qu Wenruo
Wrong value is passed to _require_fs_space, which should be in unit of
kilobyte(1024), but passed in unit of gigabyte(1024^3).

Fix it.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 tests/btrfs/079 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/079 b/tests/btrfs/079
index 202d3e6..b4a4189 100755
--- a/tests/btrfs/079
+++ b/tests/btrfs/079
@@ -73,7 +73,7 @@ rm -f $seqres.full
 
 _scratch_mkfs $seqres.full 21
 _scratch_mount
-_require_fs_space $SCRATCH_MNT $(($filesize / 1024 / 1024 / 1024))
+_require_fs_space $SCRATCH_MNT $(($filesize / 1024))
 $XFS_IO_PROG -f -c falloc 0 $filesize $testfile
 
 dd_work() {
-- 
2.2.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: btrfs/082: Check return value of btrfs filesystem show command executed on umounted device.

2015-01-20 Thread Qu Wenruo
The return value should always be 0 if no problem happens, but
btrfs filesystem show executed on umounted device will always return 1
even there is no problem.

This testcase just checks it.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 tests/btrfs/082 | 71 +
 tests/btrfs/082.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 74 insertions(+)
 create mode 100755 tests/btrfs/082
 create mode 100644 tests/btrfs/082.out

diff --git a/tests/btrfs/082 b/tests/btrfs/082
new file mode 100755
index 000..d33e836
--- /dev/null
+++ b/tests/btrfs/082
@@ -0,0 +1,71 @@
+#! /bin/bash
+# FS QA Test No. btrfs/082
+#
+# Check return value of btrfs filesystem show command executed on
+# umounted device.
+# It should return 0 if nothing wrong happens.
+#
+# Regression in v3.18 btrfs-progs and fixed by the following patch:
+#
+#btrfs-progs: Fix wrong return value when executing 'fi show' on
+#umounted device.
+#
+#---
+# Copyright (c) 2015 Fujitsu, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap _cleanup; exit \$status 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter.btrfs
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_scratch_dev_pool
+
+rm -f $seqres.full
+
+FIRST_POOL_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
+TOTAL_DEVS=`echo $SCRATCH_DEV_POOL | wc -w`
+
+_scratch_pool_mkfs  $seqres.full 21 || _fail mkfs failed
+
+_run_btrfs_util_prog filesystem show $FIRST_POOL_DEV | \
+   _filter_btrfs_filesystem_show $TOTAL_DEVS
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/btrfs/082.out b/tests/btrfs/082.out
new file mode 100644
index 000..2977f14
--- /dev/null
+++ b/tests/btrfs/082.out
@@ -0,0 +1,2 @@
+QA output created by 082
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index e2b..709664a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -84,3 +84,4 @@
 079 auto rw metadata
 080 auto snapshot
 081 auto quick clone
+082 auto quick
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html