Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches, even if generation is less than 10.

2014-08-06 Thread Qu Wenruo

It seems that the patch was rejected in patchwork.

Could anyone tell me the reason?

Thanks,
Qu
 Original Message 
Subject: [PATCH] btrfs: Don't continue mounting when superblock csum 
mismatches, even if generation is less than 10.

From: Qu Wenruo 
To: linux-btrfs@vger.kernel.org
Date: 2014-06-24 16:49

Revert kernel commit 667e7d94a1683661cff5fe9a0fa0d7f8fdd2c007.
(Btrfs: allow superblock mismatch from older mkfs by Chris Mason)

The above commit will cause disaster if someone tries to mount a newly created
but later corrupted btrfs filesystem.

Also, before btrfs entered mainline, btrfs-progs already had superblock
checksums. See btrfs-progs commit 5ccd1715fa2eaad0b26037bb53706779c8c93b5f
(superblock duplication, by Yan Zheng).
Before commit 5ccd1715, mkfs.btrfs used 16K as the super offset, while current
btrfs uses a 64K super offset; either way, an old btrfs without super csums
will not be mountable due to the change of super offset.

So backward compatibility is not a problem.

Signed-off-by: Qu Wenruo 
---
  fs/btrfs/disk-io.c | 6 ------
  1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8bb4aa1..dbfb2a3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -400,12 +400,6 @@ static int btrfs_check_super_csum(char *raw_disk_sb)
  
 		if (memcmp(raw_disk_sb, result, csum_size))
 			ret = 1;
-
-		if (ret && btrfs_super_generation(disk_sb) < 10) {
-			printk(KERN_WARNING
-			"BTRFS: super block crcs don't match, older mkfs detected\n");
-			ret = 0;
-		}
 	}
  
  	if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: remove unused parameter in rollback for btrfs-convert

2014-08-06 Thread Gui Hecheng
The @force parameter for function @do_rollback is never checked
or used; remove it.

Signed-off-by: Gui Hecheng 
---
 btrfs-convert.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 2ea6a09..21bbc70 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2376,7 +2376,7 @@ fail:
return -1;
 }
 
-static int do_rollback(const char *devname, int force)
+static int do_rollback(const char *devname)
 {
int fd = -1;
int ret;
@@ -2754,7 +2754,7 @@ int main(int argc, char *argv[])
}
 
if (rollback) {
-   ret = do_rollback(file, 0);
+   ret = do_rollback(file);
} else {
ret = do_convert(file, datacsum, packing, noxattr);
}
-- 
1.8.1.4



[PATCH 3/3] btrfs-progs: make close_ctree return void

2014-08-06 Thread Gui Hecheng
close_ctree() always returns 0, so the code that depends on its return
value makes no sense.
Just make close_ctree() return void.

Signed-off-by: Gui Hecheng 
---
 btrfs-convert.c    | 14 +++-----------
 btrfs-debug-tree.c |  3 ++-
 btrfs-image.c      |  4 ++--
 disk-io.c          |  3 +--
 disk-io.h          |  2 +-
 mkfs.c             |  3 +--
 6 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 21bbc70..c685fcf 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2297,11 +2297,7 @@ static int do_convert(const char *devname, int datacsum, int packing,
fprintf(stderr, "error during cleanup_sys_chunk %d\n", ret);
goto fail;
}
-   ret = close_ctree(root);
-   if (ret) {
-   fprintf(stderr, "error during close_ctree %d\n", ret);
-   goto fail;
-   }
+   close_ctree(root);
close_ext2fs(ext2_fs);
 
/*
@@ -2325,7 +2321,7 @@ static int do_convert(const char *devname, int datacsum, int packing,
fprintf(stderr, "error during fixup_chunk_tree\n");
goto fail;
}
-   ret = close_ctree(root);
+   close_ctree(root);
close(fd);
 
printf("conversion complete.\n");
@@ -2591,11 +2587,7 @@ next_extent:
ret = btrfs_commit_transaction(trans, root);
BUG_ON(ret);
 
-   ret = close_ctree(root);
-   if (ret) {
-   fprintf(stderr, "error during close_ctree %d\n", ret);
-   goto fail;
-   }
+   close_ctree(root);
 
/* zero btrfs super block mirrors */
memset(buf, 0, sectorsize);
diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index e46500d..aeeeb74 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -408,5 +408,6 @@ no_node:
printf("uuid %s\n", uuidbuf);
printf("%s\n", BTRFS_BUILD_VERSION);
 close_root:
-   return close_ctree(root);
+   close_ctree(root);
+   return 0;
 }
diff --git a/btrfs-image.c b/btrfs-image.c
index 985aa26..bb3d0a9 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -1333,8 +1333,8 @@ out:
metadump_destroy(&metadump, num_threads);
 
btrfs_free_path(path);
-   ret = close_ctree(root);
-   return err ? err : ret;
+   close_ctree(root);
+   return err ? err : 0;
 }
 
 static void update_super_old(u8 *buffer)
diff --git a/disk-io.c b/disk-io.c
index d10d647..f13d46d 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1410,7 +1410,7 @@ int write_ctree_super(struct btrfs_trans_handle *trans,
return ret;
 }
 
-int close_ctree(struct btrfs_root *root)
+void close_ctree(struct btrfs_root *root)
 {
int ret;
struct btrfs_trans_handle *trans;
@@ -1436,7 +1436,6 @@ int close_ctree(struct btrfs_root *root)
btrfs_close_devices(fs_info->fs_devices);
btrfs_cleanup_all_caches(fs_info);
btrfs_free_fs_info(fs_info);
-   return 0;
 }
 
 int clean_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/disk-io.h b/disk-io.h
index 13d4420..87a69b5 100644
--- a/disk-io.h
+++ b/disk-io.h
@@ -78,7 +78,7 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
 struct btrfs_fs_info *open_ctree_fs_info(const char *filename,
 u64 sb_bytenr, u64 root_tree_bytenr,
 enum btrfs_open_ctree_flags flags);
-int close_ctree(struct btrfs_root *root);
+void close_ctree(struct btrfs_root *root);
 int write_all_supers(struct btrfs_root *root);
 int write_ctree_super(struct btrfs_trans_handle *trans,
  struct btrfs_root *root);
diff --git a/mkfs.c b/mkfs.c
index d980d4a..5e55e72 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1661,8 +1661,7 @@ raid_groups:
BUG_ON(ret);
}
 
-   ret = close_ctree(root);
-   BUG_ON(ret);
+   close_ctree(root);
free(label);
return 0;
 }
-- 
1.8.1.4



[PATCH 1/3] btrfs-progs: check option conflict for btrfs-convert

2014-08-06 Thread Gui Hecheng
The -d, -i, -n options make no sense for rollback.
Check for improper usage such as:
# btrfs-convert -r -d 

Signed-off-by: Gui Hecheng 
---
 btrfs-convert.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 952e3e6..2ea6a09 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2699,6 +2699,7 @@ int main(int argc, char *argv[])
int noxattr = 0;
int datacsum = 1;
int rollback = 0;
+   int usage_error = 0;
char *file;
while(1) {
int c = getopt(argc, argv, "dinr");
@@ -2729,6 +2730,18 @@ int main(int argc, char *argv[])
return 1;
}
 
+   if (rollback) {
+   if (!datacsum || noxattr || !packing) {
+   fprintf(stderr, "Usage error: -d, -i, -n options do not apply to rollback\n");
+   usage_error++;
+   }
+   }
+
+   if (usage_error) {
+   print_usage();
+   return 1;
+   }
+
file = argv[optind];
ret = check_mounted(file);
if (ret < 0) {
-- 
1.8.1.4



Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Chris Mason
On 08/06/2014 11:18 AM, Chris Mason wrote:
> On 08/06/2014 10:43 AM, Martin Steigerwald wrote:
>> On Wednesday, 6 August 2014, 09:35:51, Chris Mason wrote:
>>> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
> I think this should go to stable. Thanks, Liu.
>>>
>>> I'm definitely tagging this for stable.
>>>
 Unfortunately this fix does not seem to fix all lockups.
>>>
>>> The traces below are a little different, could you please send the whole
>>> file?
>>
>> Will paste it at the end.
> 
> [90496.156016] kworker/u8:14   D 880044e38540 0 21050  2 
> 0x
> [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
> [90496.159320]  88022880f990 0002 880407f649b0 
> 88022880ffd8
> [90496.160997]  880044e38000 00013040 880044e38000 
> 7fff
> [90496.162686]  880301383aa0 0002 814705d0 
> 880301383a98
> [90496.164360] Call Trace:
> [90496.166028]  [] ? michael_mic.part.6+0x21/0x21
> [90496.167854]  [] schedule+0x64/0x66
> [90496.169574]  [] schedule_timeout+0x2f/0x114
> [90496.171221]  [] ? wake_up_process+0x2f/0x32
> [90496.172867]  [] ? get_parent_ip+0xd/0x3c
> [90496.174472]  [] ? preempt_count_add+0x7b/0x8e
> [90496.176053]  [] __wait_for_common+0x11e/0x163
> [90496.177619]  [] ? __wait_for_common+0x11e/0x163
> [90496.179173]  [] ? wake_up_state+0xd/0xd
> [90496.180728]  [] wait_for_completion+0x1f/0x21
> [90496.182285]  [] btrfs_async_run_delayed_refs+0xbf/0xd9 
> [btrfs]
> [90496.183833]  [] __btrfs_end_transaction+0x2b6/0x2ec 
> [btrfs]
> [90496.185380]  [] btrfs_end_transaction+0xb/0xd [btrfs]
> [90496.186940]  [] find_free_extent+0x8a9/0x976 [btrfs]
> [90496.189464]  [] btrfs_reserve_extent+0x6f/0x119 [btrfs]
> [90496.191326]  [] cow_file_range+0x1a6/0x377 [btrfs]
> [90496.193080]  [] ? extent_write_locked_range+0x10c/0x11e 
> [btrfs]
> [90496.194659]  [] submit_compressed_extents+0x100/0x412 
> [btrfs]
> [90496.196225]  [] ? debug_smp_processor_id+0x17/0x19
> [90496.197776]  [] async_cow_submit+0x82/0x87 [btrfs]
> [90496.199383]  [] normal_work_helper+0x153/0x224 [btrfs]
> [90496.200944]  [] process_one_work+0x16f/0x2b8
> [90496.202483]  [] worker_thread+0x27b/0x32e
> [90496.204000]  [] ? cancel_delayed_work_sync+0x10/0x10
> [90496.205514]  [] kthread+0xb2/0xba
> [90496.207040]  [] ? ap_handle_dropped_data+0xf/0xc8
> [90496.208565]  [] ? __kthread_parkme+0x62/0x62
> [90496.210096]  [] ret_from_fork+0x7c/0xb0
> [90496.211618]  [] ? __kthread_parkme+0x62/0x62
> 
> 
> Ok, this should explain the hang.  submit_compressed_extents is calling
> cow_file_range with a locked page.
> 
> cow_file_range is trying to find a free extent and in the process is
> calling btrfs_end_transaction, which is running the async delayed refs,
> which is trying to write dirty pages, which is waiting for your locked
> page.
> 
> I should be able to reproduce this ;)

This part of the trace is relatively new because Liu Bo's patch made us
redirty the pages, making it more likely that we'd try to write them
during commit.

But, at the end of the day we have a fundamental deadlock with
committing a transaction while holding a locked page from an ordered file.

For now, I'm ripping out the strict ordered file and going back to a
best-effort filemap_flush like ext4 is using.

-chris



Re: [PATCH] btrfs: Avoid truncating page or punching hole in an already existing hole.

2014-08-06 Thread Qu Wenruo

Hi Filipe,

Thanks for the test result, I'll investigate it soon.

I'll also fix the code style problem.

Thanks,
Qu
 Original Message 
Subject: Re: [PATCH] btrfs: Avoid truncating page or punching hole in an 
already existing hole.

From: Filipe David Manana 
To: Qu Wenruo 
Date: 2014-08-07 02:58

On Fri, May 30, 2014 at 8:16 AM, Qu Wenruo  wrote:

btrfs_punch_hole() will truncate unaligned pages or punch holes in an
already existing hole.
This will cause unneeded zeroed pages or holes splitting the original huge
hole.

This patch will skip already existed holes before any page truncating or
hole punching.

Signed-off-by: Qu Wenruo 

Hi Qu,

FYI, this change seems to introduce some regressions when using the
NO_HOLES feature, and it's easy to reproduce with xfstests, where at
least 3 tests fail in a deterministic way:

$ cat /home/fdmanana/git/hub/xfstests/local.config
export TEST_DEV=/dev/sdb
export TEST_DIR=/home/fdmanana/btrfs-tests/dev
export SCRATCH_MNT="/home/fdmanana/btrfs-tests/scratch_1"
export FSTYP=btrfs

$ cd /home/fdmanana/git/hub/xfstests
$ mkfs.btrfs -f -O no-holes /dev/sdb
Performing full device TRIM (100.00GiB) ...
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/sdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 100.00GiB
Btrfs v3.14.1-96-gcc7fd5a-dirty

$ ./check generic/075  generic/091  generic/112
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 debian-vm3 3.16.0-rc6-fdm-btrfs-next-38+

generic/075 87s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/075.out.bad)
 --- tests/generic/075.out 2014-08-06 20:30:02.986012249 +0100
 +++ /home/fdmanana/git/hub/xfstests/results//generic/075.out.bad
2014-08-06 20:42:23.386012249 +0100
 @@ -4,15 +4,5 @@
  ---
  fsx.0 : -d -N numops -S 0
  ---
 -
 
 -fsx.1 : -d -N numops -S 0 -x
 
 ...
 (Run 'diff -u tests/generic/075.out
/home/fdmanana/git/hub/xfstests/results//generic/075.out.bad'  to see
the entire diff)
generic/091 56s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/091.out.bad)
 --- tests/generic/091.out 2014-02-21 19:11:09.460443001 +
 +++ /home/fdmanana/git/hub/xfstests/results//generic/091.out.bad
2014-08-06 20:42:25.262012249 +0100
 @@ -1,7 +1,601 @@
  QA output created by 091
  fsx -N 1 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 -fsx -N 1 -o 8192 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 -fsx -N 1 -o 32768 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 -fsx -N 1 -o 8192 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 -fsx -N 1 -o 32768 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
 -fsx -N 1 -o 128000 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -W
 ...
 (Run 'diff -u tests/generic/091.out
/home/fdmanana/git/hub/xfstests/results//generic/091.out.bad'  to see
the entire diff)
generic/112 93s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/112.out.bad)
 --- tests/generic/112.out 2014-02-21 19:11:09.460443001 +
 +++ /home/fdmanana/git/hub/xfstests/results//generic/112.out.bad
2014-08-06 20:42:28.930012249 +0100
 @@ -4,15 +4,5 @@
  ---
  fsx.0 : -A -d -N numops -S 0
  ---
 -
 
 -fsx.1 : -A -d -N numops -S 0 -x
 
 ...
 (Run 'diff -u tests/generic/112.out
/home/fdmanana/git/hub/xfstests/results//generic/112.out.bad'  to see
the entire diff)
Ran: generic/075 generic/091 generic/112
Failures: generic/075 generic/091 generic/112
Failed 3 of 3 tests


Without NO_HOLES, those tests pass. With NO_HOLES and without this
patch, the tests pass too.
Do you think you can take a look at this?

(A couple nitpick comments below too)

Thanks Qu



---
  fs/btrfs/file.c | 112 +---
  1 file changed, 98 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index ae6af07..93915d1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2168,6 +2168,37 @@ out:
 return 0;
  }

+/*
+ * Find a hole extent on given inode and change start/len to the end of hole
+ * extent.(hole/vacuum extent whose em->start <= start &&
+ *em->start + em->len > start)
+ * When a hole extent is found, return 1 and modify start/len.
+ */
+static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
+{
+   struct extent_map *em;
+   int ret = 0;
+
+   em = btrfs_get_extent(inode, NULL, 0, *start, *len

btrfs send handle missing space on target device

2014-08-06 Thread GEO
Hi,

I wonder how I should handle missing space on the backup device when doing 
incremental backups with btrfs send.
For the initial bootstrap it is easy: I simply compare the free space 
on the target device with the size of home (we are talking about incremental 
backups of home). But how should I handle this with

sudo btrfs send -p @home-backup @home-backup | sudo btrfs receive /mnt/backup/

What will happen by default if the space on the target is not enough? Will the 
command start writing to /mnt/backup/ at all and stop when the disk is full, 
or will it return an error code? If it doesn't return an error code, how can I 
handle this before the transaction?

Thanks in advance!


Re: [PATCH] btrfs: Avoid truncating page or punching hole in an already existing hole.

2014-08-06 Thread Filipe David Manana
On Fri, May 30, 2014 at 8:16 AM, Qu Wenruo  wrote:
> btrfs_punch_hole() will truncate unaligned pages or punch holes in an
> already existing hole.
> This will cause unneeded zeroed pages or holes splitting the original huge
> hole.
>
> This patch will skip already existed holes before any page truncating or
> hole punching.
>
> Signed-off-by: Qu Wenruo 

Hi Qu,

FYI, this change seems to introduce some regressions when using the
NO_HOLES feature, and it's easy to reproduce with xfstests, where at
least 3 tests fail in a deterministic way:

$ cat /home/fdmanana/git/hub/xfstests/local.config
export TEST_DEV=/dev/sdb
export TEST_DIR=/home/fdmanana/btrfs-tests/dev
export SCRATCH_MNT="/home/fdmanana/btrfs-tests/scratch_1"
export FSTYP=btrfs

$ cd /home/fdmanana/git/hub/xfstests
$ mkfs.btrfs -f -O no-holes /dev/sdb
Performing full device TRIM (100.00GiB) ...
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/sdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 100.00GiB
Btrfs v3.14.1-96-gcc7fd5a-dirty

$ ./check generic/075  generic/091  generic/112
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 debian-vm3 3.16.0-rc6-fdm-btrfs-next-38+

generic/075 87s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/075.out.bad)
--- tests/generic/075.out 2014-08-06 20:30:02.986012249 +0100
+++ /home/fdmanana/git/hub/xfstests/results//generic/075.out.bad
2014-08-06 20:42:23.386012249 +0100
@@ -4,15 +4,5 @@
 ---
 fsx.0 : -d -N numops -S 0
 ---
-

-fsx.1 : -d -N numops -S 0 -x

...
(Run 'diff -u tests/generic/075.out
/home/fdmanana/git/hub/xfstests/results//generic/075.out.bad'  to see
the entire diff)
generic/091 56s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/091.out.bad)
--- tests/generic/091.out 2014-02-21 19:11:09.460443001 +
+++ /home/fdmanana/git/hub/xfstests/results//generic/091.out.bad
2014-08-06 20:42:25.262012249 +0100
@@ -1,7 +1,601 @@
 QA output created by 091
 fsx -N 1 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 1 -o 8192 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 1 -o 32768 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 1 -o 8192 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 1 -o 32768 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 1 -o 128000 -l 50 -r PSIZE -t BSIZE -w BSIZE -Z -W
...
(Run 'diff -u tests/generic/091.out
/home/fdmanana/git/hub/xfstests/results//generic/091.out.bad'  to see
the entire diff)
generic/112 93s ... [failed, exit status 1] - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//generic/112.out.bad)
--- tests/generic/112.out 2014-02-21 19:11:09.460443001 +
+++ /home/fdmanana/git/hub/xfstests/results//generic/112.out.bad
2014-08-06 20:42:28.930012249 +0100
@@ -4,15 +4,5 @@
 ---
 fsx.0 : -A -d -N numops -S 0
 ---
-

-fsx.1 : -A -d -N numops -S 0 -x

...
(Run 'diff -u tests/generic/112.out
/home/fdmanana/git/hub/xfstests/results//generic/112.out.bad'  to see
the entire diff)
Ran: generic/075 generic/091 generic/112
Failures: generic/075 generic/091 generic/112
Failed 3 of 3 tests


Without NO_HOLES, those tests pass. With NO_HOLES and without this
patch, the tests pass too.
Do you think you can take a look at this?

(A couple nitpick comments below too)

Thanks Qu


> ---
>  fs/btrfs/file.c | 112 
> +---
>  1 file changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index ae6af07..93915d1 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2168,6 +2168,37 @@ out:
> return 0;
>  }
>
> +/*
> + * Find a hole extent on given inode and change start/len to the end of hole
> + * extent.(hole/vacuum extent whose em->start <= start &&
> + *em->start + em->len > start)
> + * When a hole extent is found, return 1 and modify start/len.
> + */
> +static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
> +{
> +   struct extent_map *em;
> +   int ret = 0;
> +
> +   em = btrfs_get_extent(inode, NULL, 0, *start, *len, 0);
> +   if (IS_ERR_OR_NULL(em)) {
> +   if (!em)
> +   ret = -ENOMEM;
> +   else
> +   ret = PTR_ERR(em);
> +   return ret;
> +   }
> +
> +   /* Hole or vacuum extent(only exists in no-hole mode) */
> +  

Re: btrfs send/receive does not preserve +C (nocow) attribute

2014-08-06 Thread Juan Orti Alcaine
On Wednesday, 6 August 2014, 17:56:52, Jakub Klinkovský wrote:
> I've been testing btrfs send/receive, and to my surprise, the +C (nocow)
> attribute I have set on directory for virtual machines images is not
> preserved. Is this expected? Having found no information about this issue,
> I'm asking here.
> 
> Info:
> $ uname -a
> Linux asusntb 3.15.8-1-ARCH #1 SMP PREEMPT Fri Aug 1 08:51:42 CEST 2014
> x86_64 GNU/Linux $ btrfs --version
> Btrfs v3.14.2-dirty
> 
> Both source and target btrfs filesystems are mounted with the same mount
> options:
>   rw,nosuid,nodev,noatime,compress=lzo,space_cache,autodefrag,commit=60
> 
> $ lsattr ~/virtual_machines/
> ---C virtual_machines/archlinux-btrfs.raw
> ---C virtual_machines/archlinux.raw
> ---C virtual_machines/winxp.raw
> 
> $ lsattr /media/backup/virtual_machines/2014-08-06/
> 
> /media/backup/virtual_machines/2014-08-06/archlinux-btrfs.raw
>  /media/backup/virtual_machines/2014-08-06/archlinux.raw
>  /media/backup/virtual_machines/2014-08-06/winxp.raw
> 
> Regards,
> 
> --
> jlk

Nor do file capabilities get preserved:
https://bugzilla.kernel.org/show_bug.cgi?id=68891


btrfs send/receive does not preserve +C (nocow) attribute

2014-08-06 Thread Jakub Klinkovský
I've been testing btrfs send/receive, and to my surprise, the +C (nocow)
attribute I have set on directory for virtual machines images is not preserved.
Is this expected? Having found no information about this issue, I'm asking here.

Info:
$ uname -a
Linux asusntb 3.15.8-1-ARCH #1 SMP PREEMPT Fri Aug 1 08:51:42 CEST 2014 x86_64 
GNU/Linux
$ btrfs --version
Btrfs v3.14.2-dirty

Both source and target btrfs filesystems are mounted with the same mount
options:
  rw,nosuid,nodev,noatime,compress=lzo,space_cache,autodefrag,commit=60

$ lsattr ~/virtual_machines/
---C virtual_machines/archlinux-btrfs.raw
---C virtual_machines/archlinux.raw
---C virtual_machines/winxp.raw

$ lsattr /media/backup/virtual_machines/2014-08-06/
 /media/backup/virtual_machines/2014-08-06/archlinux-btrfs.raw
 /media/backup/virtual_machines/2014-08-06/archlinux.raw
 /media/backup/virtual_machines/2014-08-06/winxp.raw

Regards,

--
jlk




Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Chris Mason
On 08/06/2014 10:43 AM, Martin Steigerwald wrote:
> On Wednesday, 6 August 2014, 09:35:51, Chris Mason wrote:
>> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
 I think this should go to stable. Thanks, Liu.
>>
>> I'm definitely tagging this for stable.
>>
>>> Unfortunately this fix does not seem to fix all lockups.
>>
>> The traces below are a little different, could you please send the whole
>> file?
> 
> Will paste it at the end.

[90496.156016] kworker/u8:14   D 880044e38540 0 21050  2 0x
[90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[90496.159320]  88022880f990 0002 880407f649b0 
88022880ffd8
[90496.160997]  880044e38000 00013040 880044e38000 
7fff
[90496.162686]  880301383aa0 0002 814705d0 
880301383a98
[90496.164360] Call Trace:
[90496.166028]  [] ? michael_mic.part.6+0x21/0x21
[90496.167854]  [] schedule+0x64/0x66
[90496.169574]  [] schedule_timeout+0x2f/0x114
[90496.171221]  [] ? wake_up_process+0x2f/0x32
[90496.172867]  [] ? get_parent_ip+0xd/0x3c
[90496.174472]  [] ? preempt_count_add+0x7b/0x8e
[90496.176053]  [] __wait_for_common+0x11e/0x163
[90496.177619]  [] ? __wait_for_common+0x11e/0x163
[90496.179173]  [] ? wake_up_state+0xd/0xd
[90496.180728]  [] wait_for_completion+0x1f/0x21
[90496.182285]  [] btrfs_async_run_delayed_refs+0xbf/0xd9 
[btrfs]
[90496.183833]  [] __btrfs_end_transaction+0x2b6/0x2ec [btrfs]
[90496.185380]  [] btrfs_end_transaction+0xb/0xd [btrfs]
[90496.186940]  [] find_free_extent+0x8a9/0x976 [btrfs]
[90496.189464]  [] btrfs_reserve_extent+0x6f/0x119 [btrfs]
[90496.191326]  [] cow_file_range+0x1a6/0x377 [btrfs]
[90496.193080]  [] ? extent_write_locked_range+0x10c/0x11e 
[btrfs]
[90496.194659]  [] submit_compressed_extents+0x100/0x412 
[btrfs]
[90496.196225]  [] ? debug_smp_processor_id+0x17/0x19
[90496.197776]  [] async_cow_submit+0x82/0x87 [btrfs]
[90496.199383]  [] normal_work_helper+0x153/0x224 [btrfs]
[90496.200944]  [] process_one_work+0x16f/0x2b8
[90496.202483]  [] worker_thread+0x27b/0x32e
[90496.204000]  [] ? cancel_delayed_work_sync+0x10/0x10
[90496.205514]  [] kthread+0xb2/0xba
[90496.207040]  [] ? ap_handle_dropped_data+0xf/0xc8
[90496.208565]  [] ? __kthread_parkme+0x62/0x62
[90496.210096]  [] ret_from_fork+0x7c/0xb0
[90496.211618]  [] ? __kthread_parkme+0x62/0x62


Ok, this should explain the hang.  submit_compressed_extents is calling
cow_file_range with a locked page.

cow_file_range is trying to find a free extent and in the process is
calling btrfs_end_transaction, which is running the async delayed refs,
which is trying to write dirty pages, which is waiting for your locked
page.

I should be able to reproduce this ;)

-chris



Re: [PATCH] Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch

2014-08-06 Thread Takashi Iwai
At Wed, 30 Jul 2014 19:45:36 +0200,
Takashi Iwai wrote:
> 
> At Wed, 30 Jul 2014 13:00:24 -0400,
> Josef Bacik wrote:
> > 
> > On 07/30/2014 12:35 PM, Takashi Iwai wrote:
> > > At Wed, 30 Jul 2014 12:01:31 -0400,
> > > Josef Bacik wrote:
> > >>
> > >> On 07/30/2014 11:52 AM, Takashi Iwai wrote:
> > >>> At Wed, 30 Jul 2014 11:40:14 -0400,
> > >>> Josef Bacik wrote:
> > 
> >  On 07/30/2014 11:05 AM, Takashi Iwai wrote:
> > > At Wed, 30 Jul 2014 17:01:52 +0200,
> > > Takashi Iwai wrote:
> > >>
> > >> At Wed, 30 Jul 2014 10:29:46 -0400,
> > >> Josef Bacik wrote:
> > >>>
> > >>> On 07/30/2014 05:57 AM, Takashi Iwai wrote:
> >  At Mon, 28 Jul 2014 16:01:55 +0200,
> >  Takashi Iwai wrote:
> > >
> > > At Mon, 28 Jul 2014 15:48:41 +0200,
> > > Takashi Iwai wrote:
> > >>
> > >> At Mon, 28 Jul 2014 09:16:48 -0400,
> > >> Josef Bacik wrote:
> > >>>
> > >>> On 07/28/2014 04:57 AM, Takashi Iwai wrote:
> >  We've got bug reports that btrfs crashes when quota is enabled 
> >  on
> >  32bit kernel, typically with the Oops like below:
> >    BUG: unable to handle kernel NULL pointer dereference at 
> >  0004
> >    IP: [] find_parent_nodes+0x360/0x1380 [btrfs]
> >    *pde = 
> >    Oops:  [#1] SMP
> >    CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S  W 
> >  3.15.2-1.gd43d97e-default #1
> >    Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
> >    task: f1478130 ti: f147c000 task.ti: f147c000
> >    EIP: 0060:[] EFLAGS: 00010213 CPU: 0
> >    EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
> >    EAX: f147dda8 EBX: f147ddb0 ECX: 0011 EDX: 
> >    ESI:  EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> >    CR0: 8005003b CR2: 0004 CR3: 00bf3000 CR4: 0690
> >    Stack:
> >   f147dda4 0050 0001  
> >  0001 0050
> > 0001  d3059000 0001 0022 00a8 
> >   
> >  00a1   0001  
> >   1180
> >    Call Trace:
> > [] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
> > [] btrfs_qgroup_rescan_worker+0x401/0x760 
> >  [btrfs]
> > [] normal_work_helper+0xc8/0x270 [btrfs]
> > [] process_one_work+0x11b/0x390
> > [] worker_thread+0x101/0x340
> > [] kthread+0x9b/0xb0
> > [] ret_from_kernel_thread+0x21/0x30
> > [] kthread_create_on_node+0x110/0x110
> > 
> >  This indicates a NULL corruption in prefs_delayed list.  The 
> >  further
> >  investigation and bisection pointed that the call of 
> >  ulist_add_merge()
> >  results in the corruption.
> > 
> >  ulist_add_merge() takes u64 as aux and writes a 64bit value 
> >  into
> >  old_aux.  The callers of this function in backref.c, however, 
> >  pass a
> >  pointer of a pointer to old_aux.  That is, the function 
> >  overwrites
> >  64bit value on 32bit pointer.  This caused a NULL in the 
> >  adjacent
> >  variable, in this case, prefs_delayed.
> > 
> >  Here is a quick attempt to band-aid over this: a new function,
> >  ulist_add_merge_ptr() is introduced to pass/store properly a 
> >  pointer
> >  value instead of u64.  There are still ugly void ** cast 
> >  remaining
> >  in the callers because void ** cannot be taken implicitly.  
> >  But, it's
> >  safer than explicit cast to u64, anyway.
> > 
> >  Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
> >  Cc:  [v3.11+]
> >  Signed-off-by: Takashi Iwai 
> >  ---
> > 
> >  Alternatively, we can change the argument of aux and old_aux 
> >  to a
> >  pointer from u64, as backref.c is the only user of 
> >  ulist_add_merge()
> >  function.  I'll cook up another 

Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Martin Steigerwald
On Wednesday, 6 August 2014, 09:35:51, Chris Mason wrote:
> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
> >> I think this should go to stable. Thanks, Liu.
> 
> I'm definitely tagging this for stable.
> 
> > Unfortunately this fix does not seem to fix all lockups.
> 
> The traces below are a little different, could you please send the whole
> file?

Will paste it at the end.
 
> > Just had a hard lockup again during the java-based CrashPlanPROe app backing up
> > company data, which is stored on BTRFS via ecryptfs, to a central backup
> > server.
> > 
> > It basically happened on about the first heavy write I/O occasion after
> > the BTRFS trees filled the complete device:
> > 
> > I am now balancing the trees down to lower sizes manually with
> > 
> > btrfs balance start -dusage=10 /home
> > 
> > btrfs balance start -musage=10 /home
> > 
> > and raising values. BTW I got out of space with trying both at the same
> > time:
> > 
> > merkaba:~#1> btrfs balance start -dusage=10 -musage=10 /home
> > ERROR: error during balancing '/home' - No space left on device
> > There may be more info in syslog - try dmesg | tail
> > 
> > merkaba:~#1> btrfs fi sh /home
> > Label: 'home'  uuid: […]
> > 
> > Total devices 2 FS bytes used 128.76GiB
> > devid1 size 160.00GiB used 146.00GiB path /dev/dm-0
> > devid2 size 160.00GiB used 146.00GiB path
> > /dev/mapper/sata-home
> > 
> > So I am pretty sure meanwhile that hangs can best be trigger *if* BTRFS
> > trees fill the complete device.
> > 
> > I will try to keep tree sizes down as a work-around for now even it if
> > means additional write access towards the SSD devices.
> > 
> > And make sure tree sizes stay down on my first server BTRFS as well
> > although this uses debian backport kernel 3.14 and thus may not be
> > affected.
> > 
> > Are there any other fixes to try out? I really like to see this resolved.
> > It's in two stable kernel revisions already: 3.15 and 3.16. Which means that,
> > if not fixed, the next Debian stable (Jessie) will be affected by it.
> > 
> > 
> > Some kern.log (have stored the complete file)
[…]

Attached, xz compressed, since 180 KB as plaintext in a mail is a bit much.

This is complete from today resuming from hibernation today morning, the BTRFS 
hang, the reboot and the first balancing runs to make BTRFS more stable again.

Interestingly it was ecryptfs reporting issues before the BTRFS hang got 
reported. There are several a bit different looking traces in there.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

kern.log.xz
Description: application/xz


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Chris Mason
On 08/06/2014 06:21 AM, Martin Steigerwald wrote:

>> I think this should go to stable. Thanks, Liu.

I'm definitely tagging this for stable.

> 
> Unfortunately this fix does not seem to fix all lockups.

The traces below are a little different, could you please send the whole
file?

-chris

> 
> Just had a hard lockup again during a java-based CrashPlanPROe app backing up
> company data which is stored on BTRFS via ecryptfs to central Backup server.
> 
> It basically happened on about the first heavy write I/O occasion after
> the BTRFS trees filled the complete device:
> 
> I am now balancing the trees down to lower sizes manually with
> 
> btrfs balance start -dusage=10 /home
> 
> btrfs balance start -musage=10 /home
> 
> and raising values. BTW I got out of space with trying both at the same time:
> 
> merkaba:~#1> btrfs balance start -dusage=10 -musage=10 /home
> ERROR: error during balancing '/home' - No space left on device
> There may be more info in syslog - try dmesg | tail
> 
> merkaba:~#1> btrfs fi sh /home
> Label: 'home'  uuid: […]
> Total devices 2 FS bytes used 128.76GiB
> devid1 size 160.00GiB used 146.00GiB path /dev/dm-0
> devid2 size 160.00GiB used 146.00GiB path /dev/mapper/sata-home
> 
> So I am pretty sure meanwhile that hangs can best be triggered *if* BTRFS
> trees fill the complete device.
> 
> I will try to keep tree sizes down as a work-around for now even if it means
> additional write access towards the SSD devices.
> 
> And make sure tree sizes stay down on my first server BTRFS as well although
> this uses debian backport kernel 3.14 and thus may not be affected.
> 
> Are there any other fixes to try out? I really like to see this resolved. It's
> in two stable kernel revisions already: 3.15 and 3.16. Which means that, if not
> fixed, the next Debian stable (Jessie) will be affected by it.
> 
> 
> Some kern.log (have stored the complete file)
> 
> Aug  6 12:01:16 merkaba kernel: [90496.262084] INFO: task java:21301 blocked 
> for more than 120 seconds.
> Aug  6 12:01:16 merkaba kernel: [90496.263626]   Tainted: G   O  
> 3.16.0-tp520-fixcompwrite+ #3
> Aug  6 12:01:16 merkaba kernel: [90496.265159] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug  6 12:01:16 merkaba kernel: [90496.266756] javaD 
> 880044e3cef0 0 21301  1 0x
> Aug  6 12:01:16 merkaba kernel: [90496.268353]  8801960e3bd8 
> 0002 880407f649b0 8801960e3fd8
> Aug  6 12:01:16 merkaba kernel: [90496.269980]  880044e3c9b0 
> 00013040 880044e3c9b0 88041e293040
> Aug  6 12:01:16 merkaba kernel: [90496.271766]  88041e5c6868 
> 8801960e3c70 0002 810db1d9
> Aug  6 12:01:16 merkaba kernel: [90496.273383] Call Trace:
> Aug  6 12:01:16 merkaba kernel: [90496.275017]  [] ? 
> wait_on_page_read+0x37/0x37
> Aug  6 12:01:16 merkaba kernel: [90496.276630]  [] 
> schedule+0x64/0x66
> Aug  6 12:01:16 merkaba kernel: [90496.278209]  [] 
> io_schedule+0x57/0x76
> Aug  6 12:01:16 merkaba kernel: [90496.279817]  [] 
> sleep_on_page+0x9/0xd
> Aug  6 12:01:16 merkaba kernel: [90496.281403]  [] 
> __wait_on_bit_lock+0x41/0x85
> Aug  6 12:01:16 merkaba kernel: [90496.282991]  [] 
> __lock_page+0x70/0x7c
> Aug  6 12:01:16 merkaba kernel: [90496.284550]  [] ? 
> autoremove_wake_function+0x2f/0x2f
> Aug  6 12:01:16 merkaba kernel: [90496.286156]  [] 
> lock_page+0x1e/0x21 [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.287742]  [] ? 
> lock_page+0x1e/0x21 [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.289344]  [] 
> extent_write_cache_pages.isra.21.constprop.42+0x1a7/0x2d9 [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.290955]  [] ? 
> find_get_pages_tag+0xfc/0x123
> Aug  6 12:01:16 merkaba kernel: [90496.292574]  [] 
> extent_writepages+0x46/0x57 [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.294154]  [] ? 
> btrfs_submit_direct+0x3ef/0x3ef [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.295760]  [] 
> btrfs_writepages+0x23/0x25 [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.297492]  [] 
> do_writepages+0x1b/0x24
> Aug  6 12:01:16 merkaba kernel: [90496.299035]  [] 
> __filemap_fdatawrite_range+0x50/0x52
> Aug  6 12:01:16 merkaba kernel: [90496.300561]  [] 
> filemap_fdatawrite_range+0xe/0x10
> Aug  6 12:01:16 merkaba kernel: [90496.302118]  [] 
> btrfs_sync_file+0x67/0x2bd [btrfs]
> Aug  6 12:01:16 merkaba kernel: [90496.303630]  [] ? 
> __filemap_fdatawrite_range+0x50/0x52
> Aug  6 12:01:16 merkaba kernel: [90496.305158]  [] 
> vfs_fsync_range+0x1c/0x1e
> Aug  6 12:01:16 merkaba kernel: [90496.306669]  [] 
> vfs_fsync+0x17/0x19
> Aug  6 12:01:16 merkaba kernel: [90496.308197]  [] 
> ecryptfs_fsync+0x2f/0x34 [ecryptfs]
> Aug  6 12:01:16 merkaba kernel: [90496.309711]  [] 
> vfs_fsync_range+0x1c/0x1e
> Aug  6 12:01:16 merkaba kernel: [90496.311249]  [] 
> vfs_fsync+0x17/0x19
> Aug  6 12:01:16 merkaba kernel: [90496.312771]  [] 
> do_fsync+0x2c/0x45
> Aug  6 1

[PATCH] Btrfs: fix hole detection during file fsync

2014-08-06 Thread Filipe Manana
The file hole detection logic during a file fsync wasn't correct,
because it didn't look back (in a previous leaf) for the last file
extent item that can be in a leaf to the left of our leaf and that
has a generation lower than the current transaction id. This made it
assume that a hole exists when it really doesn't exist in the file.

Such false positive hole detection happens in the following scenario:

* We have a file that has many file extent items, covering 3 or more
  btree leafs (the first leaf must contain non file extent items too).

* Two ranges of the file are modified, with their extent items being
  located at 2 different leafs and those leafs aren't consecutive.

* When processing the second leaf, we weren't checking if some file
  extent item exists that is located in some leaf that is between
  our 2 leafs, and therefore assumed the range defined between the
  last file extent item in first leaf and the first file extent item
  in the second leaf matched a hole.

Fortunately this didn't result in overriding the log with wrong data,
instead it made the last loop in copy_items() attempt to insert a
duplicated key (for a hole file extent item), which makes the file
fsync code return with -EEXIST to file.c:btrfs_sync_file() which in
turn ends up doing a full transaction commit.

I could trigger this issue with the following test for xfstests (which
never fails, either without or with this patch). The last fsync call
results in a full transaction commit, due to the -EEXIST error mentioned
above. I could also observe this behaviour happening frequently when
running xfstests/generic/075 in a loop.

Test:

_cleanup()
{
_cleanup_flakey
rm -fr $tmp
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey

# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_need_to_be_root

rm -f $seqres.full

# Create a file with many file extent items, each representing a 4Kb extent.
# These items span 3 btree leaves, of 16Kb each (the default mkfs.btrfs leaf
# size as of btrfs-progs 3.12).
_scratch_mkfs -l 16384 >/dev/null 2>&1
_init_flakey
SAVE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
MOUNT_OPTIONS="$MOUNT_OPTIONS -o commit=999"
_mount_flakey

# First fsync, inode has BTRFS_INODE_NEEDS_FULL_SYNC flag set.
$XFS_IO_PROG -f -c "pwrite -S 0x01 -b 4096 0 4096" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io

# For any of the following fsync calls, inode doesn't have the flag
# BTRFS_INODE_NEEDS_FULL_SYNC set.
for ((i = 1; i <= 500; i++)); do
OFFSET=$((4096 * i))
LEN=4096
$XFS_IO_PROG -c "pwrite -S 0x01 $OFFSET $LEN" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
done

# Commit transaction and bump next transaction's id (to 7).
sync

# Truncate will set the BTRFS_INODE_NEEDS_FULL_SYNC flag in the btrfs's
# inode runtime flags.
$XFS_IO_PROG -c "truncate 2048000" $SCRATCH_MNT/foo

# Commit transaction and bump next transaction's id (to 8).
sync

# Touch 1 extent item from the first leaf and 1 from the last leaf. The leaf
# in the middle, containing only file extent items, isn't touched. So the
# next fsync, when calling btrfs_search_forward(), won't visit that middle
# leaf. First and 3rd leaf have generation 6, while the middle one has
# generation 8.
$XFS_IO_PROG \
-c "pwrite -S 0xee -b 4096 0 4096" \
-c "pwrite -S 0xff -b 4096 2043904 4096" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io

_load_flakey_table $FLAKEY_DROP_WRITES
md5sum $SCRATCH_MNT/foo | _filter_scratch
_unmount_flakey

_load_flakey_table $FLAKEY_ALLOW_WRITES
# During mount, we'll replay the log created by the fsync above, and the
# file's md5 digest should be the same we got before the unmount.
_mount_flakey
md5sum $SCRATCH_MNT/foo | _filter_scratch
_unmount_flakey
MOUNT_OPTIONS="$SAVE_MOUNT_OPTIONS"

status=0
exit

Signed-off-by: Filipe Manana 
---
 fs/btrfs/tree-log.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index df332dd..5a917a6 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3296,7 +3296,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
struct list_head ordered_sums;
int skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
bool has_extents = false;
-   bool need_find_last_extent = (*last_extent == 0);
+   bool need_find_last_extent = true;
bool done = false;
 
INIT_LIST_HEAD(&ordered_sums);
@@ -3350,8 +3350,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 */
if (ins_keys[i].type == BTRFS_EXTENT_DATA_KEY) {
has_exte

Re: Unrecoverable errors when the btrfs file system was modified outside the running OS

2014-08-06 Thread Justus Seifert
On Monday 04 August 2014 01:29:49 you wrote:
> Hello,
> 
> I just had a very frustrating experience with btrfs, which I was only
> able to resolve by rolling back to ext4 using the subvol btrfs-convert
> created. The same type of situation occurred before when I was using
> the ext file system and the result was far less disastrous.
> 
> The source of problem came from the fact that I have a Windows and
> Ubuntu 14.04 dual-boot setup and within Windows I also use VirtualBox
> to run the same Ubuntu with rawdisk. Today I updated Ubuntu to a new
> kernel within VirtualBox. At this point, I would usually shutdown
> VirtualBox before I let my machine go to hibernation. However, this
> time I forgot. And when the machine started up, it went directly into
> Ubuntu (because grub was updated and to avoid issues my VirtualBox
> setup didn't allow Ubuntu to see my Windows partitions). I did a
> grub-udpate, and rebooted back to Windows, where my VirtualBox was
> still up and running fine. The tragedy happened when I now shut down
> Ubuntu and VirtualBox. The btrfs file system was totally corrupted. I
> tried various combinations ro, recovery, nospace_cache, and
> clear_cache mount options, and it wouldn't mount. dmesg showed some
> "transid verify failed", "open_ctree failed" error messages. btrfs
> restore only retrieved three files.. btrfsck --repair and
> btrfs-zero-log didn't help either.
> 
> To my very surprise, btrfs-convert -r was able to use the subvol it
> created to roll back to ext4. But had I not converted from ext4 to
> btrfs, this would be an unrecoverable situation. Whether or not I have
> backups is a separate issue, being able to recover at least "somewhat"
> in this situation seems to be a desired feature for any file system.
> 
> Roc


btrfs is CoW
stale metadata means stale block info
stale block info means no idea where the files are on disk and no idea where 
free blocks are
btrfs does not know that it does not know where to read and where to write;
it assumes the metadata cached in ram is correct and just reads and writes to
the disk where it sees fit.
this is disastrous if the location of files changes.
cow means the location of files and metadata changes with _every single write_!

ext4 is not CoW
if you do not delete, grow, shrink or create files (only in-place edits or
small appends that do not allocate additional blocks), then the location of the
files on disk does not change.
this is often the case for most files and metadata on a root filesystem.
thus it is no surprise at all that the consequences of using stale metadata
for a filesystem that was mounted elsewhere differ dramatically between ext4
and btrfs.

please keep in mind that all filesystems assume that ram is a safe place to
keep metadata during runtime, since reading all the required metadata all the
time would be a huge performance brake for modern filesystems. old filesystems
might be different.

if you want the metadata not to change then try a read-only filesystem for root 
for a while.  squashfs is the default root fs for openwrt and is ro.

sorry, kmail ate my draft for this mail once and i was too lazy to retype it
completely, thus only simple grammar and no spell checks.

if you are interested in more details, ask detailed questions.  duncan might 
also help you if you ask him.

regards
justus seifert

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Martin Steigerwald
On Wednesday, 6 August 2014, 11:29:19, Hugo Mills wrote:
> On Wed, Aug 06, 2014 at 12:21:59PM +0200, Martin Steigerwald wrote:
> > It basically happened on about the first heavy write I/O occasion after
> > the BTRFS trees filled the complete device:
> > 
> > I am now balancing the trees down to lower sizes manually with
> > 
> > btrfs balance start -dusage=10 /home
> > 
> > btrfs balance start -musage=10 /home
> 
>Note that balance has nothing to do with balancing the metadata
> trees. The tree structures are automatically balanced as part of their
> normal operation. A "btrfs balance start" is a much higher-level
> operation. It's called balance because the overall effect is to
> balance the data usage evenly across multiple devices. (Actually, to
> balance the available space evenly).
> 
>Also note that the data part isn't tree-structured, so referring to
> "balancing the trees" with a -d flag is doubly misleading. :)

Hmm, it makes used size in

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 129.12GiB
devid1 size 160.00GiB used 142.03GiB path /dev/dm-0
devid2 size 160.00GiB used 142.03GiB path /dev/mapper/sata-home

and I thought this is the size used by the trees BTRFS creates.

So you are saying it does not balance shortest versus longest path (the tree
algorithm does this automatically), but just the *data* in the tree?

In any way: I should not be required to do this kind of manual maintenance in 
order to prevent BTRFS from locking up hard on write accesses.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7



Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Hugo Mills
On Wed, Aug 06, 2014 at 12:21:59PM +0200, Martin Steigerwald wrote:
> It basically happened on about the first heavy write I/O occasion after
> the BTRFS trees filled the complete device:
> 
> I am now balancing the trees down to lower sizes manually with
> 
> btrfs balance start -dusage=10 /home
> 
> btrfs balance start -musage=10 /home

   Note that balance has nothing to do with balancing the metadata
trees. The tree structures are automatically balanced as part of their
normal operation. A "btrfs balance start" is a much higher-level
operation. It's called balance because the overall effect is to
balance the data usage evenly across multiple devices. (Actually, to
balance the available space evenly).

   Also note that the data part isn't tree-structured, so referring to
"balancing the trees" with a -d flag is doubly misleading. :)
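As an aside, the step-by-step usage-filtered balance quoted above (raising -dusage/-musage values incrementally) can be scripted. A dry-runnable sketch; the BTRFS variable, mountpoint, and thresholds are examples, not recommendations:

```shell
#!/bin/sh
# BTRFS is parameterised so the sketch can be dry-run ("echo btrfs" just
# prints the commands); in real use set BTRFS=btrfs.
BTRFS="${BTRFS:-echo btrfs}"

run_balance_ramp() {
    mnt="$1"; shift
    for usage in "$@"; do
        # Relocate only chunks that are at most $usage% full: the cheap,
        # mostly-empty chunks first, fuller (costlier) ones later.
        $BTRFS balance start -dusage="$usage" "$mnt" || return 1
        $BTRFS balance start -musage="$usage" "$mnt" || return 1
    done
}

run_balance_ramp /home 5 10 25
```

Running data and metadata passes separately, as Martin did, also avoids the ENOSPC he hit when combining -dusage and -musage in one invocation.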

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You know... I'm sure this code would seem a lot better if I ---   
 never tried running it. 




Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-06 Thread Martin Steigerwald
On Monday, 4 August 2014, 14:50:29, Martin Steigerwald wrote:
> On Friday, 25 July 2014, 11:54:37, Martin Steigerwald wrote:
> > On Thursday, 24 July 2014, 22:48:05, you wrote:
> > > When failing to allocate space for the whole compressed extent, we'll
> > > fallback to uncompressed IO, but we've forgotten to redirty the pages
> > > which belong to this compressed extent, and these 'clean' pages will
> > > simply skip 'submit' part and go to endio directly, at last we got data
> > > corruption as we write nothing.
> > > 
> > > Signed-off-by: Liu Bo 
> > > ---
> > > 
> > >  fs/btrfs/inode.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > > index 3668048..8ea7610 100644
> > > --- a/fs/btrfs/inode.c
> > > +++ b/fs/btrfs/inode.c
> > > 
> > > @@ -709,6 +709,18 @@ retry:
> > >   unlock_extent(io_tree, async_extent->start,
> > >   
> > > async_extent->start +
> > > async_extent->ram_size - 1);
> > > 
> > > +
> > > + /*
> > > +  * we need to redirty the pages if we decide to
> > > +  * fallback to uncompressed IO, otherwise we
> > > +  * will not submit these pages down to lower
> > > +  * layers.
> > > +  */
> > > + extent_range_redirty_for_io(inode,
> > > + async_extent->start,
> > > + async_extent->start +
> > > + async_extent->ram_size - 1);
> > > +
> > > 
> > >   goto retry;
> > >   
> > >   }
> > >   goto out_free;
> > 
> > I am testing this currently. So far no lockup. Let's see. Still has not
> > filled the block device with trees completely after I balanced them:
> > 
> > Label: 'home'  uuid: […]
> > 
> > Total devices 2 FS bytes used 125.57GiB
> > devid1 size 160.00GiB used 153.00GiB path /dev/dm-0
> > devid2 size 160.00GiB used 153.00GiB path /dev/mapper/sata-home
> > 
> > I believe the lockups happen more easily if the trees occupy all of disk
> > space. Well I will do some compiling of some KDE components which may let
> > BTRFS fill all space again.
> > 
> > This patch means that when it can't make enough free space in the
> > (fragmented) tree it will write uncompressed?
> > 
> > This would mean that one would have to defragment trees regularly to allow
> > for writes to happen compressed at all times.
> > 
> > Well… of course still better than lockup or corruption.
> 
> No lockups so far anymore.
> 
> Tested with 3.16-rc5, 3.16-rc7, now running 3.16 final.
> 
> /home BTRFS only got today filled completely tree-wise:
> 
> merkaba:~> btrfs fi sh /home
> Label: 'home'  uuid: […]
> Total devices 2 FS bytes used 127.35GiB
> devid1 size 160.00GiB used 160.00GiB path /dev/mapper/msata-home
> devid2 size 160.00GiB used 160.00GiB path /dev/dm-3
> 
> But I had KDE and kernel compile running full throttle at that time and
> still good.
> 
> Tested-By: Martin Steigerwald 
> 
> 
> I think this should go to stable. Thanks, Liu.

Unfortunately this fix does not seem to fix all lockups.

Just had a hard lockup again during a java-based CrashPlanPROe app backing up
company data which is stored on BTRFS via ecryptfs to central Backup server.

It basically happened on about the first heavy write I/O occasion after
the BTRFS trees filled the complete device:

I am now balancing the trees down to lower sizes manually with

btrfs balance start -dusage=10 /home

btrfs balance start -musage=10 /home

and raising values. BTW I got out of space with trying both at the same time:

merkaba:~#1> btrfs balance start -dusage=10 -musage=10 /home
ERROR: error during balancing '/home' - No space left on device
There may be more info in syslog - try dmesg | tail

merkaba:~#1> btrfs fi sh /home
Label: 'home'  uuid: […]
Total devices 2 FS bytes used 128.76GiB
devid1 size 160.00GiB used 146.00GiB path /dev/dm-0
devid2 size 160.00GiB used 146.00GiB path /dev/mapper/sata-home

So I am pretty sure meanwhile that hangs can best be triggered *if* BTRFS
trees fill the complete device.

I will try to keep tree sizes down as a work-around for now even if it means
additional write access towards the SSD devices.

And make sure tree sizes stay down on my first server BTRFS as well although
this uses debian backport kernel 3.14 and thus may not be affected.

Are there any other fixes to try out? I really like to see this resolved. It's
in two stable kernel revisions already: 3.15 and 3.16. Which means that, if not
fixed, the next Debian stable (Jessie) will be affected by it.


Some kern

Recovering a 4xhdd RAID10 file system with 2 failed disks

2014-08-06 Thread TM
Recovering a 4xhdd RAID10 file system with 2 failed disks

Hi all,

  Quick and Dirty:
  A 4-disk RAID10 with 2 missing devices mounts as degraded,ro; a read-only
scrub ends with no errors
  Recovery options:
  A/ If you had at least 3 hdds, you could replace/add a device
  B/ If you only have 2 hdds, even if scrub ro is ok, 
 you cannot replace/add a device
  So I guess the best option is:
  B.1/ create a new RAID0 filesystem , copy data over to the new filesystem,
 move the old drives to the new filesystem, re-balance the system as RAID10.
  B.2/ any other ways to recover I am missing ? anything easier/faster ?


  Long story:
  A couple of weeks back I had a failed hdd in a RAID10 4disk btrfs.
  I added a new device and removed the failed one, but three days after the
recovery, I ended up with another 2 failing disks.
  So I physically removed the failing 2 disks from the drive bays. 
  (sent one back to Seagate for replacement; the other one I kept and
will send later)
  (please note I do have a backup)

  Good thing is that the two drives I have left in this RAID10 , seem to
hold all data and data seems ok according to a read-only scrub.
  The remaining 2 disks from the RAID can be mounted with -o degraded,ro
  I did a read-only scrub on the filesystem (while mounted as -o
degraded,ro) and scrub ended with no errors. 
  I hope this ro scrub is 100% validation that I have not lost any files,
and all files are ok. 

  Just today I *tried* to insert a new disk and add it to the RAID10 setup.
  If I mount the filesystem as degraded,ro I cannot add a new device (btrfs
device add). And I cannot replace a disk (btrfs replace -r start).
  That is because the filesystem is mounted not only as degraded but as
read-only.
  But a two disk RAID10, can only be mounted as ro.
  This is by design:
gitorious.org/linux-n900/linux-n900/commit/bbb651e469d99f0088e286fdeb54acca7bb4ad4e
  
  But again, a RAID10 system should be recoverable somehow if the data is
all there but half of the disks are missing. 
(i.e. the raid0 data is there and only the raid1 part is missing. The
striped volume is ok, the mirror data is missing)
  If it was an ordinary RAID10 , replacing the two mirror disks at the same
time should be acceptable and the RAID should be recoverable.

  Myself I am lucky , since I still have one of the old failing disks in my
hands. (the other one is being RMAd currently)
  I can insert the old failing disk and mount the file system as degraded
(but not ro), and then run a btrfs replace or btrfs device add.

  But in case I did not have the old failing disk in my hands, or if the
disk was damaged beyond recognition/repair (eg not recognized in BIOS),
  as far as I understand it is impossible to add/replace drives in a file
system mounted as read-only.

  Am I missing something ?
  Is there a better and faster way to recover a RAID10 when only the striped
data is there but not the mirror data?

Thanks in advance,
TM


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: cancel scrub/replace if the user space process receive SIGKILL.

2014-08-06 Thread Qu Wenruo
When an impatient sysadmin is tired of waiting for a background btrfs
scrub/replace and sends SIGKILL to the btrfs process, then unlike
SIGINT/SIGTERM, which can be caught by the user space program to cancel the
scrub work, the user space program will keep running until the ioctl exits.

To keep this consistent with the behavior of btrfs-progs, which cancels
the work when SIGINT is received, this patch makes the scrub routine
check whether a SIGKILL is pending on the current task and cancel the
work if so.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/scrub.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index b6d198f..0c8047f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2277,6 +2277,29 @@ static int get_raid56_logic_offset(u64 physical, int num,
return 1;
 }
 
+/*
+ * check whether the scrub is canceled
+ * if canceled, return -ECANCELED, else return 0
+ *
+ * scrub/replace will be canceled when
+ * 1) cancel is called manually
+ * 2) the calling user space process (btrfs-progs) receives SIGKILL.
+ *    Other signals can be caught by btrfs-progs using multi-threading
+ *    and used to cancel the work, but SIGKILL can't be caught and
+ *    btrfs-progs has already fallen into the ioctl, so cancel the
+ *    current scrub to return asap if SIGKILL is pending.
+ */
+static inline int is_scrub_canceled(struct btrfs_fs_info *fs_info,
+   struct scrub_ctx *sctx)
+{
+   int ret = 0;
+
+   if (unlikely(atomic_read(&fs_info->scrub_cancel_req) ||
+atomic_read(&sctx->cancel_req) ||
+__fatal_signal_pending(current)))
+   ret = -ECANCELED;
+   return ret;
+}
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
   struct map_lookup *map,
   struct btrfs_device *scrub_dev,
@@ -2420,11 +2443,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
/*
 * canceled?
 */
-   if (atomic_read(&fs_info->scrub_cancel_req) ||
-   atomic_read(&sctx->cancel_req)) {
-   ret = -ECANCELED;
+   ret = is_scrub_canceled(fs_info, sctx);
+   if (ret)
goto out;
-   }
/*
 * check to see if we have to pause
 */
-- 
2.0.4


