Btrfs stable updates for 3.16.x (and others)

2014-08-19 Thread David Sterba
Hi stable team,

please add the following patches to stable trees.

Patch #3 applies to all currently live stable trees and fixes a 7-year-old bug. I've
briefly reviewed all 3 patches against 3.10/12/14/16 (i.e. 3.4 skips #1
and #2).

Subjects:
Btrfs: read lock extent buffer while walking backrefs
Btrfs: fix compressed write corruption on enospc
Btrfs: fix csum tree corruption, duplicate and outdated checksums
Commits:
6f7ff6d7832c6be13e8c95598884dbc40ad69fb7
ce62003f690dff38d3164a632ec69efa15c32cbf
27b9a8122ff71a8cadfbffb9c4f0694300464f3b

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[btrfs] 8d875f95: xfstests.generic.226.fail

2014-08-19 Thread Fengguang Wu
Hi Chris,

We noticed an xfstests failure on commit

8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes for renames and truncates")

It's 100% reproducible in the 5 test runs.

test case: snb-drag/xfstests/4HDD-btrfs-generic-mid

27b9a8122ff71a8  8d875f95da43c6a8f18f77869
---------------  -------------------------
                  %change    %stddev
                     |          /
       0           +Inf%     1 ± 0%  TOTAL xfstests.generic.226.fail

Thanks,
Fengguang


Re: [PATCH 3/3] btrfs-progs: make close_ctree return void

2014-08-19 Thread David Sterba
On Thu, Aug 07, 2014 at 10:35:59AM +0800, Gui Hecheng wrote:
> close_ctree always returns 0, so the code that depends on its
> return value makes no sense.
> Just make close_ctree return void.

You should not do that while the function contains BUG_ONs: they mean the
error paths are not handled, not that the function is trivial and cannot fail.
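To illustrate the review point, here is a minimal, hypothetical sketch (not btrfs-progs code): a teardown function that keeps an int return can report a failed step to its caller, whereas a void version can only abort or silently drop the error. `flush_dirty_state()` and `close_tree_sketch()` are invented names for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a teardown step that can fail. */
static int flush_dirty_state(int simulate_failure)
{
	return simulate_failure ? -EIO : 0;
}

/*
 * Keeping an int return lets callers see a failed flush instead of
 * the program aborting (the userspace analogue of a BUG_ON).
 */
int close_tree_sketch(int simulate_failure)
{
	int ret = flush_dirty_state(simulate_failure);

	if (ret < 0) {
		fprintf(stderr, "close failed: %d\n", ret);
		return ret;
	}
	return 0;
}
```

Only once every such error path is handled (or genuinely impossible) does a constant 0 return stop carrying information.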


Re: [RFC PATCH] btrfs-progs: Move btrfstune to btrfs device tune

2014-08-19 Thread David Sterba
On Mon, Aug 11, 2014 at 03:17:11AM +0300, Timofey Titovets wrote:
> According to https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs
> Quote:
> merge functionality of btrfstune, eg. under btrfs dev set-seed /dev/
> (discuss the command name though)

I added this project idea a long time ago and I'm afraid it's no longer
valid, at least not in the proposed way.

> This patch is just a code move.
> Afterwards, the user can tune btrfs parameters with:
> btrfs dev tune -xr /dev/sda2

The btrfstune utility works on an unmounted filesystem and affects the
whole filesystem, so the 'device' subgroup is not right here.

Most of the commands of the base utility operate on a mounted filesystem,
so a separate btrfstune keeps that distinction. The reason for merging the
two was to avoid a 1MB binary for a very simple task; the generic
filesystem code can be shared with 'btrfs'.

The question is what's the right UI, a new subcommand, or via the
generic properties command? The property interface is not yet populated,
so it might be hard to imagine where the tuning settings would go.
Something like this:

$ btrfs prop set feature.skinny-metadata 1 /dev/sdx

The extended refs can be turned on even on a mounted filesystem, so this
would avoid doing 'echo 1 > /sys/fs/btrfs/UUID/features/extended_iref'.
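For reference, the sysfs route is just a write of "1" to a per-filesystem feature file. A minimal C sketch of that `echo`, assuming the path layout quoted above; the helper names are hypothetical, and the write needs root and a kernel that exposes the feature file:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /sys/fs/btrfs/<uuid>/features/<feature>; returns 0 on success. */
int feature_path(char *buf, size_t len, const char *uuid, const char *feature)
{
	int n = snprintf(buf, len, "/sys/fs/btrfs/%s/features/%s",
			 uuid, feature);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of: echo 1 > /sys/fs/btrfs/<uuid>/features/<feature> */
int enable_feature(const char *uuid, const char *feature)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (feature_path(path, sizeof(path), uuid, feature))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Usage would be `enable_feature(uuid, "extended_iref")`, with the filesystem UUID as printed by `btrfs filesystem show`.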

At this moment I'm inclined to use the properties interface, which means
that the btrfstune utility will stay a bit longer. I'll update the
project idea to reflect this so it's not confusing anymore (sorry).


Re: [btrfs] 8d875f95: xfstests.generic.226.fail

2014-08-19 Thread David Sterba
On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
> We noticed an xfstests failure on commit
> 
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes 
> for renames and truncates")
> 
> It's 100% reproducible in the 5 test runs.

Same here, different mkfs configurations.

generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see /root/xfstests/results//generic/226.out.bad)
--- tests/generic/226.out   2013-05-29 17:16:03.0 +0200
+++ /root/xfstests/results//generic/226.out.bad 2014-08-19 16:12:55.0 +0200
@@ -1,6 +1,8 @@
 QA output created by 226
 --> mkfs 256m filesystem
 --> 16 buffered 64m writes in a loop
-1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
+1 2 3 4 pwrite64: No space left on device
+5 6 7 8 9 10 11 12 pwrite64: No space left on device
+13 14 15 16

enospc on a small filesystem (256M)

# btrfs fi df /mnt/a2
System, single: total=4.00MiB, used=4.00KiB
Data+Metadata, single: total=252.00MiB, used=31.09MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

$ df -h /mnt/a2
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda9       256M   16M  241M   6% /mnt/a2
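To poke at the same pattern outside xfstests, the loop can be approximated in C: write a file with `pwrite()` in chunks, remove it, repeat, and count the iterations that fail (with ENOSPC in the runs above). This is a rough approximation, not the test script itself: the chunk size, the file name and the unlink between iterations are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `total` bytes to `path` using pwrite() calls of at most `chunk`
 * bytes. Returns 0 on success or -errno (e.g. -ENOSPC) on the first error.
 */
int write_file(const char *path, size_t total, size_t chunk)
{
	char *buf = malloc(chunk);
	size_t done = 0;
	int fd, ret = 0;

	if (!buf)
		return -ENOMEM;
	memset(buf, 0xab, chunk);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		ret = -errno;
		free(buf);
		return ret;
	}
	while (done < total) {
		size_t n = total - done < chunk ? total - done : chunk;
		ssize_t w = pwrite(fd, buf, n, (off_t)done);

		if (w < 0) {
			ret = -errno;
			break;
		}
		if (w == 0) {	/* should not happen for regular files */
			ret = -EIO;
			break;
		}
		done += (size_t)w;
	}
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	free(buf);
	return ret;
}

/*
 * Repeat "write the file, then remove it" `loops` times and return how
 * many iterations failed.
 */
int write_loop(const char *path, int loops, size_t total, size_t chunk)
{
	int failures = 0;

	for (int i = 0; i < loops; i++) {
		if (write_file(path, total, chunk) < 0)
			failures++;
		unlink(path);
	}
	return failures;
}
```

On a 256M scratch filesystem, something like `write_loop(path, 16, 64UL << 20, 1 << 20)` with `path` on the scratch mount mirrors the "16 buffered 64m writes"; a non-zero result reproduces the mismatch.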


Re: [PATCH v4] Btrfs: send, lower mem requirements for processing xattrs

2014-08-19 Thread David Sterba
On Mon, Aug 11, 2014 at 03:09:35AM +0100, Filipe Manana wrote:
> + if (name_len + data_len > buf_len) {
> + buf_len = name_len + data_len;
> + if (is_vmalloc_addr(buf)) {
> + vfree(buf);
> + buf = NULL;
> + } else {
> + char *tmp = krealloc(buf, buf_len, GFP_NOFS);

This could fail with a warning (high-order allocation), so I suggest
adding __GFP_NOWARN; the vmalloc fallback will catch the fragmented-memory
case and fail if there's no memory.

> +
> + if (!tmp)
> + kfree(buf);
> + buf = tmp;
> + }
> + if (!buf) {
> + buf = vmalloc(buf_len);
> + if (!buf) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + }
> + }
> +
>   read_extent_buffer(eb, buf, (unsigned long)(di + 1),
>   name_len + data_len);
>  
> @@ -1071,7 +1094,10 @@ static int iterate_dir_item(struct btrfs_root *root, 
> struct btrfs_path *path,
>   }
>  
>  out:
> - kfree(buf);
> + if (is_vmalloc_addr(buf))
> + vfree(buf);
> + else
> + kfree(buf);

There's even kvfree to do this.

>   return ret;
>  }
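The kernel code can't run outside the kernel, but the shape of the suggestion can be sketched as a userspace analogue: try the cheap allocator first without complaining, fall back to a second allocator for large or fragmented cases, and free through one helper that knows which allocator was used (the role `kvfree()` plays). Here `realloc()` stands in for `krealloc()` and `mmap()` for `vmalloc()`; the size cap that forces the fallback is an arbitrary assumption for demonstration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Pretend allocations above this size are "high order" and would fail. */
#define FAST_PATH_LIMIT (64u * 1024)

struct xbuf {
	void *p;
	size_t len;
	int mapped;	/* 1 if p came from mmap(), 0 if from realloc() */
};

/* One free routine that dispatches on the allocator, like kvfree(). */
void xbuf_free(struct xbuf *b)
{
	if (b->p) {
		if (b->mapped)
			munmap(b->p, b->len);
		else
			free(b->p);
	}
	b->p = NULL;
	b->len = 0;
	b->mapped = 0;
}

/*
 * Make b at least `need` bytes long. Contents are not guaranteed to
 * survive; the patch refills the buffer with read_extent_buffer().
 * Returns 0 on success, -1 if both allocators fail.
 */
int xbuf_grow(struct xbuf *b, size_t need)
{
	if (need <= b->len)
		return 0;

	/* Fast path (krealloc stand-in): skipped quietly above the limit. */
	if (!b->mapped && need <= FAST_PATH_LIMIT) {
		void *tmp = realloc(b->p, need);

		if (tmp) {
			b->p = tmp;
			b->len = need;
			return 0;
		}
	}

	/* Fallback (vmalloc stand-in): release the old buffer, map a new one. */
	xbuf_free(b);
	b->p = mmap(NULL, need, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b->p == MAP_FAILED) {
		b->p = NULL;
		return -1;
	}
	b->len = need;
	b->mapped = 1;
	return 0;
}
```

Keeping the allocator choice in one place means the exit path needs a single `xbuf_free()` call, which is the simplification `kvfree()` offers over the open-coded `is_vmalloc_addr()` test.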


Re: [btrfs] 8d875f95: xfstests.generic.226.fail

2014-08-19 Thread Chris Mason
On 08/19/2014 10:23 AM, David Sterba wrote:
> On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
>> We noticed an xfstests failure on commit
>>
>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file 
>> flushes for renames and truncates")
>>
>> It's 100% reproducible in the 5 test runs.
> 
> Same here, different mkfs configurations.
> 
> generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see 
> /root/xfstests/results//generic/226.out.bad)
> --- tests/generic/226.out   2013-05-29 17:16:03.0 +0200
> +++ /root/xfstests/results//generic/226.out.bad 2014-08-19 
> 16:12:55.0 +0200
> @@ -1,6 +1,8 @@
>  QA output created by 226
>  --> mkfs 256m filesystem
>  --> 16 buffered 64m writes in a loop
> -1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
> +1 2 3 4 pwrite64: No space left on device
> +5 6 7 8 9 10 11 12 pwrite64: No space left on device
> +13 14 15 16
> 
> enospc on a small filesystem (256M)

I'm calling filemap flush more often, but otherwise everything else is
the same.  I'll take a look.

-chris


Re: [PATCH 1/3 v4] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-19 Thread David Sterba
On Tue, Aug 12, 2014 at 05:06:01PM +0900, Satoru Takeuchi wrote:
> +By default, the show command scans all devices found in /proc/partitions.

The default scanning method is blkid; /proc/partitions used to be the
default before that. Scanning /proc/partitions is not done by the
'show' command, only during open_ctree, AFAICS.
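For reference, /proc/partitions is a plain-text table: a header line, a blank line, then `major minor #blocks name` rows. A minimal sketch of a parser that collects the device names (the function name and the name length limit are arbitrary choices, not btrfs-progs code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

/*
 * Read /proc/partitions-style text from f and copy each device name into
 * names[]. Lines that don't match "major minor blocks name" (the header
 * and the blank line after it) are skipped. Returns the number stored.
 */
int parse_partitions(FILE *f, char names[][NAME_LEN], int max)
{
	char line[256];
	int count = 0;

	while (count < max && fgets(line, sizeof(line), f)) {
		unsigned int major, minor;
		unsigned long long blocks;
		char name[NAME_LEN];

		if (sscanf(line, "%u %u %llu %31s",
			   &major, &minor, &blocks, name) != 4)
			continue;
		strcpy(names[count++], name);
	}
	return count;
}
```

On a live system the input would come from `fopen("/proc/partitions", "r")`.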


Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-19 Thread David Sterba
On Mon, Aug 11, 2014 at 10:05:52AM -0700, Eric Sandeen wrote:
> (What seems to be missing, though, is why would the user ever choose to use 
> '-d?')

That's a fallback method if blkid or udev are not available. We've had
reports in the past that this functionality should not be dropped.


Re: [PATCH 3/3] btrfs-progs: Show error message if btrfs filesystem show failed to find any btrfs filesystem

2014-08-19 Thread David Sterba
On Mon, Aug 11, 2014 at 06:13:03PM +0900, Satoru Takeuchi wrote:
> From: Satoru Takeuchi 
> 
> Current btrfs doesn't display any error message if this command
> failed to find any btrfs filesystem corresponding to
> ||| which user specified.

I'm not sure it's necessary to print anything. It would be like grep
printing "Sorry, I did not find any lines, please check your regexp".

We could, however, return a non-zero value when nothing
was found.
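A sketch of the grep-style convention suggested above: print nothing extra, but let the exit status tell scripts whether anything matched (grep exits with 0 when lines were selected, 1 when none were, and 2 on error). The function and the `SHOW_*` constants are hypothetical, not btrfs-progs code.

```c
#include <assert.h>

/* Hypothetical exit codes modelled on grep(1). */
enum {
	SHOW_FOUND = 0,		/* at least one filesystem listed */
	SHOW_NOT_FOUND = 1,	/* nothing matched the given filter */
	SHOW_ERROR = 2,		/* scanning itself failed */
};

/* Map the scan result and the number of listed filesystems to a status. */
int show_exit_status(int scan_ret, int nr_listed)
{
	if (scan_ret < 0)
		return SHOW_ERROR;
	return nr_listed > 0 ? SHOW_FOUND : SHOW_NOT_FOUND;
}
```

Scripts could then test the command's status directly instead of parsing its output.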


Re: [PATCH] btrfs: fix leak in qgroup_subtree_accounting() error path

2014-08-19 Thread Chris Mason


On 08/18/2014 05:42 PM, Mark Fasheh wrote:
> On Sun, Aug 17, 2014 at 03:09:21PM -0500, Eric Sandeen wrote:
>> Coverity pointed this out; in the newly added
>> qgroup_subtree_accounting(), if btrfs_find_all_roots()
>> returns an error, we leak at least the parents pointer,
>> and possibly the roots pointer, depending on what failure
>> occurs.
>>
>> If btrfs_find_all_roots() returns an error, we need to
>> free up all allocations before we return.  "roots" is
>> initialized to NULL, so it should be safe to free
>> it unconditionally (ulist_free() handles that case).
> 
> Great, thanks for this Eric.
> 
> Reviewed-by: Mark Fasheh 
> 

Thanks guys, this is queued.

-chris
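The quoted fix follows a common kernel pattern: initialize every pointer to NULL, route all failures through one exit label, and free unconditionally because the free routine accepts NULL (as `ulist_free()` does here, and as `free()` does in userspace). A userspace sketch with hypothetical names, where `find_roots_sketch()` stands in for `btrfs_find_all_roots()`:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical lookup that can fail after allocating its result. */
static int find_roots_sketch(int fail, int **roots)
{
	*roots = malloc(4 * sizeof(**roots));
	if (!*roots)
		return -ENOMEM;
	if (fail)
		return -EIO;	/* error returned with *roots still allocated */
	return 0;
}

int subtree_accounting_sketch(int fail)
{
	int *parents = NULL;
	int *roots = NULL;
	int ret;

	parents = malloc(4 * sizeof(*parents));
	if (!parents)
		return -ENOMEM;

	ret = find_roots_sketch(fail, &roots);
	if (ret < 0)
		goto out;	/* an early return here would leak both */

	/* ... use parents and roots ... */
	ret = 0;
out:
	free(roots);	/* free(NULL) is a no-op, like ulist_free(NULL) */
	free(parents);
	return ret;
}
```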



[PATCH] Btrfs: fix crash on endio of reading corrupted block

2014-08-19 Thread Liu Bo
The crash is

[ cut here ]
kernel BUG at fs/btrfs/extent_io.c:2124!
[...]
Workqueue: btrfs-endio normal_work_helper [btrfs]
RIP: 0010:[]  [] 
end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

This is in fact a regression.

This happens because we forgot to advance @offset when reading a corrupted
block, so @offset stays the same; this leads to checksum errors while reading
the remaining blocks queued up in the same bio, and we end up hitting the
above BUG_ON.
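The bug class is easy to reproduce in miniature: a loop walks the segments of a request while tracking a running byte offset, and one early `continue` forgets to advance it, so every later segment is checked at the wrong position. A simplified, hypothetical userspace model (not the actual `end_bio_extent_readpage()` code), with the fix applied:

```c
#include <assert.h>

struct segment {
	unsigned int len;
	int corrupted;	/* 1 if this block fails its checksum */
};

/*
 * Record the starting offset of every segment in offsets[].
 * Returns the number of corrupted segments seen.
 */
int walk_segments(const struct segment *segs, int nr,
		  unsigned long long start, unsigned long long *offsets)
{
	unsigned long long offset = start;
	int bad = 0;

	for (int i = 0; i < nr; i++) {
		offsets[i] = offset;
		if (segs[i].corrupted) {
			bad++;
			/* The fix: advance even on the error path. */
			offset += segs[i].len;
			continue;
		}
		/* ... verify the checksum of [offset, offset + len) ... */
		offset += segs[i].len;
	}
	return bad;
}
```

Without the `offset += segs[i].len;` before `continue`, a third 4096-byte segment starting at 8192 would be recorded at 12288 instead of 16384, i.e. checked against the second block's position.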

Reported-by: Chris Murphy 
Signed-off-by: Liu Bo 
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..be41e4d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
test_bit(BIO_UPTODATE, &bio->bi_flags);
if (err)
uptodate = 0;
+   offset += len;
continue;
}
}
-- 
1.8.1.4



[PATCH] Btrfs: cleanup the same name in end_bio_extent_readpage

2014-08-19 Thread Liu Bo
We've already defined an 'offset' outside bio_for_each_segment_all, and the
inner declaration shadows it.

This is just a clean rename, with no functional changes.

Signed-off-by: Liu Bo 
---
 fs/btrfs/extent_io.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..7e27ba7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2609,12 +2609,12 @@ readpage_ok:
if (likely(uptodate)) {
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
-   unsigned offset;
+   unsigned off;
 
/* Zero out the end if this page straddles i_size */
-   offset = i_size & (PAGE_CACHE_SIZE-1);
-   if (page->index == end_index && offset)
-   zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+   off = i_size & (PAGE_CACHE_SIZE-1);
+   if (page->index == end_index && off)
+   zero_user_segment(page, off, PAGE_CACHE_SIZE);
SetPageUptodate(page);
} else {
ClearPageUptodate(page);
-- 
1.8.1.4
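The renamed `off` computes where valid data ends inside the last page of the file. The same arithmetic as a standalone function, assuming a 4096-byte page (the kernel uses `PAGE_CACHE_SIZE`/`PAGE_CACHE_SHIFT`; the names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_SHIFT 12

/*
 * If page `index` straddles i_size, return the in-page offset from which
 * bytes must be zeroed up to the end of the page; otherwise return 0.
 */
uint64_t tail_zero_start(uint64_t i_size, uint64_t index)
{
	uint64_t end_index = i_size >> SKETCH_PAGE_SHIFT;
	uint64_t off = i_size & (SKETCH_PAGE_SIZE - 1);

	if (index == end_index && off)
		return off;
	return 0;
}
```

For an i_size of 10000 bytes, page 2 covers bytes 8192-12287, so in-page bytes 1808-4095 lie past EOF and get zeroed, matching `zero_user_segment(page, off, PAGE_CACHE_SIZE)`.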



Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-19 Thread Eric Sandeen
On 8/19/14, 10:10 AM, David Sterba wrote:
> On Mon, Aug 11, 2014 at 10:05:52AM -0700, Eric Sandeen wrote:
>> (What seems to be missing, though, is why would the user ever choose to use 
>> '-d?')
> 
> That's a fallback method if blkid or udev are not available. We've had
> reports in the past that this functionality should not be dropped.

Seems like using /proc/partitions would make more sense in that case
than a recursive scan of every file under /dev, wouldn't it?
Any details on those reports?

I'm just wondering when you might possibly have success looking deep
into the /dev tree if you didn't have success in /proc/partitions.

It looks like the functionality was added with:

commit 0dbd99fb3e117cd5f87eda492b6b4fab1b5bea23
Author: Goffredo Baroncelli 
Date:   Wed Jun 15 21:55:25 2011 +0200

Scan the devices listed in /proc/partitions

During the commands:
- btrfs filesystem show
- btrfs device scan
the devices "scanned" are extracted from /proc/partitions. This
should avoid to scan devices not suitable for a btrfs filesystem like cdrom
and floppy or to scan not existant devices.
The old behavior (scan all the block devices under /dev) may be
forced passing the "--all-devices" switch.

but I'm not sure why it was preserved.

It just seems a bit bizarre to have so many ways to get the same info.
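For comparison with the /proc/partitions approach, the old "scan all the block devices under /dev" behavior amounts to a recursive walk that keeps only block-device nodes. A minimal sketch using POSIX `nftw()` (the function names are illustrative; btrfs-progs' own implementation differs):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static int nr_block_devices;

/* nftw() callback: count block-device nodes and keep walking. */
static int visit(const char *path, const struct stat *st, int type,
		 struct FTW *ftw)
{
	(void)ftw;
	if (type == FTW_F && S_ISBLK(st->st_mode)) {
		nr_block_devices++;
		printf("candidate: %s\n", path);
	}
	return 0;
}

/* Returns the number of block devices under root, or -1 on error. */
int count_block_devices(const char *root)
{
	nr_block_devices = 0;
	/* FTW_PHYS: don't follow symlinks such as those in /dev/disk/by-id */
	if (nftw(root, visit, 16, FTW_PHYS) == -1)
		return -1;
	return nr_block_devices;
}
```

Every directory under the root gets visited, which is why a walk of /dev touches far more entries than the short list in /proc/partitions.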

Thanks,
-Eric



Re: [PATCH 8/8] btrfs: rename total_bytes to avoid confusion

2014-08-19 Thread David Sterba
On Wed, Aug 13, 2014 at 02:24:26PM +0800, Anand Jain wrote:
> we are assigning number_devices to the total_bytes,
> that's very confusing for a moment
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bf99e82..c0c360a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2253,7 +2253,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char 
> *device_path)
>   struct list_head *devices;
>   struct super_block *sb = root->fs_info->sb;
>   struct rcu_string *name;
> - u64 total_bytes;
> + u64 ret_sz;

A 'tmp' would do, but whatever


Re: [PATCH 2/8] btrfs: replace seed device followed by unmount causes kernel WARNING

2014-08-19 Thread David Sterba
On Wed, Aug 13, 2014 at 02:24:20PM +0800, Anand Jain wrote:
> reproducer:
> mount /dev/sdb /btrfs
> btrfs dev add /dev/sdc /btrfs
> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
> umount /btrfs
> 
> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 
> __btrfs_close_devices+0x1b0/0x200 [btrfs]()
> ::
> 
> __btrfs_close_devices()
> ::
> WARN_ON(fs_devices->open_devices);
> 
> After the seed device has been replaced the new target device
> is no more a seed device. So we need to update the device
> numbers in the fs_devices as pointed by the fs_info.
> 
> Signed-off-by: Anand Jain 

A formality: if you get a reviewed-by from somebody and the patch does
not change in the next iteration, add the tag to the patch as well. This
will ensure the review credit is not lost.
Otherwise, pinging the maintainer with a forgotten reviewed-by also works.


Re: [RFC PATCH] btrfs-progs: Move btrfstune to btrfs device tune

2014-08-19 Thread Timofey Titovets
No problem =).
Then just ignore the patch.

2014-08-19 17:03 GMT+03:00 David Sterba :
> On Mon, Aug 11, 2014 at 03:17:11AM +0300, Timofey Titovets wrote:
>> According to https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs
>> Quote:
>> merge functionality of btrfstune, eg. under btrfs dev set-seed /dev/
>> (discuss the command name though)
>
> I added this project idea a long time ago and I'm afraid it's no longer
> valid, at least not in the proposed way.
>
>> This patch is just a code move.
>> Afterwards, the user can tune btrfs parameters with:
>> btrfs dev tune -xr /dev/sda2
>
> The btrfstune utility works on an unmounted filesystem and affects the
> whole filesystem, so the 'device' subgroup is not right here.
>
> Most of the commands of the base utility operate on a mounted filesystem,
> so a separate btrfstune keeps that distinction. The reason for merging the
> two was to avoid a 1MB binary for a very simple task; the generic
> filesystem code can be shared with 'btrfs'.
>
> The question is what's the right UI, a new subcommand, or via the
> generic properties command? The property interface is not yet populated,
> so it might be hard to imagine where the tuning settings would go.
> Something like this:
>
> $ btrfs prop set feature.skinny-metadata 1 /dev/sdx
>
> The extended refs can be turned on even on a mounted filesystem, so this
> would avoid doing 'echo 1 > /sys/fs/btrfs/UUID/features/extended_iref'.
>
> At this moment I'm inclined to use the properties interface, which means
> that the btrfstune utility will stay a bit longer. I'll update the
> project idea to reflect this so it's not confusing anymore (sorry).



-- 
Have a nice day,
Timofey.


Questions on using BtrFS for fileserver

2014-08-19 Thread M G Berberich
Hello,

we are thinking about using BtrFS on standard hardware for a
fileserver with about 50T (100T raw) of storage (25×4TByte).
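As a quick check of the 50T figure: with btrfs RAID1 every chunk is stored on two different devices, so usable space is about half the raw total, capped by how much the other devices can mirror for the largest one. A sketch of that estimate (ignoring metadata overhead and allocation granularity):

```c
#include <assert.h>

/*
 * Approximate usable capacity of a two-copy RAID1 profile: each byte
 * needs space on two distinct devices, so the result is bounded both
 * by half the raw total and by the space outside the largest device.
 */
double raid1_usable(const double *sizes, int n)
{
	double sum = 0, max = 0;

	for (int i = 0; i < n; i++) {
		sum += sizes[i];
		if (sizes[i] > max)
			max = sizes[i];
	}
	return sum / 2 < sum - max ? sum / 2 : sum - max;
}
```

For 25 × 4 TB disks this gives 50 TB; with mixed sizes the second bound matters (disks of 1, 1 and 4 TB yield only 2 TB).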

This is what I understood so far. Is this right?

· incremental send/receive works.

· There is no support for hot spares (spare disks that automatically
  replace a faulty disk).

· BtrFS with RAID1 is fairly stable.

· RAID 5/6 spreads all data over all devices, leading to performance
  problems on large disk arrays, and so far there is no option to limit
  the number of disks per stripe.

Some questions:

· There were reports that bcache with btrfs leads to corruption. Is
  this still the case?

· If a disk fails, does BtrFS rebalance automatically? (This would
  give a kind of hot-spare behavior.)

· Besides using bcache, are there any possibilities to boost
  performance by adding (dedicated) cache-SSDs to a BtrFS?

· Are there any reports/papers/web pages about BtrFS systems of this size
  in use? Praise, complaints, performance reviews, whatever…

Regards
bmg

-- 
"It makes no difference at all what gets          | M G Berberich
 decided today: I'm against it anyway!"           | berbe...@fmi.uni-passau.de
(SPD city councillor Kurt Schindler; Regensburg)  | www.fmi.uni-passau.de/~berberic


Re: [PATCH v2] btrfs-progs: update manpage with new option -f for btrfstune

2014-08-19 Thread David Sterba
On Mon, Jul 07, 2014 at 09:54:53AM +0800, Gui Hecheng wrote:
> The new option -f will force to do dangerous changes.
> e.g. clear the seeding flag.

Missing Signed-off-by.

> --- a/Documentation/btrfstune.txt
> +++ b/Documentation/btrfstune.txt
> @@ -24,7 +24,8 @@ Enable seeding forces a fs readonly so that you can use it 
> to build other filesy
>  Enable extended inode refs.
>  -x::
>  Enable skinny metadata extent refs.
> -
> +-f::
> +Allow dangerous changes, e.g. clear the seeding flag

Please expand this; we discussed it in previous iterations of the patch.
Thanks.


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Kyle Manna
> · Besides using bcache, are there any possibilities to boost
>   performance by adding (dedicated) cache-SSDs to a BtrFS?

dm-cache is in the mainline kernel, and lvm2 recently added support to
make the device-mapper configuration automatic.  In my opinion, dm-cache is
a little easier to use because you can add/remove/resize the cache
without recreating the filesystem.  If you're interested, take a peek
at the man page for lvmcache.

- Kyle


Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

2014-08-19 Thread David Sterba
On Thu, Aug 07, 2014 at 10:51:15AM +0800, Qu Wenruo wrote:
> It seems that the patch is rejected in patchwork,

It was not me :)

> Could any one tell me the reason?

I'd understand if the patch were considered no longer needed after the
original problem went away, but that's not what you describe in your
changelog. From that point of view, the reason might not be compelling.

> >Above commit will cause disaster if someone try to mount a newly created but
> >later corrupted btrfs filesystem.

The generation right after mkfs is something like 4 or 5, which means the
corruption would have to happen within the first few transaction commits.
That is unlikely, and the filesystem will probably be fairly empty at
that time.

If the concern is about corrupted generation counter itself in the
superblock, then yes this could hurt.

It's still possible to compare the first superblock with its copies; the
one at offset 64M is available in 99% of cases, so there is enough data to
decide what is actually corrupted. This could catch more corruption
than just a bad generation counter.
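A sketch of the comparison idea: the primary superblock lives at 64KiB, with copies at 64MiB and 256GiB (a copy exists only if the device is large enough). Given the generation read from each available, checksum-valid copy, a simple majority vote shows whether the primary is the odd one out. Reading and checksumming the copies is omitted, and the voting rule is an assumption, not what the kernel does.

```c
#include <assert.h>
#include <stdint.h>

#define SB_PRIMARY_OFFSET (64ULL * 1024)	/* 64KiB */
#define SB_MIRROR_SHIFT 12

/* Offset of superblock copy `mirror` (0 = primary, 1 = 64MiB, 2 = 256GiB). */
uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (SB_MIRROR_SHIFT * mirror);
	return SB_PRIMARY_OFFSET;
}

/*
 * gens[0] is the primary's generation, gens[1..n-1] those of the copies.
 * Returns 1 if the primary agrees with at least half of the copies (or
 * there is nothing to compare against), 0 if it looks like the outlier.
 */
int primary_agrees(const uint64_t *gens, int n)
{
	int agree = 0;

	for (int i = 1; i < n; i++)
		if (gens[i] == gens[0])
			agree++;
	return n <= 1 || 2 * agree >= n - 1;
}
```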

From the output of btrfs-show-super:

generation  56392
chunk_root_generation   56392
cache_generation56392
uuid_tree_generation56392

the generation is duplicated several times, so a minimal patch could
add a comparison with the other fields.
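One way such a minimal check could look. The field names mirror the btrfs-show-super output above, but the struct is a hypothetical stand-in for the on-disk superblock. The rule "no recorded tree generation may be newer than `generation`" is an assumption: the example output shows all values equal, but those fields may legitimately lag behind when the corresponding tree was not rewritten. cache_generation is carried but left unchecked, since whether it can be used the same way is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of superblock fields, named as in btrfs-show-super. */
struct sb_gens {
	uint64_t generation;
	uint64_t chunk_root_generation;
	uint64_t cache_generation;	/* not checked in this sketch */
	uint64_t uuid_tree_generation;
};

/*
 * Returns 1 if `generation` is at least as new as the tree generations
 * recorded next to it, 0 if it looks suspiciously old.
 */
int generation_consistent(const struct sb_gens *sb)
{
	return sb->chunk_root_generation <= sb->generation &&
	       sb->uuid_tree_generation <= sb->generation;
}
```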

> >And before btrfs entered mainline, btrfs-progs has already superblock
> >checksum. See btrfs-progs commit: 5ccd1715fa2eaad0b26037bb53706779c8c93b5f
> >(superblock duplication by Yan Zheng).

The superblock checksum was not calculated the same way as in the kernel,
but because the check was missing, this went undetected.

> >Before commit 5ccd17, mkfs.btrfs uses 16K as super offset, while current 
> >btrfs
> >uses 64K super offset, anyway old btrfs without super csum will not be
> >mountable due to the change of super offset.
> >
> >So backward compatibility is not a problem.

Superblocks at offset 16k are not supported anymore AFAICT.


RE: fs_mark test on btrfs on 3.16.0-rc6+ #1 SMP

2014-08-19 Thread Ming Lei
My mistake. Thank you all for pointing out that ext4 actually performed much
worse in this test. I wonder whether any benchmarking has been done across
different workloads in comparison with ext4. I know btrfs vs. ext4 is not
an apples-to-apples comparison, but such results would encourage users to
switch to btrfs.


-----Original Message-----
From: Miao Xie [mailto:mi...@cn.fujitsu.com] 
Sent: Monday, August 18, 2014 8:18 PM
To: Ming Lei; linux-btrfs@vger.kernel.org
Subject: Re: fs_mark test on btrfs on 3.16.0-rc6+ #1 SMP

On Mon, 18 Aug 2014 17:38:17 +, Ming Lei wrote:
> 
> Hi,
> 
> I ran the fs_mark test on a single empty hard drive. After the test, the df 
> -h results are:
> 
> /dev/sdk1 917G   39G  832G   5% /ext4
> /dev/sdj1 932G   53G  850G   6% /btrfs
> 
> The test result for btrfs shows it ran 15 hours. Note there is no file/dir 
> remove operation which I knew very slow compared with ext4.
> 
> [root@sh679 ~]# date;/root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 
> 1000 -N 1000 -d /btrfs/ -t 10;date Mon Aug 11 11:32:54 PDT 2014
> 
> #  /root/fs_mark  -v  -n  100  -s  4096  -k  -S  1  -D  1000  -N  1000  
> -d  /btrfs/  -t  10 
> # Version 3.3, 10 thread(s) starting at Mon Aug 11 11:32:54 2014
> # Sync method: INBAND FSYNC: fsync() per file in write loop.
> # Directories:  Round Robin between directories across 1000 
> subdirectories with 1000 files per subdirectory.
> # File names: 40 bytes long, (16 initial bytes of time stamp with 
> 24 random bytes at end of name)
> # Files info: size 4096 bytes, written with an IO size of 16384 
> bytes per write
> # App overhead is time in microseconds spent in the test not 
> doing file writing related system calls.
> # All system call times are reported in microseconds
> FSUse%Count SizeFiles/sec App OverheadCREAT 
> (Min/Avg/Max)WRITE (Min/Avg/Max)FSYNC (Min/Avg/Max) 
> SYNC (Min/Avg/Max)CLOSE (Min/Avg/Max)   UNLINK (Min/Avg/Max)
>  8 1000 4096184.0155517800   33  
> 372937437   16 30941645054015  54203400   
>  0014 000
> Tue Aug 12 02:40:01 PDT 2014
> 
> For hours, the disk utilization was around 95% and cpu utilization for all 12 
> cores was very low and only one core showed around 26%wa.
> 
> 
> To compare with Ext4:
> The test for ext4 on a same model of hard drive ran 2.5 hours.  
> 
> [root@sh679 ~]# date;/root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 
> 1000 -N 1000 -d /ext4/ -t 10;date Fri Aug  8 17:13:56 PDT 2014 #  
> /root/fs_mark  -v  -n  100  -s  4096  -k  -S  1  -D  1000  -N  1000  -d  
> /ext4/  -t  10
> # Version 3.3, 10 thread(s) starting at Fri Aug  8 17:13:56 2014
> # Sync method: INBAND FSYNC: fsync() per file in write loop.
> # Directories:  Round Robin between directories across 1000 
> subdirectories with 1000 files per subdirectory.
> # File names: 40 bytes long, (16 initial bytes of time stamp with 
> 24 random bytes at end of name)
> # Files info: size 4096 bytes, written with an IO size of 16384 
> bytes per write
> # App overhead is time in microseconds spent in the test not 
> doing file writing related system calls.
> # All system call times are reported in microseconds.
> 
> FSUse%Count SizeFiles/sec App OverheadCREAT 
> (Min/Avg/Max)WRITE (Min/Avg/Max)FSYNC (Min/Avg/Max) 
> SYNC (Min/Avg/Max)CLOSE (Min/Avg/Max)   UNLINK (Min/Avg/Max)
>  9 1000 4096105.0156950153   19  
> 449  17417596   15  20699843236894751  20443640   
>  0014 4149000
> Sat Aug  9 19:41:14 PDT 2014

From

> Fri Aug  8 17:13:56 PDT 2014

to
 
> Sat Aug  9 19:41:14 PDT 2014

It is not 2.5 hours, it's 26.5 hours.
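The arithmetic behind the correction, as a small C check. Both runs are in the same timezone (PDT) and within August, so day-of-month arithmetic is enough; this is a convenience helper, not general date handling.

```c
#include <assert.h>

/* Seconds since the start of the month for "day hh:mm:ss". */
long month_secs(int day, int h, int m, int s)
{
	return ((day * 24L + h) * 60 + m) * 60 + s;
}

/* Elapsed seconds between two timestamps in the same month. */
long elapsed(int d1, int h1, int m1, int s1, int d2, int h2, int m2, int s2)
{
	return month_secs(d2, h2, m2, s2) - month_secs(d1, h1, m1, s1);
}
```

The ext4 run (Aug 8 17:13:56 to Aug 9 19:41:14) comes out at 95238 s, about 26.5 hours, while the btrfs run (Aug 11 11:32:54 to Aug 12 02:40:01) is 54427 s, about 15.1 hours.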

Thanks
Miao

> 
> Is it a known issue with btrfs or do I need to adjust the default parameters 
> for btrfs (I remember use the default to make btrfs)? 
> 
> Mount command shows:
> /dev/sdk1 on /ext4 type ext4 (rw,relatime,seclabel,data=ordered)
> /dev/sdj1 on /btrfs type btrfs (rw,relatime,seclabel,nospace_cache)
> 
> Thanks
> Ming
> 



Re: [BUG] cannot mount subvolume with selinux context

2014-08-19 Thread Zach Brown
On Tue, Aug 19, 2014 at 11:32:16AM +0800, Eryu Guan wrote:
> Hi,
> 
> Description of the problem:
> 
> mount btrfs with selinux context, then create a subvolume, the new
> subvolume cannot be mounted, even with the same context.
> 
> mkfs -t btrfs /dev/sda5
> mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
> btrfs subvolume create /mnt/btrfs/subvol
> mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test

Submit an xfstest?

> The security_sb_copy_data() takes out selinux context data to
> "secdata", then mount_subvol() calls mount_fs() (via vfs_kern_mount())
> again without selinux context, so mount_subvol() fails, which fails
> the whole mount.
> 
> Not sure what the proper fix is. Zach suggested that the fix will
> probably be to rework the vfs functions a bit, as he said in the rh
> bugzilla[1].

Yeah, I have no idea what'd be preferred here:

 - rework the vfs _kern_ mount api to offer one that doesn't mess with
   selinux mount options
 - add a flag to have the second _kern_ mount ignore selinux (but not
   MS_KERNMOUNT?)
 - binary data and fs selinux handling?  (like nfs)

- z


Re: [PATCH RFC] btrfs: Use backup superblocks if and only if the first superblock is valid but corrupted.

2014-08-19 Thread David Sterba
On Sun, Jul 27, 2014 at 10:53:04PM -0400, Austin S Hemmelgarn wrote:
> >>> But, for right now I'd prefer the admin get involved in using the backup
> >>> supers.  I think silently using the backups is going to lead to
> >>> surprises.
> >> Maybe there could be a mount non-default mount-option to use backup
> >> superblocks iff the first one is corrupted, and then log a warning
> >> whenever this actually happens?  Not handling stuff like this
> >> automatically really hurts HA use cases.
> >>
> >>
> > This seems better and comments also shows this idea.
> > What about merging the behavior into 'recovery' mount option or adding a
> > new mount option?
> Personally, I'd add a new mount option, but make recovery imply that option.

I agree with that, though we do not need to introduce an extra option if
its meaning depends on 'recovery'; rather, make it a mode of recovery
(and we could add more modes in the future). E.g.:

$ mount -o recovery=sb

which would try to use all valid backup superblocks to mount.
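A sketch of how a mode-carrying option like this might be parsed. Plain `recovery` already existed; the `recovery=sb` syntax is only the proposal above, and the enum values and function name are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical recovery modes; plain "recovery" keeps its old meaning. */
enum recovery_mode {
	RECOVERY_NONE = 0,
	RECOVERY_DEFAULT,	/* "recovery" */
	RECOVERY_SB,		/* "recovery=sb": try backup superblocks */
	RECOVERY_INVALID,	/* "recovery=<unknown mode>" */
};

/* Parse one mount option token such as "recovery" or "recovery=sb". */
enum recovery_mode parse_recovery(const char *opt)
{
	const char *key = "recovery";
	size_t klen = strlen(key);

	if (strncmp(opt, key, klen) != 0)
		return RECOVERY_NONE;
	if (opt[klen] == '\0')
		return RECOVERY_DEFAULT;
	if (opt[klen] != '=')
		return RECOVERY_NONE;	/* e.g. "recoveryfoo": not ours */
	if (strcmp(opt + klen + 1, "sb") == 0)
		return RECOVERY_SB;
	return RECOVERY_INVALID;
}
```

Adding another mode later means one enum value and one comparison, without a new top-level option name.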


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Austin S Hemmelgarn
On 2014-08-19 12:21, M G Berberich wrote:
> Hello,
> 
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
> 
> This is what I understood so far. Is this right?
> 
> · incremental send/receive works.
> 
> · There is no support for hot spares (spare disks that automatically
>   replace a faulty disk).
> 
> · BtrFS with RAID1 is fairly stable.
> 
> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.
> 
> Some questions:
> 
> · There were reports that bcache with btrfs leads to corruption. Is
>   this still so?
Based on some testing I did last month, bcache with anything has the
potential to cause data corruption.
> 
> · If a disk fails, does BtrFS rebalance automatically? (This would
>   give a kind of hot-spare behavior.)
No, but it wouldn't be hard to write a simple monitoring program to do
this from userspace.  IIRC, the big issue is that you need to add a
device in place of the failed one for the rebalance to work.
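The core of such a userspace monitor is a small decision step. Below is a minimal, hypothetical sketch of just that step: given which devids the monitor considers missing and a spare block device, it formats the `btrfs replace start <devid> <target> <mountpoint>` invocation an administrator would otherwise type. Detecting the failure and actually running the command are left out, and the struct and function names are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical view of one member device, as a monitor might track it. */
struct member {
	unsigned long long devid;
	int missing;            /* set once the monitor decides it has failed */
};

/*
 * Build the replace command for the first missing member, using `spare`
 * as the target device and `mnt` as the mount point.
 * Returns 1 if a command was written to `cmd`, 0 if nothing is missing,
 * and -1 if `cmd` is too small.
 */
static int plan_replace(const struct member *m, size_t n, const char *spare,
			const char *mnt, char *cmd, size_t cmdlen)
{
	for (size_t i = 0; i < n; i++) {
		if (!m[i].missing)
			continue;
		int len = snprintf(cmd, cmdlen, "btrfs replace start %llu %s %s",
				   m[i].devid, spare, mnt);
		return (len < 0 || (size_t)len >= cmdlen) ? -1 : 1;
	}
	return 0;
}
```

Using replace rather than "add, then delete missing" writes the lost data straight onto the spare, which is the in-place substitution described above.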
> 
> · Besides using bcache, are there any possibilities to boost
>   performance by adding (dedicated) cache-SSDs to a BtrFS?
Like mentioned in one of the other responses, I would suggest looking
into dm-cache.  BTRFS itself does not have any functionality for this,
although there has been talk of implementing device priorities for
reads, which could provide a similar performance boost.
> 
> · Are there any reports/papers/web-pages about BtrFS-systems this size
>   in use? Praises, complaints, performance reviews, whatever…
While it doesn't quite fit the description, I have had very good success
with a very active 2TB BTRFS RAID10 filesystem consisting of BTRFS on
four unpartitioned 1TB SATA III hard drives.  The filesystem gets in
excess of 100GB of data written to it each day (almost all rewrites
however), and is what I use for /home, /var/log, and /var/lib, and I've
had no issues with it that were caused by BTRFS; in fact, the very
fact that it uses BTRFS helped me recover data when the storage
controller they are connected to went bad.  On average, I get about 125%
of raw disk performance on writes, and about 110% on reads.

If you are using a very large number of disks, then I would not suggest
that you use BTRFS RAID10, but instead BTRFS RAID1, as RAID10 will try
to stripe things across ALL of the devices in the filesystem, and unless
you have no more than about four times as many disks as storage
controllers (that is, each controller has no more than four disks
attached to it), the overhead outweighs the benefit of striping the data.

Also, just to make sure it's clear, in BTRFS RAID1, each block gets
written EXACTLY twice.  On the plus side though, this means that if you
do set-up a caching mechanism, you may be able to keep most of the array
spun down a majority of the time.
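Because each RAID1 block is stored exactly twice on two different devices, usable space is roughly half the raw total, which is where the 50T-out-of-100T figure in the original question comes from. A small sketch of that estimate, assuming the usual rule of thumb for two-copy RAID1: usable space is the smaller of half the total and the total minus the largest device (the second term only matters when one device is bigger than all the others combined). Metadata and allocation granularity are ignored.

```c
#include <stddef.h>

/*
 * Rough usable capacity of a two-copy RAID1 profile. Sizes are in any
 * consistent unit (e.g. GB). Ignores metadata and chunk granularity.
 */
static unsigned long long raid1_usable(const unsigned long long *sizes, size_t n)
{
	unsigned long long total = 0, largest = 0;

	for (size_t i = 0; i < n; i++) {
		total += sizes[i];
		if (sizes[i] > largest)
			largest = sizes[i];
	}
	/* Each copy must land on a different device than its twin. */
	unsigned long long half = total / 2;
	unsigned long long others = total - largest;
	return half < others ? half : others;
}
```

For 25 disks of 4000 GB this gives 50000 GB, i.e. the 50T mentioned in the question.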





Re: [PATCH] Btrfs: cleanup the same name in end_bio_extent_readpage

2014-08-19 Thread Chris Mason
On 08/19/2014 11:32 AM, Liu Bo wrote:
> We've defined an 'offset' outside of bio_for_each_segment_all.

This isn't causing problems though?  It should just be shadowing the
bio_for_each_segment_all variable for the duration of the curlies.
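Chris's point in miniature: a variable declared inside a block hides an outer one of the same name only until the closing brace, so the outer value is untouched. The function below is a hypothetical illustration, not the actual `end_bio_extent_readpage` code; GCC and Clang report this pattern with `-Wshadow`, which is the usual motivation for such a cleanup.

```c
/*
 * Hypothetical illustration (not the btrfs code): the loop body declares its
 * own `offset`, which hides the outer one until the closing brace.
 */
static int outer_offset_after_loop(int nsegs)
{
	int offset = 100;                       /* outer variable */

	for (int i = 0; i < nsegs; i++) {
		int offset = i * 4096;          /* shadows the outer `offset` */
		(void)offset;                   /* only the inner copy is visible here */
	}
	return offset;                          /* still 100 */
}
```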

No objection as a cleanup, just making sure I'm not missing something.

-chris


Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

2014-08-19 Thread Chris Mason
On 08/06/2014 10:51 PM, Qu Wenruo wrote:
> It seems that the patch is rejected in patchwork,
> 
> Could any one tell me the reason?

I had nack'd it because I was worried at the time about the super crc
errors that Dave had found in the past.  Sorry, I really thought I had
sent email about it.

But Dave has a great point in his reply about validating the super
generation.

-chris


Re: [PATCH] Btrfs: fix crash on endio of reading corrupted block

2014-08-19 Thread Chris Mason


On 08/19/2014 11:33 AM, Liu Bo wrote:
> The crash is
> 
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[]  [] 
> end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> 
> This is in fact a regression.
> 
> It is because we forgot to increase @offset properly when reading a corrupted
> block, so @offset stays unchanged, and this leads to checksum errors while
> reading the remaining blocks queued up in the same bio, which then ends up
> hitting the above BUG_ON.

Thanks Chris and Liu, this is queued.

-chris


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Mitch Harder
On Tue, Aug 19, 2014 at 11:21 AM, M G Berberich
 wrote:
> Hello,
>
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
>

I would recommend carefully reading this thread titled: "1 week to
rebuid 4x 3TB raid10 is a long time!"

http://comments.gmane.org/gmane.comp.file-systems.btrfs/36969

There are multiple methods for replacing a device in a Btrfs RAID
array.  If I understand the conclusions of this thread, you might
still expect 12-14 hours to rebuild after replacing a 4 TByte device,
assuming you use the optimal replace commands.

With 25 devices, that leaves an uncomfortable period of time where
another device might fail.
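For a sense of scale, the 12-14 hour estimate corresponds to an effective rebuild rate of roughly 80-93 MB/s, assuming decimal units (1 TB = 10^12 bytes) and that the full 4 TB has to be rewritten:

```latex
\[
\frac{4\times10^{12}\ \text{B}}{12 \times 3600\ \text{s}} \approx 92.6\ \text{MB/s},
\qquad
\frac{4\times10^{12}\ \text{B}}{14 \times 3600\ \text{s}} \approx 79.4\ \text{MB/s}
\]
```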


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Andrej Manduch
Hi,

On 08/19/2014 06:21 PM, M G Berberich wrote:
> · Are there any reports/papers/web-pages about BtrFS-systems this size
>   in use? Praises, complaints, performance reviews, whatever…

I don't know about papers or benchmarks, but a few weeks ago there was a
guy who had a problem with really long mount times on btrfs of a similar size.
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36226.html

And I would not recommend 3TB disks. *I'm not a btrfs dev*, but as far as I
know there is quite a difference between rebuilding a disk on real RAID and
on btrfs RAID. The problem is that btrfs does RAID at the filesystem level,
not at the hardware level, so there is more mechanical overhead on the
drives, and thus it takes significantly longer than regular RAID.

--
b.


Re: [PATCH] Btrfs: fix crash on endio of reading corrupted block

2014-08-19 Thread Eric Sandeen
On 8/19/14, 10:33 AM, Liu Bo wrote:
> The crash is
> 
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[]  [] 
> end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> 
> This is in fact a regression.

It'd be helpful to identify the commit, or at least kernel release, which caused
the regression.

> It is because we forgot to increase @offset properly when reading a corrupted
> block, so @offset stays unchanged, and this leads to checksum errors while
> reading the remaining blocks queued up in the same bio, which then ends up
> hitting the above BUG_ON.

So does that mean that any checksum error on this path will crash the kernel?

That sounds like this bug has exposed a more fundamental problem, no?

Thanks,
-Eric

> Reported-by: Chris Murphy 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/extent_io.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3af4966..be41e4d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>   test_bit(BIO_UPTODATE, &bio->bi_flags);
>   if (err)
>   uptodate = 0;
> + offset += len;
>   continue;
>   }
>   }
> 
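A toy model of why a single missed `offset += len` turns one bad block into a cascade of failures: each segment is verified against the checksum looked up by the current offset, and the error path `continue`s past the normal increment. Everything here (the arrays, the `fixed` switch, the 4096-byte block size) is a hypothetical simplification, not the btrfs end-io code.

```c
#include <stddef.h>

#define BLK 4096

/*
 * Toy model of one read bio: segment i really starts at byte i * BLK, and
 * the checksum expected for whatever sits at `offset` is
 * expected[offset / BLK]. A mismatch takes an error path that `continue`s.
 * With fixed == 0 that path skips the offset update (the bug); with
 * fixed == 1 it advances offset first (the one-line fix).
 * Returns the number of segments that fail verification.
 */
static int count_csum_failures(const unsigned *actual, const unsigned *expected,
			       size_t n, int fixed)
{
	size_t offset = 0;
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = BLK;

		if (actual[i] != expected[offset / BLK]) {
			failures++;
			if (fixed)
				offset += len;  /* the added line */
			continue;
		}
		offset += len;                  /* normal path */
	}
	return failures;
}
```

With one corrupted segment out of four, the fixed walk reports one failure, while the unfixed walk compares every later segment against the stale offset and reports three.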



Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Roman Mamedov
On Tue, 19 Aug 2014 18:21:52 +0200
M G Berberich  wrote:

> · BtrFS with RAID1 is fairly stable.

Maybe, but it's not optimized for performance: reads are not balanced in the
most optimal way, and writes may end up being submitted sequentially rather
than in parallel to two devices, resulting in write performance that's way
less than that of a single device.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.

AFAIK Btrfs RAID 5/6 is not yet ready to be used in a production environment.

In your case I would recommend considering Btrfs on top of two 12-disk mdadm
RAID6 arrays, or three 8-disk ones, leaving one HDD as a shared hot spare.

To join the mdadm arrays into a larger block device you can use either LVM, or
Btrfs itself, with the "single" profile for data.
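A quick capacity check of those layouts with 4 TB disks, assuming each md RAID6 array yields (members - 2) disks' worth of space and the arrays are simply concatenated (Btrfs "single" data or LVM), with metadata overhead ignored. The helper is a hypothetical illustration.

```c
/*
 * Usable space, in TB, of `arrays` md RAID6 arrays with `disks` members of
 * `disk_tb` TB each, concatenated (Btrfs "single" data or LVM).
 */
static unsigned raid6_layout_tb(unsigned arrays, unsigned disks, unsigned disk_tb)
{
	if (disks < 4)                  /* md RAID6 needs at least four members */
		return 0;
	/* two disks' worth of each array goes to parity */
	return arrays * (disks - 2) * disk_tb;
}
```

Two 12-disk arrays give 80 TB and three 8-disk arrays give 72 TB, both using 24 disks and leaving the 25th as the spare, compared with about 50 TB for Btrfs RAID1 across all 25 disks.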

-- 
With respect,
Roman




Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2

2014-08-19 Thread Zach Brown
On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
> Hello list,
> 
> I want to use an ARM kirkwood based NSA325v2 NAS (dubbed "Receiver") for
> receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
> running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
> red disk (having GPT label, partitions created with parted).
> 
> But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
>   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File too large
> ... and that stops reception/snapshot creation.

...

> Increasing the verbosity with "-v -v" for btrfs receive shows the
> following differences between receive operations on 'Receiver' and
> 'OtherHost', both of them using the identical inputfile
> /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
> 
> * the chown and chmod operations are different -> resulting in
> weird/wrong permissions and sizes on 'Receiver' side.
> * what's "stransid", this is the first line that differs

This is interesting, thanks for going to the trouble to show those
diffs.

That the commands and strings match up shows us that the basic tlv header
chaining is working.  But the u64 attribute values are sometimes messed
up.  And messed up in a specific way.  A variable number of low order
bytes are magically appearing.

(gdb) print/x 11709972488
$2 = 0x2b9f80008
(gdb) print/x 178680
$3 = 0x2b9f8

(gdb) print/x 588032
$6 = 0x8f900
(gdb) print/x 2297
$7 = 0x8f9

Some light googling makes me think that the Marvell Kirkwood is not
friendly at all to unaligned accesses.
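Both pairs of numbers can be reproduced exactly by reading the little-endian u64 one or two bytes before the attribute value, so that the low bytes come from the TLV length field (8 for a u64, stored as `08 00`). The sketch assumes a TLV header of a 16-bit type followed by a 16-bit length, both little-endian; it only checks the arithmetic and is one plausible reading (a load that silently ignores the low address bits), not a verified account of what the Kirkwood does. Its byte-at-a-time `le64_at` is also the kind of alignment-safe read an unaligned-access helper performs.

```c
#include <stdint.h>
#include <stddef.h>

/* Assemble a little-endian u64 one byte at a time (alignment-safe). */
static uint64_t le64_at(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Lay out one u64 attribute as [type:le16][len:le16][value:le64] and
 * return what a reader gets if it starts `shift` bytes before the value.
 * shift == 0 is the correct read; valid for shift <= 4.
 */
static uint64_t read_u64_attr_shifted(uint16_t type, uint64_t value, size_t shift)
{
	uint8_t buf[12];

	buf[0] = type & 0xff;
	buf[1] = type >> 8;
	buf[2] = 8;             /* tlv_len low byte: sizeof(u64) */
	buf[3] = 0;             /* tlv_len high byte */
	for (int i = 0; i < 8; i++)
		buf[4 + i] = (uint8_t)(value >> (8 * i));
	return le64_at(buf + 4 - shift);
}
```

Reading two bytes early turns 178680 into 11709972488, and reading one byte early turns 2297 into 588032, matching the gdb output above.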

The (biting tongue) send and receive code is playing some games with
casting aligned and unaligned pointers.  Maybe that's upsetting the arm
toolchain/kirkwood.  Does this completely untested patch to btrfs-progs,
to be run on the receiver, do anything?

- z

diff --git a/send-stream.c b/send-stream.c
index 88e18e2..4f8dd83 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -204,7 +204,7 @@ out:
int __len; \
TLV_GET(s, attr, (void**)&__tmp, &__len); \
TLV_CHECK_LEN(sizeof(*__tmp), __len); \
-   *v = le##bits##_to_cpu(*__tmp); \
+   *v = get_unaligned_le##bits(__tmp); \
} while (0)
 
 #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)


Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2

2014-08-19 Thread Hugo Mills
On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:
> On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
> > Hello list,
> > 
> > I want to use an ARM kirkwood based NSA325v2 NAS (dubbed "Receiver") for
> > receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
> > running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
> > red disk (having GPT label, partitions created with parted).
> > 
> > But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
> >   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File too large
> > ... and that stops reception/snapshot creation.
> 
> ...
> 
> > Increasing the verbosity with "-v -v" for btrfs receive shows the
> > following differences between receive operations on 'Receiver' and
> > 'OtherHost', both of them using the identical inputfile
> > /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
> > 
> > * the chown and chmod operations are different -> resulting in
> > weird/wrong permissions and sizes on 'Receiver' side.
> > * what's "stransid", this is the first line that differs
> 
> This is interesting, thanks for going to the trouble to show those
> diffs.
> 
> That the commands and strings match up shows us that the basic tlv header
> chaining is working.  But the u64 attribute values are sometimes messed
> up.  And messed up in a specific way.  A variable number of low order
> bytes are magically appearing.
> 
> (gdb) print/x 11709972488
> $2 = 0x2b9f80008
> (gdb) print/x 178680
> $3 = 0x2b9f8
> 
> (gdb) print/x 588032
> $6 = 0x8f900
> (gdb) print/x 2297
> $7 = 0x8f9
> 
> Some light googling makes me think that the Marvell Kirkwood is not
> friendly at all to unaligned accesses.

   ARM isn't in general -- it never has been, even 20 years ago in the
ARM3 days when I was writing code in ARM assembler. We've been bitten
by this before in btrfs (mkfs on ARM works, mounting it fails fast,
because userspace has a trap to fix unaligned accesses, and the kernel
doesn't).

> The (biting tongue) send and receive code is playing some games with
> casting aligned and unaligned pointers.  Maybe that's upsetting the arm
> toolchain/kirkwood.

   Almost certainly the toolchain isn't identifying the unaligned
accesses, and thus building code that uses them causes stuff to break.

   There's a workaround for userspace that you can use to verify that
this is indeed the problem: echo 2 >/proc/cpu/alignment will tell the
kernel to fix up unaligned accesses initiated in userspace. It's a
performance killer, but it should serve to identify whether the
problem is actually this.

   Hugo.

>  Does this completely untested patch to btrfs-progs,
> to be run on the receiver, do anything?
> 
> - z
> 
> diff --git a/send-stream.c b/send-stream.c
> index 88e18e2..4f8dd83 100644
> --- a/send-stream.c
> +++ b/send-stream.c
> @@ -204,7 +204,7 @@ out:
> int __len; \
> TLV_GET(s, attr, (void**)&__tmp, &__len); \
> TLV_CHECK_LEN(sizeof(*__tmp), __len); \
> -   *v = le##bits##_to_cpu(*__tmp); \
> +   *v = get_unaligned_le##bits(__tmp); \
> } while (0)
>  
>  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- "There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold."




Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

2014-08-19 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

From: David Sterba 
To: Qu Wenruo 
Date: 2014-08-20 01:18

On Thu, Aug 07, 2014 at 10:51:15AM +0800, Qu Wenruo wrote:

It seems that the patch is rejected in patchwork,

It was not me :)


Could any one tell me the reason?

I'd understand that the patch is no longer needed after the original
problem went away, but it's not what you describe in your changelog.
 From that point the reason might not be compelling.


The above commit will cause disaster if someone tries to mount a newly created but
later corrupted btrfs filesystem.

The generation after mkfs is something like 4 or 5, this means that the
corruption would have to happen in the first few transaction commits,
this is unlikely and the filesystem will be probably fairly empty at
that time.

If the concern is about corrupted generation counter itself in the
superblock, then yes this could hurt.

It's still possible to compare the 1st superblock with the copies; the
one at offset 64M is available in 99% of cases, so there is enough data to
decide what's actually corrupted. This could catch more corruption
than just the generation counter.

 From the output of btrfs-show-super:

generation  56392
chunk_root_generation   56392
cache_generation56392
uuid_tree_generation56392

the generation is duplicated several times, so a minimal patch could be
to do additional comparison with the others.
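A minimal sketch of such a cross-check, using a hypothetical struct that holds only the counters listed above. Rather than requiring strict equality, it assumes only that a root recorded in the superblock cannot have been written in a transaction newer than the superblock's own generation. cache_generation is deliberately left out, because btrfsck's "cache and super generation don't match, space cache will be invalidated" message shows that field is allowed to differ.

```c
#include <stdint.h>

/* Hypothetical subset of superblock fields, as printed by btrfs-show-super. */
struct sb_gens {
	uint64_t generation;
	uint64_t chunk_root_generation;
	uint64_t cache_generation;      /* not checked: may legitimately differ */
	uint64_t uuid_tree_generation;
};

/*
 * Return 1 if the duplicated generation counters look consistent.
 * Assumption: no recorded root can be newer than the superblock itself.
 */
static int sb_generations_plausible(const struct sb_gens *s)
{
	if (s->generation == 0)
		return 0;
	if (s->chunk_root_generation > s->generation)
		return 0;
	if (s->uuid_tree_generation > s->generation)
		return 0;
	return 1;
}
```

A superblock whose generation field was corrupted down to a small value would then fail the check even though the other counters still carry the real, much larger number.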

Thanks for the explanation.
But in fact, while investigating some bugs (not in the kernel bugzilla but
in a proprietary one), I found not one but two disk images whose superblock
csum doesn't match and in which many values are garbage.
For example, num_devices is 871878361089, and several bits differ between
dev_item.fsid and fsid.

BTW, the cache generation is also garbage.

Normally, such a superblock should not be mountable, since the csum doesn't
match. But due to the mentioned commit, the generation (4) is below 10, so
the kernel just ignores the csum error, and eventually a kernel BUG is
triggered: with so many fields wrong, anything is possible.


So I sent the patch, hoping to avoid such problems.
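The behaviour change under discussion reduces to one decision. A hypothetical sketch of the two policies, with illustrative names rather than the kernel's: the current rule forgives a checksum mismatch when the generation is below 10, which is what let a superblock with generation 4 and garbage fields through; the proposed rule never does.

```c
#include <stdint.h>

/* Current behaviour as described in this thread. */
static int sb_accept_current(int csum_ok, uint64_t generation)
{
	return csum_ok || generation < 10;  /* bad csum forgiven on young filesystems */
}

/* Proposed behaviour: a checksum mismatch always refuses the mount. */
static int sb_accept_proposed(int csum_ok, uint64_t generation)
{
	(void)generation;                   /* no longer consulted */
	return csum_ok;
}
```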

Thanks,
Qu



And before btrfs entered mainline, btrfs-progs already had a superblock
checksum. See btrfs-progs commit: 5ccd1715fa2eaad0b26037bb53706779c8c93b5f
(superblock duplication by Yan Zheng).

The superblock checksum was not calculated the same way as in kernel,
but with the missing check this was not detected.


Before commit 5ccd17, mkfs.btrfs used a 16K super offset, while current btrfs
uses a 64K super offset; in any case, an old btrfs without a super csum will
not be mountable due to the change of super offset.

So backward compatibility is not a problem.

Superblocks at offset 16k are not supported anymore AFAICT.




[PATCH 8/8 v2] btrfs: rename total_bytes to avoid confusion

2014-08-19 Thread Anand Jain
We are assigning the number of devices to total_bytes,
which is confusing at first glance.

Signed-off-by: Anand Jain 
---
v2: accept David's comment; rename ret_sz to tmp

 fs/btrfs/volumes.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf99e82..718f734 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2253,7 +2253,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
struct list_head *devices;
struct super_block *sb = root->fs_info->sb;
struct rcu_string *name;
-   u64 total_bytes;
+   u64 tmp;
int seeding_dev = 0;
int ret = 0;
 
@@ -2356,13 +2356,13 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
if (!blk_queue_nonrot(bdev_get_queue(bdev)))
root->fs_info->fs_devices->rotating = 1;
 
-   total_bytes = btrfs_super_total_bytes(root->fs_info->super_copy);
+   tmp = btrfs_super_total_bytes(root->fs_info->super_copy);
btrfs_set_super_total_bytes(root->fs_info->super_copy,
-   total_bytes + device->total_bytes);
+   tmp + device->total_bytes);
 
-   total_bytes = btrfs_super_num_devices(root->fs_info->super_copy);
+   tmp = btrfs_super_num_devices(root->fs_info->super_copy);
btrfs_set_super_num_devices(root->fs_info->super_copy,
-   total_bytes + 1);
+   tmp + 1);
 
/* add sysfs device entry */
btrfs_kobj_add_device(root->fs_info, device);
-- 
2.0.0.153.g79d



[PATCH 2/8 v2] btrfs: replace seed device followed by unmount causes kernel WARNING

2014-08-19 Thread Anand Jain
reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::

__btrfs_close_devices()
::
WARN_ON(fs_devices->open_devices);

After the seed device has been replaced, the new target device
is no longer a seed device. So we need to update the device
numbers in the fs_devices as pointed by the fs_info.

Signed-off-by: Anand Jain 
Reviewed-by: Miao Xie 
---
v2: add the Reviewed-by tag I had missed; thanks, David

 fs/btrfs/volumes.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5fd0132..f098ae7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1964,7 +1964,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 
WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
-   fs_devices = fs_info->fs_devices;
+   /*
+* in case of fs with no seed, srcdev->fs_devices will point
+* to fs_devices of fs_info. However when the dev being replaced is
+* a seed dev it will point to the seed's local fs_devices. In short
+* srcdev will have its correct fs_devices in both the cases.
+*/
+   fs_devices = srcdev->fs_devices;
 
list_del_rcu(&srcdev->dev_list);
list_del_rcu(&srcdev->dev_alloc_list);
-- 
2.0.0.153.g79d
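A simplified model of why the patch switches to `srcdev->fs_devices`: when the replaced device is a seed, the counters that must drop belong to the seed's own `fs_devices`, not to the one hanging off `fs_info`, as the comment in the patch explains. The structs and counters below are hypothetical reductions of the real ones.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced versions of the btrfs structures. */
struct fs_devs {
	int num_devices;
	int open_devices;
	struct fs_devs *seed;        /* next fs_devices in the seed chain */
};

struct dev {
	struct fs_devs *fs_devices;  /* the list this device actually belongs to */
};

/* Drop the replaced source device from the list it is accounted in. */
static void rm_srcdev(struct fs_devs *fs_info_devs, struct dev *srcdev,
		      int use_srcdev_list)
{
	/* Before the patch: always fs_info's list. After: srcdev's own list. */
	struct fs_devs *fd = use_srcdev_list ? srcdev->fs_devices : fs_info_devs;

	fd->num_devices--;
	fd->open_devices--;
}
```

In the reproducer (seed /dev/sdb, sprout /dev/sdc), the old choice decrements the sprout's counters and leaves the seed list claiming an open device that no longer exists: the kind of stale count `WARN_ON(fs_devices->open_devices)` catches at unmount.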



Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

2014-08-19 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.

From: Chris Mason 
To: Qu Wenruo , 
Date: 2014-08-20 03:48

On 08/06/2014 10:51 PM, Qu Wenruo wrote:

It seems that the patch is rejected in patchwork,

Could any one tell me the reason?

I had nack'd it because I was worried at the time about the super crc
errors that Dave had found in the past.  Sorry, I really thought I had
sent email about it.

But Dave has a great point in his reply about validating the super
generation.

Thanks for the reason.
I'll search and look at Dave's mail and dig into it.

Although, as mentioned in my reply to David,
the main problem is that I found two disk images with crazy values in the
superblock and a wrong csum, but the generation is still 4, and ignoring the
csum error caused a kernel BUG.

Thanks,
Qu



-chris




Re: [PATCH 2/2] btrfs-progs: canonicalize dm device name before update kernel

2014-08-19 Thread Anand Jain



On 15/08/2014 12:30, Eryu Guan wrote:

On Fri, Aug 15, 2014 at 09:50:34AM +0800, Anand Jain wrote:


Eryu,

  The btrfs dev scan -d option is there for legacy reasons. The new method
  uses libblkid to find btrfs devices.

David/Zach, is it time to remove the -d option, or to mark it deprecated?


  But your test case shows the problem with btrfsck as well. That's nice!
  The fix for this is in the kernel, which returns busy
  if the device path is being updated while the device is mounted.
  Can you try Chris's integration branch? Mainly this patch:

-
commit 4e5c146442b23437d23a2bd81b95f13dfeaffe88
Author: Anand Jain 
Date:   Thu Jul 3 18:22:05 2014 +0800

 Btrfs: device_list_add() should not update list when mounted
-


Thanks Anand, this patch fixed the issue, btrfsck reports "Device or
resource busy" now.


 Thanks for testing.


[root@hp-dl388g8-01 ~]# btrfsck /dev/mapper/rhel_hp--dl388g8--01-btrfs--2
ERROR: device scan failed '/dev/dm-3' - Device or resource busy
Checking filesystem on /dev/mapper/rhel_hp--dl388g8--01-btrfs--2
UUID: 1104d6d6-2653-496b-8d67-184d522dd632
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 114688 bytes used err is 0
total csum bytes: 0
total tree bytes: 114688
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 108436
file data blocks allocated: 0
  referenced 0
Btrfs v3.12

But btrfsck is still scanning unrelated devices when checking a btrfs
with multiple devices. In my case, I was checking btrfs on lv btrfs-2,
but btrfs-1(dm-3) was scanned too.


 That's expected, as of now the only way to find a partner device is
 by scanning the available devices.

Anand




I hope my first patch fixes this issue in the expected way.

Thanks,
Eryu



Thanks, Anand


On 14/08/2014 19:40, Eryu Guan wrote:

A btrfsck or btrfs device scan -d operation could change the device
name of other mounted btrfs in kernel, if the other btrfs is on lvm
device.

Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+

[root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
 Total devices 1 FS bytes used 384.00KiB
 devid    1 size 15.00GiB used 2.04GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv1

Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
 Total devices 2 FS bytes used 112.00KiB
 devid    1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
 devid    2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3

Btrfs v3.14.2

And testlv1 was mounted at /mnt/btrfs

[root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
Filesystem                                Type  1024-blocks  Used Available Capacity Mounted on
/dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs    15728640   512  13602560       1% /mnt/btrfs

Now run btrfsck on testlv2 or btrfs device scan -d, which will scan
all btrfs devices and somehow change the device name.

[root@hp-dl388eg8-01 btrfs-progs]# btrfsck /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1

[root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
Filesystem Type  1024-blocks  Used Available Capacity Mounted on
/dev/dm-3  btrfs    15728640   512  13602560       1% /mnt/btrfs
[root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
 Total devices 1 FS bytes used 384.00KiB
 devid    1 size 15.00GiB used 2.04GiB path /dev/dm-3

Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
 Total devices 2 FS bytes used 112.00KiB
 devid    1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
 devid    2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3

Btrfs v3.14.2

Now call btrfs_register_one_device with the canonicalized dm device name.

Signed-off-by: Eryu Guan 
---

With patch 1 applied, btrfsck won't change the device name,
but btrfs device scan -d still does.

  utils.c | 44 +---
  1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/utils.c b/utils.c
index f54e749..3567094 100644
--- a/utils.c
+++ b/utils.c
@@ -985,6 +985,32 @@ static int blk_file_in_dev_list(struct btrfs_fs_devices* fs_devices,
  }

  /*
+ * Convert dm-N device name to /dev/mapper/name
+ */
+static void canonicalize_dm_name(char *devnode, char *path, int len)
+{
+   char *buf = NULL;
+   FILE *sysfsp = NULL;
+
+   buf = malloc(PATH_MAX);
+   if (!buf)
+   return;
+
+   snprintf(buf, PATH_MAX, "/sys/block/%s/dm/name", devnode);
+   sysfsp = fopen(buf, "r");
+   if (!sysfsp)
+   goto out;
+
+   if (fgets(buf, PATH_MAX, sysfsp)) {
+   buf[strlen(buf) - 1] = '\0';
+   snprintf(path, len - 1, "/dev/map
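A stand-alone sketch of the conversion described in the patch comment (dm-N to /dev/mapper/<name>), using the device-mapper name that sysfs exposes at /sys/block/<dev>/dm/name. It is an illustration of the technique, not the truncated remainder of the patch. The sysfs root is a parameter only so the function can be pointed elsewhere for testing, and unlike the patch this version strips the trailing newline only if one is present.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Map a kernel name like "dm-3" to "/dev/mapper/<name>" using the name
 * device-mapper publishes in sysfs. `sysroot` is normally "/sys".
 * Returns 0 on success, -1 if the sysfs file is missing or empty, or the
 * result does not fit in `path`.
 */
static int canonicalize_dm_name(const char *sysroot, const char *devnode,
				char *path, size_t len)
{
	char buf[PATH_MAX];
	FILE *f;
	size_t n;
	int ret = -1;

	snprintf(buf, sizeof(buf), "%s/block/%s/dm/name", sysroot, devnode);
	f = fopen(buf, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f)) {
		n = strlen(buf);
		if (n > 0 && buf[n - 1] == '\n')
			buf[--n] = '\0';        /* drop the newline sysfs appends */
		if (n > 0) {
			int w = snprintf(path, len, "/dev/mapper/%s", buf);
			if (w >= 0 && (size_t)w < len)
				ret = 0;
		}
	}
	fclose(f);
	return ret;
}
```

Called as `canonicalize_dm_name("/sys", "dm-3", out, sizeof(out))`, it fills `out` with the /dev/mapper path for that dm device when one exists.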

Re: [BUG] cannot mount subvolume with selinux context

2014-08-19 Thread Eryu Guan
On Tue, Aug 19, 2014 at 10:28:54AM -0700, Zach Brown wrote:
> On Tue, Aug 19, 2014 at 11:32:16AM +0800, Eryu Guan wrote:
> > Hi,
> > 
> > Description of the problem:
> > 
> > mount btrfs with selinux context, then create a subvolume, the new
> > subvolume cannot be mounted, even with the same context.
> > 
> > mkfs -t btrfs /dev/sda5
> > mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
> > btrfs subvolume create /mnt/btrfs/subvol
> > mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test
> 
> Submit a xfstest?

Sure, will do.

Thanks,
Eryu
> 
> > The security_sb_copy_data() takes out selinux context data to
> > "secdata", then mount_subvol() calls mount_fs() (via vfs_kern_mount())
> > again without selinux context, so mount_subvol() fails, which fails
> > the whole mount.
> > 
> > Not sure what the proper fix is. Zach suggested that the fix will
> > probably be to rework the vfs functions a bit, as he said in the RH
> > bugzilla[1].
> 
> Yeah, I have no idea what'd be preferred here:
> 
>  - rework the vfs _kern_ mount api to offer one that doesn't mess with
>selinux mount options
>  - add a flag to have the second _kern_ mount ignore selinux (but not
>MS_KERNMOUNT?)
>  - binary data and fs selinux handling?  (like nfs)
> 
> - z


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Marc MERLIN
On Tue, Aug 19, 2014 at 06:21:52PM +0200, M G Berberich wrote:
> · incremental send/receive works.
 
Yes.

> · There is no support for hot spares (spare disks that automatically
>   replace a faulty disk).

Correct

> · BtrFS with RAID1 is fairly stable.

From what I know.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large disk arrays, and there is no option to limit the
>   number of disks per stripe so far.

Not sure about the performance issue, but either way, don't use RAID5/6
with btrfs for anything other than playing around. The code is not
finished.

> · If a disk fails, does BtrFS rebalance automatically? (This would
>   give a kind of hot-spare behavior.)
 
No, not for raid5/6.

> · Are there any reports/papers/web-pages about BtrFS-systems this size
>   in use? Praises, complaints, performance reviews, whatever…

Use md-raid5, which is tried and true, and put btrfs on top.
And still have backups, be ready for btrfs to become unusable (speed
and/or deadlocks), get trashed, or some other problem.
It's not guaranteed to happen, but the odds are far from being 0 either,
so either your data is throwaway, or have good backups.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2

2014-08-19 Thread Daniel Mizyrycki

Thank you Hugo!  Amazing. It almost work all the way,

According to some tests I did, echo 2 >/proc/cpu/alignment does allow in 
fact btrfs receive to work in most cases. For the tests, a x86_64 for 
send, a armv5tel for receive and 2 subvolumes (one with just a few

data and binary files and the other a full root partition) were used.
The send blobs were md5sum and verified at receive side matched.
The small blob was properly process by btrfs receive (file sha1s and 
metadata all matched).

The big blob with the root partition only partially succeeded, as it
ended abruptly with:
   ERROR: lsetxattr var/log/journal system.posix_acl_default=. failed. Operation not supported
I checked a few restored files, and their sha1s and metadata matched.

Daniel


On 08/19/14 15:22, Hugo Mills wrote:

On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:

On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:

Hello list,

I want to use an ARM Kirkwood-based NSA325v2 NAS (dubbed "Receiver") for
receiving btrfs snapshots taken on several hosts, e.g. a Core Duo laptop
running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
Red disk (GPT label, partitions created with parted).

But all the btrfs receive commands on 'Receiver' soon fail with, e.g.:
   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File too large
... and that stops reception/snapshot creation.


...


Increasing the verbosity with "-v -v" for btrfs receive shows the
following differences between receive operations on 'Receiver' and
'OtherHost', both of them using the identical input file
/boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send

* the chown and chmod operations are different -> resulting in
weird/wrong permissions and sizes on the 'Receiver' side.
* what's "stransid"? This is the first line that differs


This is interesting, thanks for going to the trouble to show those
diffs.

That the commands and strings match up shows us that the basic TLV header
chaining is working.  But the u64 attribute values are sometimes messed
up.  And messed up in a specific way.  A variable number of low-order
bytes are magically appearing.
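
To make the chaining concrete, here is a minimal sketch, assuming the attribute layout from btrfs send.h (a packed 4-byte header of le16 type and le16 length, followed directly by the payload) and ignoring the per-command header. Walking the chain only needs the length field, and nothing pads the payloads, so an 8-byte value can start at an offset that is not a multiple of 8. walk_tlvs() and the type numbers used with it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed attribute layout, modelled on btrfs send.h: a packed 4-byte
 * header (le16 type, le16 len) followed by len payload bytes. Nothing
 * pads the payload, so a u64 value can start at any byte offset. */
#define TLV_HDR_LEN 4

static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Follow the header chain through buf, storing each payload's offset in
 * offs[]. Returns the number of attributes found, or -1 if the chain runs
 * past the end of the buffer. */
static int walk_tlvs(const uint8_t *buf, size_t len, size_t *offs, int max)
{
	size_t pos = 0;
	int n = 0;

	while (pos < len && n < max) {
		if (len - pos < TLV_HDR_LEN)
			return -1;              /* truncated header */
		uint16_t tlen = read_le16(buf + pos + 2);
		if (len - pos - TLV_HDR_LEN < tlen)
			return -1;              /* truncated payload */
		offs[n++] = pos + TLV_HDR_LEN;
		pos += TLV_HDR_LEN + tlen;      /* next header follows payload */
	}
	return n;
}
```

With a 5-byte attribute followed by a u64 attribute, walk_tlvs() reports payload offsets 4 and 13, so the u64 starts at offset 13, which is not 8-byte aligned.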

(gdb) print/x 11709972488
$2 = 0x2b9f80008
(gdb) print/x 178680
$3 = 0x2b9f8

(gdb) print/x 588032
$6 = 0x8f900
(gdb) print/x 2297
$7 = 0x8f9
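
One reading of these numbers, offered as a hypothesis rather than a diagnosis: each bad value is the good one shifted up by whole bytes (0x2b9f8 << 16 plus 0x0008, and 0x8f9 << 8). That is what you would get if the 8-byte load started at the word-aligned address just below the value and picked up the tail of the TLV header (length 8, stored as 08 00) as its low bytes. The sketch below models that, assuming a word-aligned buffer and the packed header layout from btrfs send.h; model_aligned_down_load() is a hypothetical helper, not something the hardware or btrfs-progs provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian u64 from 8 bytes, one byte at a time. */
static uint64_t le64_from_bytes(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical model of a 64-bit load whose address has its low two bits
 * silently cleared instead of trapping or being fixed up.
 * buf is assumed to be word-aligned; off is the value's real offset. */
static uint64_t model_aligned_down_load(const uint8_t *buf, size_t off)
{
	return le64_from_bytes(buf + (off & ~(size_t)3));
}
```

With the value 178680 at offset 6 (its header at offset 2), the model returns 11709972488; with 2297 at offset 5 (its header at offset 1), it returns 588032. Both match the gdb pairs above.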

Some light googling makes me think that the Marvell Kirkwood is not
friendly at all to unaligned accesses.


ARM isn't, in general -- it never has been, even 20 years ago in the
ARM3 days when I was writing code in ARM assembler. We've been bitten
by this before in btrfs (mkfs on ARM works, mounting it fails fast,
because userspace has a trap to fix unaligned accesses, and the kernel
doesn't).


The (biting tongue) send and receive code is playing some games with
casting aligned and unaligned pointers.  Maybe that's upsetting the ARM
toolchain/Kirkwood.


Almost certainly the toolchain isn't identifying the unaligned
accesses, and so builds code that uses them, which causes stuff to break.

There's a workaround for userspace that you can use to verify that
this is indeed the problem: echo 2 >/proc/cpu/alignment will tell the
kernel to fix up unaligned accesses initiated in userspace. It's a
performance killer, but it should serve to identify whether the
problem is actually this.

Hugo.


Does this completely untested patch to btrfs-progs,
to be run on the receiver, do anything?

- z

diff --git a/send-stream.c b/send-stream.c
index 88e18e2..4f8dd83 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -204,7 +204,7 @@ out:
 int __len; \
 TLV_GET(s, attr, (void**)&__tmp, &__len); \
 TLV_CHECK_LEN(sizeof(*__tmp), __len); \
-   *v = le##bits##_to_cpu(*__tmp); \
+   *v = get_unaligned_le##bits(__tmp); \
 } while (0)

  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)
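
For comparison, here is one portable way to do what the patch asks of get_unaligned_le##bits in the 64-bit case: copy the bytes out with memcpy(), which has no alignment requirement, then convert from little-endian. This is a sketch assuming glibc's <endian.h> (le64toh); my_get_unaligned_le64() and tlv_get_u64() are hypothetical names, and btrfs-progs' own get_unaligned_le64() may be defined differently.

```c
#define _DEFAULT_SOURCE   /* for le64toh() in glibc's <endian.h> */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for get_unaligned_le64(): memcpy() into a local
 * avoids dereferencing a possibly misaligned uint64_t pointer. */
static uint64_t my_get_unaligned_le64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return le64toh(v);   /* stream is little-endian; convert to host */
}

/* Sketch of the patched TLV_GET_INT step for a u64 attribute:
 * check the length, then read the payload without assuming alignment. */
static int tlv_get_u64(const void *payload, int len, uint64_t *v)
{
	if (len != (int)sizeof(*v))
		return -1;   /* mirrors the TLV_CHECK_LEN step */
	*v = my_get_unaligned_le64(payload);
	return 0;
}
```

Called on a buffer holding 178680 at offset 5, tlv_get_u64() returns 0 and stores 178680; a length other than 8 is rejected.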


