Btrfs stable updates for 3.16.x (and others)
Hi stable team,

please add the following patches to the stable trees. Patch #3 applies to all currently live stable trees, a 7-year-old bug. I've briefly reviewed all 3 patches against 3.10/12/14/16 (i.e. 3.4 skips #1 and #2).

Subjects:
  Btrfs: read lock extent buffer while walking backrefs
  Btrfs: fix compressed write corruption on enospc
  Btrfs: fix csum tree corruption, duplicate and outdated checksums

Commits:
  6f7ff6d7832c6be13e8c95598884dbc40ad69fb7
  ce62003f690dff38d3164a632ec69efa15c32cbf
  27b9a8122ff71a8cadfbffb9c4f0694300464f3b

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[btrfs] 8d875f95: xfstests.generic.226.fail
Hi Chris,

We noticed an xfstests failure on commit

8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes for renames and truncates")

It's 100% reproducible in the 5 test runs.

test case: snb-drag/xfstests/4HDD-btrfs-generic-mid

27b9a8122ff71a8  8d875f95da43c6a8f18f77869
---------------  -------------------------
                   %change  %stddev
                      |        /
      0             +Inf%    1 ± 0%   TOTAL xfstests.generic.226.fail

Thanks,
Fengguang
Re: [PATCH 3/3] btrfs-progs: make close_ctree return void
On Thu, Aug 07, 2014 at 10:35:59AM +0800, Gui Hecheng wrote:
> The close_ctree always returns 0 and the stuff that depends on
> its return value is of no sense.
> Just make close_ctree return void.

You should not do that while the function contains BUG_ONs. They mean that the error paths are not handled yet, not that the function trivially cannot fail.
Re: [RFC PATCH] btrfs-progs: Move btrfstune to btrfs device tune
On Mon, Aug 11, 2014 at 03:17:11AM +0300, Timofey Titovets wrote:
> According to https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs
> Quote:
> merge functionality of btrfstune, eg. under btrfs dev set-seed /dev/
> (discuss the command name though)

I added this project idea a long time ago and I'm afraid it's not valid anymore, at least not in the proposed way.

> This patch is just code move
> After, user can tune btrfs parameters through:
> btrfs dev tune -xr /dev/sda2

The btrfstune utility works on an unmounted filesystem and affects the whole filesystem, so the 'device' subgroup is not right here.

Most of the commands from the base utility work on a mounted filesystem, so a separate btrfstune makes some distinction. The reason for merging the two was to avoid a 1MB binary for a very simple thing; the generic filesystem code can be shared with 'btrfs'.

The question is what's the right UI: a new subcommand, or the generic properties command? The property interface is not yet populated, so it might be hard to imagine where the tuning settings would go. Something like this:

$ btrfs prop set feature.skinny-metadata 1 /dev/sdx

The extended refs can be turned on even on a mounted filesystem, so this would avoid doing 'echo 1 > /sys/fs/btrfs/UUID/features/extended_iref'.

At this moment I'm inclined to use the properties interface, which means that the btrfstune utility will stay a bit longer. I'll update the project idea to reflect this so it's not confusing anymore (sorry).
Re: [btrfs] 8d875f95: xfstests.generic.226.fail
On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
> We noticed an xfstests failure on commit
>
> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file flushes
> for renames and truncates")
>
> It's 100% reproducible in the 5 test runs.

Same here, different mkfs configurations.

generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see /root/xfstests/results//generic/226.out.bad)
--- tests/generic/226.out 2013-05-29 17:16:03.0 +0200
+++ /root/xfstests/results//generic/226.out.bad 2014-08-19 16:12:55.0 +0200
@@ -1,6 +1,8 @@
 QA output created by 226
 --> mkfs 256m filesystem
 --> 16 buffered 64m writes in a loop
-1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
+1 2 3 4 pwrite64: No space left on device
+5 6 7 8 9 10 11 12 pwrite64: No space left on device
+13 14 15 16

enospc on a small filesystem (256M)

# btrfs fi df /mnt/a2
System, single: total=4.00MiB, used=4.00KiB
Data+Metadata, single: total=252.00MiB, used=31.09MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

$ df -h /mnt/a2
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda9       256M   16M  241M   6% /mnt/a2
Re: [PATCH v4] Btrfs: send, lower mem requirements for processing xattrs
On Mon, Aug 11, 2014 at 03:09:35AM +0100, Filipe Manana wrote:
> +	if (name_len + data_len > buf_len) {
> +		buf_len = name_len + data_len;
> +		if (is_vmalloc_addr(buf)) {
> +			vfree(buf);
> +			buf = NULL;
> +		} else {
> +			char *tmp = krealloc(buf, buf_len, GFP_NOFS);

This could fail with a warning (high order allocation), so I suggest adding __GFP_NOWARN. The vmalloc fallback will catch the fragmented memory case and fail if there's no memory.

> +
> +			if (!tmp)
> +				kfree(buf);
> +			buf = tmp;
> +		}
> +		if (!buf) {
> +			buf = vmalloc(buf_len);
> +			if (!buf) {
> +				ret = -ENOMEM;
> +				goto out;
> +			}
> +		}
> +	}
> +
> 	read_extent_buffer(eb, buf, (unsigned long)(di + 1),
> 			name_len + data_len);
>
> @@ -1071,7 +1094,10 @@ static int iterate_dir_item(struct btrfs_root *root, struct btrfs_path *path,
> 	}
>
> out:
> -	kfree(buf);
> +	if (is_vmalloc_addr(buf))
> +		vfree(buf);
> +	else
> +		kfree(buf);

There's even kvfree to do this.

> 	return ret;
> }
Re: [btrfs] 8d875f95: xfstests.generic.226.fail
On 08/19/2014 10:23 AM, David Sterba wrote:
> On Tue, Aug 19, 2014 at 07:58:20PM +0800, Fengguang Wu wrote:
>> We noticed an xfstests failure on commit
>>
>> 8d875f95da43c6a8f18f77869f2ef26e9594fecc ("btrfs: disable strict file
>> flushes for renames and truncates")
>>
>> It's 100% reproducible in the 5 test runs.
>
> Same here, different mkfs configurations.
>
> generic/226 28s ...[16:11:52] [16:12:55] - output mismatch (see /root/xfstests/results//generic/226.out.bad)
> --- tests/generic/226.out 2013-05-29 17:16:03.0 +0200
> +++ /root/xfstests/results//generic/226.out.bad 2014-08-19 16:12:55.0 +0200
> @@ -1,6 +1,8 @@
>  QA output created by 226
>  --> mkfs 256m filesystem
>  --> 16 buffered 64m writes in a loop
> -1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
> +1 2 3 4 pwrite64: No space left on device
> +5 6 7 8 9 10 11 12 pwrite64: No space left on device
> +13 14 15 16
>
> enospc on a small filesystem (256M)

I'm calling filemap flush more often, but otherwise everything else is the same. I'll take a look.

-chris
Re: [PATCH 1/3 v4] btrfs-progs: random fixes of btrfs-filesystem documentation
On Tue, Aug 12, 2014 at 05:06:01PM +0900, Satoru Takeuchi wrote:
> +By default, the show command scans all devices found in /proc/partitions.

The default scanning method is blkid, /proc/partitions used to be the default before that. Scanning /proc/partitions is not done through the 'show' command, only during open_ctree afaics.
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
On Mon, Aug 11, 2014 at 10:05:52AM -0700, Eric Sandeen wrote:
> (What seems to be missing, though, is why would the user ever choose to use
> '-d?')

That's a fallback method if blkid or udev are not available. We've had reports in the past that this functionality should not be dropped.
Re: [PATCH 3/3] btrfs-progs: Show error message if btrfs filesystem show failed to find any btrfs filesystem
On Mon, Aug 11, 2014 at 06:13:03PM +0900, Satoru Takeuchi wrote:
> From: Satoru Takeuchi
>
> Current btrfs doesn't display any error message if this command
> failed to find any btrfs filesystem corresponding to
> ||| which user specified.

I'm not sure if it is necessary to print anything. It would be as if grep printed "Sorry I did not find any lines, please check your regexp".

However, we can add a return value that tells whether anything was found, non-zero if not.
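The grep-like convention suggested above can be sketched in C as follows. This is only an illustration under stated assumptions: `count_matches()` and `exit_status_for()` are hypothetical names, the label array stands in for the real device scan, and nothing here is taken from btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the device scan: count how many of the
 * known filesystem labels equal the one the user asked for.
 */
size_t count_matches(const char *const *labels, size_t n, const char *wanted)
{
	size_t i, hits = 0;

	assert(wanted != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(labels[i], wanted) == 0)
			hits++;
	return hits;
}

/*
 * grep-style exit status: 0 when something matched, 1 when nothing
 * did. No message is printed in either case.
 */
int exit_status_for(size_t hits)
{
	return hits > 0 ? 0 : 1;
}
```

A caller would pass `exit_status_for(count_matches(...))` to `exit()`, so scripts can test the result without parsing any output.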
Re: [PATCH] btrfs: fix leak in qgroup_subtree_accounting() error path
On 08/18/2014 05:42 PM, Mark Fasheh wrote:
> On Sun, Aug 17, 2014 at 03:09:21PM -0500, Eric Sandeen wrote:
>> Coverity pointed this out; in the newly added
>> qgroup_subtree_accounting(), if btrfs_find_all_roots()
>> returns an error, we leak at least the parents pointer,
>> and possibly the roots pointer, depending on what failure
>> occurs.
>>
>> If btrfs_find_all_roots() returns an error, we need to
>> free up all allocations before we return. "roots" is
>> initialized to NULL, so it should be safe to free
>> it unconditionally (ulist_free() handles that case).
>
> Great, thanks for this Eric.
>
> Reviewed-by: Mark Fasheh
>

Thanks guys, this is queued.

-chris
[PATCH] Btrfs: fix crash on endio of reading corrupted block
The crash is

[ cut here ]
kernel BUG at fs/btrfs/extent_io.c:2124!
[...]
Workqueue: btrfs-endio normal_work_helper [btrfs]
RIP: 0010:[] [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

This is in fact a regression.

It happens because we forgot to increase @offset properly when reading a corrupted block, so @offset stays where it was. This leads to checksum errors while reading the remaining blocks queued up in the same bio, and ends up hitting the above BUG_ON.

Reported-by: Chris Murphy
Signed-off-by: Liu Bo
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..be41e4d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 					test_bit(BIO_UPTODATE, &bio->bi_flags);
 				if (err)
 					uptodate = 0;
+				offset += len;
 				continue;
 			}
 		}
--
1.8.1.4
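The effect of the missing `offset += len` can be reproduced outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct segment` and `walk_segments()` are hypothetical names, and the early `continue` path simply models the branch that the patch touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bio segment; not a kernel structure. */
struct segment {
	unsigned int len;   /* bytes covered by this segment */
	int handled_early;  /* nonzero: the loop takes the early "continue" path */
};

/*
 * Walk the segments and record in seen[i] the running offset that
 * segment i observes. When advance_on_continue is zero, the early path
 * forgets to advance the offset, which models the bug described above.
 */
void walk_segments(const struct segment *segs, size_t n,
		   int advance_on_continue, unsigned long *seen)
{
	unsigned long offset = 0;
	size_t i;

	assert(n == 0 || (segs != NULL && seen != NULL));

	for (i = 0; i < n; i++) {
		seen[i] = offset;

		if (segs[i].handled_early) {
			if (advance_on_continue)
				offset += segs[i].len; /* the one-line fix */
			continue;
		}

		/* the normal completion path always advances */
		offset += segs[i].len;
	}
}
```

With three 4096-byte segments where only the first one takes the early path, the fixed variant sees offsets 0, 4096 and 8192, while the buggy variant sees 0, 0 and 4096. Every later segment is then checked against the wrong position, which matches the checksum errors described in the changelog.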
[PATCH] Btrfs: cleanup the same name in end_bio_extent_readpage
We've already defined an 'offset' outside of bio_for_each_segment_all. This is just a clean rename, no functional changes.

Signed-off-by: Liu Bo
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..7e27ba7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2609,12 +2609,12 @@ readpage_ok:
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
-			unsigned offset;
+			unsigned off;

 			/* Zero out the end if this page straddles i_size */
-			offset = i_size & (PAGE_CACHE_SIZE-1);
-			if (page->index == end_index && offset)
-				zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+			off = i_size & (PAGE_CACHE_SIZE-1);
+			if (page->index == end_index && off)
+				zero_user_segment(page, off, PAGE_CACHE_SIZE);
 			SetPageUptodate(page);
 		} else {
 			ClearPageUptodate(page);
--
1.8.1.4
Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation
On 8/19/14, 10:10 AM, David Sterba wrote:
> On Mon, Aug 11, 2014 at 10:05:52AM -0700, Eric Sandeen wrote:
>> (What seems to be missing, though, is why would the user ever choose to use
>> '-d?')
>
> That's a fallback method if blkid or udev are not available. We've had
> reports in the past that this functionality should not be dropped.

Seems like using /proc/partitions would make more sense in that case than a recursive scan of every file under /dev, wouldn't it?

Any details on those reports? I'm just wondering when you might possibly have success looking deep into the /dev tree if you didn't have success in /proc/partitions.

It looks like the functionality was added with:

commit 0dbd99fb3e117cd5f87eda492b6b4fab1b5bea23
Author: Goffredo Baroncelli
Date:   Wed Jun 15 21:55:25 2011 +0200

    Scan the devices listed in /proc/partitions

    During the commands:
    - btrfs filesystem show
    - btrfs device scan
    the devices "scanned" are extracted from /proc/partitions. This should avoid to scan devices not suitable for a btrfs filesystem like cdrom and floppy or to scan not existant devices.
    The old behavior (scan all the block devices under /dev) may be forced passing the "--all-devices" switch.

but I'm not sure why it was preserved.

It just seems a bit bizarre to have so many ways to get the same info.

Thanks,
-Eric
Re: [PATCH 8/8] btrfs: rename total_bytes to avoid confusion
On Wed, Aug 13, 2014 at 02:24:26PM +0800, Anand Jain wrote:
> we are assigning number_devices to the total_bytes,
> that's very confusing for a moment
>
> Signed-off-by: Anand Jain
> ---
>  fs/btrfs/volumes.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bf99e82..c0c360a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2253,7 +2253,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
>  	struct list_head *devices;
>  	struct super_block *sb = root->fs_info->sb;
>  	struct rcu_string *name;
> -	u64 total_bytes;
> +	u64 ret_sz;

A 'tmp' would do, but whatever
Re: [PATCH 2/8] btrfs: replace seed device followed by unmount causes kernel WARNING
On Wed, Aug 13, 2014 at 02:24:20PM +0800, Anand Jain wrote:
> reproducer:
> mount /dev/sdb /btrfs
> btrfs dev add /dev/sdc /btrfs
> btrfs rep start -B /dev/sdb /dev/sdd /btrfs
> umount /btrfs
>
> WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891
> __btrfs_close_devices+0x1b0/0x200 [btrfs]()
> ::
>
> __btrfs_close_devices()
> ::
> WARN_ON(fs_devices->open_devices);
>
> After the seed device has been replaced the new target device
> is no more a seed device. So we need to update the device
> numbers in the fs_devices as pointed by the fs_info.
>
> Signed-off-by: Anand Jain

A formality: if you get a reviewed-by from somebody and the patch does not change in the next iteration, add the tag to the patch as well. This will ensure the review credit is not lost. Otherwise, pinging the maintainer with a forgotten reviewed-by also works.
Re: [RFC PATCH] btrfs-progs: Move btrfstune to btrfs device tune
No problem =). Then just ignore the patch.

2014-08-19 17:03 GMT+03:00 David Sterba :
> On Mon, Aug 11, 2014 at 03:17:11AM +0300, Timofey Titovets wrote:
>> According to https://btrfs.wiki.kernel.org/index.php/Project_ideas#btrfs
>> Quote:
>> merge functionality of btrfstune, eg. under btrfs dev set-seed /dev/
>> (discuss the command name though)
>
> I've added this project idea long time ago and I'm afraid it's not valid
> anymore, at least not in the proposed way.
>
>> This patch is just code move
>> After, user can tune btrfs parameters through:
>> btrfs dev tune -xr /dev/sda2
>
> The btrfstune utility works on an unmounted filesystem and affects the
> whole filesystem, so the 'device' subgroup is not right here.
>
> Most of the commands from the base utility on a mounted filesystem, so a
> separate btrfstune makes some distinction. The reason for merging the
> two was to avoid a 1MB binary for very simple thing, the generic
> filesystem code can be shared with 'btrfs'.
>
> The question is what's the right UI, a new subcommand, or via the
> generic properties command? The property interface is not yet populated,
> so it might be hard to imagine where the tuning settings would go.
> Something like this:
>
> $ btrfs prop set feature.skinny-metadata 1 /dev/sdx
>
> The extended refs can be turned on even on a mounted filesystem, so this
> would avoid doing 'echo 1 > /sys/fs/btrfs/UUID/features/extended_iref'.
>
> At this moment I'm inclined to use the properties interface, which means
> that the btrfstune utility will stay a bit longer. I'll update the
> project idea to reflect this so it's not confusing anymore (sorry).

--
Have a nice day,
Timofey.
Questions on using BtrFS for fileserver
Hello,

we are thinking about using BtrFS on standard hardware for a fileserver with about 50T (100T raw) of storage (25×4TByte).

This is what I understood so far. Is this right?

· incremental send/receive works.

· There is no support for hotspares (spare disks that automatically replace a faulty disk).

· BtrFS with RAID1 is fairly stable.

· RAID 5/6 spreads all data over all devices, leading to performance problems on large disk arrays, and there is no option to limit the number of disks per stripe so far.

Some questions:

· There were reports that bcache with btrfs leads to corruption. Is this still so?

· If a disk fails, does BtrFS rebalance automatically? (This would give a kind of hotspare behavior)

· Besides using bcache, are there any possibilities to boost performance by adding (dedicated) cache-SSDs to a BtrFS?

· Are there any reports/papers/web-pages about BtrFS systems of this size in use? Praise, complaints, performance reviews, whatever…

Best regards,
bmg

--
"It doesn't matter at all what gets decided today: I'm against it anyway!" (SPD city councillor Kurt Schindler; Regensburg)
M G Berberich | berbe...@fmi.uni-passau.de | www.fmi.uni-passau.de/~berberic
Re: [PATCH v2] btrfs-progs: update manpage with new option -f for btrfstune
On Mon, Jul 07, 2014 at 09:54:53AM +0800, Gui Hecheng wrote:
> The new option -f will force to do dangerous changes.
> e.g. clear the seeding flag.

missing signed-off-by

> --- a/Documentation/btrfstune.txt
> +++ b/Documentation/btrfstune.txt
> @@ -24,7 +24,8 @@ Enable seeding forces a fs readonly so that you can use it to build other filesy
>  Enable extended inode refs.
>  -x::
>  Enable skinny metadata extent refs.
> -
> +-f::
> +Allow dangerous changes, e.g. clear the seeding flag

Please enhance this, we've discussed it under previous patch iterations.

Thanks.
Re: Questions on using BtrFS for fileserver
> · Besides using bcache, are there any possibilities to boost
> performance by adding (dedicated) cache-SSDs to a BtrFS?

dm-cache is in the mainline kernel and lvm2 recently added support to make devicemapper configuration automatic. In my opinion, dm-cache is a little easier to use because you can add/remove/resize the cache without recreating the filesystem. If you're interested, take a peek at the man page for lvmcache.

- Kyle
Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
On Thu, Aug 07, 2014 at 10:51:15AM +0800, Qu Wenruo wrote:
> It seems that the patch is rejected in patchwork,

It was not me :)

> Could any one tell me the reason?

I'd understand that the patch is no longer needed after the original problem went away, but that's not what you describe in your changelog. From that point of view the reason might not be compelling.

> >Above commit will cause disaster if someone try to mount a newly created but
> >later corrupted btrfs filesystem.

The generation after mkfs is something like 4 or 5. This means that the corruption would have to happen in the first few transaction commits, which is unlikely, and the filesystem will probably be fairly empty at that time.

If the concern is about a corrupted generation counter in the superblock itself, then yes, this could hurt. It's still possible to compare the 1st superblock with the copies. The one at offset 64M is available in 99% of cases, so there is enough data to decide what's actually corrupted. This could catch more corruption than just the generation counter.

From the output of btrfs-show-super:

generation              56392
chunk_root_generation   56392
cache_generation        56392
uuid_tree_generation    56392

the generation is duplicated several times, so a minimal patch could be to do an additional comparison with the others.

> >And before btrfs entered mainline, btrfs-progs has already superblock
> >checksum. See btrfs-progs commit: 5ccd1715fa2eaad0b26037bb53706779c8c93b5f
> >(superblock duplication by Yan Zheng).

The superblock checksum was not calculated the same way as in the kernel, but with the missing check this was not detected.

> >Before commit 5ccd17, mkfs.btrfs uses 16K as super offset, while current
> >btrfs
> >uses 64K super offset, anyway old btrfs without super csum will not be
> >mountable due to the change of super offset.
> >
> >So backward compatibility is not a problem.

Superblocks at offset 16k are not supported anymore AFAICT.
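For readers wondering where the "64M" copy mentioned above comes from, here is a small sketch of how the superblock mirror offsets can be computed. It is modelled on my reading of the kernel's `btrfs_sb_offset()` helper and is an assumption, not a quote from the source: the names `sb_offset()`, `SB_PRIMARY_OFFSET` and `SB_MIRROR_SHIFT` are made up for this example, and the constants should be checked against the tree you work on.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, see the note above; names are illustrative only. */
#define SB_PRIMARY_OFFSET (64ULL * 1024)
#define SB_MIRROR_SHIFT   12

/*
 * Byte offset of superblock copy number `mirror` on a device.
 * Mirror 0 is the primary superblock, 1 and 2 are the backup copies.
 */
uint64_t sb_offset(int mirror)
{
	assert(mirror >= 0 && mirror <= 2);

	if (mirror == 0)
		return SB_PRIMARY_OFFSET;
	return (16ULL * 1024) << (SB_MIRROR_SHIFT * mirror);
}
```

Under these assumptions the copies sit at 64KiB, 64MiB and 256GiB, so the second copy exists on anything but very small devices and the third only on devices larger than 256GiB. A consistency check as discussed above would read the generation from each available copy and compare it with the primary.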
RE: fs_mark test on btrfs on 3.16.0-rc6+ #1 SMP
My mistake. Thank you all for pointing out that ext4 actually performed much worse in this test. I am wondering whether any benchmarking has been done across different workloads in comparison with ext4. I know btrfs vs ext4 is not an apples-to-apples comparison, but such results would encourage users to switch to btrfs.

-Original Message-
From: Miao Xie [mailto:mi...@cn.fujitsu.com]
Sent: Monday, August 18, 2014 8:18 PM
To: Ming Lei; linux-btrfs@vger.kernel.org
Subject: Re: fs_mark test on btrfs on 3.16.0-rc6+ #1 SMP

On Mon, 18 Aug 2014 17:38:17 +, Ming Lei wrote:
>
> Hi,
>
> I ran the fs_mark test on a single empty hard drive. After the test, the df -h results are:
>
> /dev/sdk1 917G 39G 832G 5% /ext4
> /dev/sdj1 932G 53G 850G 6% /btrfs
>
> The test result for btrfs shows it ran 15 hours. Note there is no file/dir
> remove operation, which I know is very slow compared with ext4.
>
> [root@sh679 ~]# date;/root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 1000 -N 1000 -d /btrfs/ -t 10;date
> Mon Aug 11 11:32:54 PDT 2014
>
> # /root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 1000 -N 1000 -d /btrfs/ -t 10
> # Version 3.3, 10 thread(s) starting at Mon Aug 11 11:32:54 2014
> # Sync method: INBAND FSYNC: fsync() per file in write loop.
> # Directories: Round Robin between directories across 1000 subdirectories with 1000 files per subdirectory.
> # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
> # Files info: size 4096 bytes, written with an IO size of 16384 bytes per write
> # App overhead is time in microseconds spent in the test not doing file writing related system calls.
> # All system call times are reported in microseconds
>
> Columns: FSUse%, Count, Size, Files/sec, App Overhead, CREAT (Min/Avg/Max), WRITE (Min/Avg/Max), FSYNC (Min/Avg/Max), SYNC (Min/Avg/Max), CLOSE (Min/Avg/Max), UNLINK (Min/Avg/Max)
> 8 1000 4096184.0155517800 33 372937437 16 30941645054015 54203400 0014 000
> Tue Aug 12 02:40:01 PDT 2014
>
> For hours, the disk utilization was around 95% and cpu utilization for all 12
> cores was very low and only one core showed around 26%wa.
>
>
> To compare with Ext4:
> The test for ext4 on the same model of hard drive ran 2.5 hours.
>
> [root@sh679 ~]# date;/root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 1000 -N 1000 -d /ext4/ -t 10;date
> Fri Aug 8 17:13:56 PDT 2014
> # /root/fs_mark -v -n 100 -s 4096 -k -S 1 -D 1000 -N 1000 -d /ext4/ -t 10
> # Version 3.3, 10 thread(s) starting at Fri Aug 8 17:13:56 2014
> # Sync method: INBAND FSYNC: fsync() per file in write loop.
> # Directories: Round Robin between directories across 1000 subdirectories with 1000 files per subdirectory.
> # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
> # Files info: size 4096 bytes, written with an IO size of 16384 bytes per write
> # App overhead is time in microseconds spent in the test not doing file writing related system calls.
> # All system call times are reported in microseconds.
>
> Columns: FSUse%, Count, Size, Files/sec, App Overhead, CREAT (Min/Avg/Max), WRITE (Min/Avg/Max), FSYNC (Min/Avg/Max), SYNC (Min/Avg/Max), CLOSE (Min/Avg/Max), UNLINK (Min/Avg/Max)
> 9 1000 4096105.0156950153 19 449 17417596 15 20699843236894751 20443640 0014 4149000
> Sat Aug 9 19:41:14 PDT 2014

From
> Fri Aug 8 17:13:56 PDT 2014
to
> Sat Aug 9 19:41:14 PDT 2014

It is not 2.5 hours, it's 26.5 hours.

Thanks
Miao

>
> Is it a known issue with btrfs or do I need to adjust the default parameters
> for btrfs (I remember using the defaults to make btrfs)?
>
> Mount command shows:
> /dev/sdk1 on /ext4 type ext4 (rw,relatime,seclabel,data=ordered)
> /dev/sdj1 on /btrfs type btrfs (rw,relatime,seclabel,nospace_cache)
>
> Thanks
> Ming
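The run times in this thread can be checked directly from the `date` stamps printed before and after each fs_mark run. The following is a minimal sketch; `seconds_into_month()` and `elapsed_seconds()` are hypothetical helper names, and the code assumes both timestamps fall in the same month and timezone, which holds for the runs quoted here.

```c
#include <assert.h>

/* Seconds since the start of the month for a day-of-month and wall-clock time. */
long seconds_into_month(int day, int hour, int min, int sec)
{
	assert(day >= 1 && hour >= 0 && hour < 24);
	assert(min >= 0 && min < 60 && sec >= 0 && sec < 60);

	return ((long)(day - 1) * 24 + hour) * 3600L + min * 60L + sec;
}

/* Elapsed seconds between two timestamps in the same month and timezone. */
long elapsed_seconds(int d0, int h0, int m0, int s0,
		     int d1, int h1, int m1, int s1)
{
	return seconds_into_month(d1, h1, m1, s1) -
	       seconds_into_month(d0, h0, m0, s0);
}
```

For the ext4 run (Aug 8 17:13:56 to Aug 9 19:41:14) this gives 95238 seconds, which is 26 h 27 min 18 s and agrees with the 26.5 hours Miao points out. For the btrfs run (Aug 11 11:32:54 to Aug 12 02:40:01) it gives 54427 seconds, or 15 h 7 min 7 s.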
Re: [BUG] cannot mount subvolume with selinux context
On Tue, Aug 19, 2014 at 11:32:16AM +0800, Eryu Guan wrote:
> Hi,
>
> Description of the problem:
>
> mount btrfs with selinux context, then create a subvolume, the new
> subvolume cannot be mounted, even with the same context.
>
> mkfs -t btrfs /dev/sda5
> mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
> btrfs subvolume create /mnt/btrfs/subvol
> mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test

Submit an xfstest?

> The security_sb_copy_data() takes out selinux context data to
> "secdata", then mount_subvol() calls mount_fs() (via vfs_kern_mount())
> again without selinux context, so mount_subvol() fails, which fails
> the whole mount.
>
> Not sure what's the proper fix. Zach suggested that the fix will
> probably be to rework the vfs functions a bit as he said in rh
> bugzilla[1].

Yeah, I have no idea what'd be preferred here:

- rework the vfs _kern_ mount api to offer one that doesn't mess with selinux mount options
- add a flag to have the second _kern_ mount ignore selinux (but not MS_KERNMOUNT?)
- binary data and fs selinux handling? (like nfs)

- z
Re: [PATCH RFC] btrfs: Use backup superblocks if and only if the first superblock is valid but corrupted.
On Sun, Jul 27, 2014 at 10:53:04PM -0400, Austin S Hemmelgarn wrote:
> >>> But, for right now I'd prefer the admin get involved in using the backup
> >>> supers. I think silently using the backups is going to lead to
> >>> surprises.
> >> Maybe there could be a mount non-default mount-option to use backup
> >> superblocks iff the first one is corrupted, and then log a warning
> >> whenever this actually happens? Not handling stuff like this
> >> automatically really hurts HA use cases.
> >>
> >>
> > This seems better and comments also shows this idea.
> > What about merging the behavior into 'recovery' mount option or adding a
> > new mount option?
> Personally, I'd add a new mount option, but make recovery imply that option.

I agree with that, though we do not need to introduce an extra option if its meaning is dependent on 'recovery'. Rather, make it a mode of recovery (and we could add more in the future). E.g.

$ mount -o recovery=sb

which would try to use all valid backup superblocks to mount.
Re: Questions on using BtrFS for fileserver
On 2014-08-19 12:21, M G Berberich wrote:
> Hello,
>
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).
>
> This is what I understood so far. Is this right?
>
> · incremental send/receive works.
>
> · There is no support for hotspares (spare disks that automatically
> replace a faulty disk).
>
> · BtrFS with RAID1 is fairly stable.
>
> · RAID 5/6 spreads all data over all devices, leading to performance
> problems on large disk arrays, and there is no option to limit the
> number of disks per stripe so far.
>
> Some questions:
>
> · There were reports that bcache with btrfs leads to corruption. Is
> this still so?

Based on some testing I did last month, bcache with anything has the potential to cause data corruption.

>
> · If a disk fails, does BtrFS rebalance automatically? (This would
> give a kind of hotspare behavior)

No, but it wouldn't be hard to write a simple monitoring program to do this from userspace. IIRC, the big issue is that you need to add a device in place of the failed one for the re-balance to work.

>
> · Besides using bcache, are there any possibilities to boost
> performance by adding (dedicated) cache-SSDs to a BtrFS?

As mentioned in one of the other responses, I would suggest looking into dm-cache. BTRFS itself does not have any functionality for this, although there has been talk of implementing device priorities for reads, which could provide a similar performance boost.

>
> · Are there any reports/papers/web-pages about BtrFS systems of this size
> in use? Praise, complaints, performance reviews, whatever…

While it doesn't quite fit the description, I have had very good success with a very active 2TB BTRFS RAID10 filesystem consisting of BTRFS on four unpartitioned 1TB SATA III hard drives. The filesystem gets in excess of 100GB of data written to it each day (almost all rewrites, however), and is what I use for /home, /var/log, and /var/lib. I've had no issues with it that were caused by BTRFS; in fact, the very fact that it uses BTRFS helped me recover data when the storage controller the drives are connected to went bad. On average, I get about 125% of raw disk performance on writes, and about 110% on reads.

If you are using a very large number of disks, then I would not suggest that you use BTRFS RAID10, but instead BTRFS RAID1. RAID10 will try to stripe things across ALL of the devices in the filesystem, and unless you have no more than about four times as many disks as storage controllers (that is, each controller has no more than four disks attached to it), the overhead outweighs the benefit of striping the data. Also, just to make sure it's clear, in BTRFS RAID1 each block gets written EXACTLY twice. On the plus side though, this means that if you do set up a caching mechanism, you may be able to keep most of the array spun down a majority of the time.
Re: [PATCH] Btrfs: cleanup the same name in end_bio_extent_readpage
On 08/19/2014 11:32 AM, Liu Bo wrote:
> We've defined a 'offset' out of bio_for_each_segment_all.

This isn't causing problems though? It should just be shadowing the
bio_for_each_segment_all variable for the duration of the curlies.

No objection as a cleanup, just making sure I'm not missing something.

-chris
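For readers unfamiliar with the shadowing Chris refers to, here is a minimal sketch. It is not the btrfs code: the function name and the values are made up for illustration, and it only shows that a variable redeclared inside a block hides the outer one until the closing brace, leaving the outer one untouched.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in, not taken from extent_io.c: 'offset' is declared
 * once at function scope and again inside the loop body.
 */
uint64_t sum_outer_offset(const uint32_t *lens, int n)
{
	uint64_t offset = 0;            /* outer variable */
	int i;

	for (i = 0; i < n; i++) {
		uint64_t offset = 1000; /* inner variable, shadows the outer one */

		offset += lens[i];      /* only the inner copy changes */
		(void)offset;
	}

	return offset;                  /* the outer variable was never modified */
}
```

Compilers can flag this pattern with -Wshadow, which is one reason such renames are usually accepted as cleanups even when behaviour is unchanged.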
Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
On 08/06/2014 10:51 PM, Qu Wenruo wrote:
> It seems that the patch is rejected in patchwork,
>
> Could any one tell me the reason?

I had nack'd it because I was worried at the time about the super crc
errors that Dave had found in the past. Sorry, I really thought I had
sent email about it.

But Dave has a great point in his reply about validating the super
generation.

-chris
Re: [PATCH] Btrfs: fix crash on endio of reading corrupted block
On 08/19/2014 11:33 AM, Liu Bo wrote:
> The crash is
>
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[] [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
>
> This is in fact a regression.
>
> It is because we forgot to increase @offset properly in reading corrupted
> block, so that the @offset remains, and this leads to checksum errors while
> reading left blocks queued up in the same bio, and then ends up with hitting
> the above BUG_ON.

Thanks Chris and Liu, this is queued.

-chris
Re: Questions on using BtrFS for fileserver
On Tue, Aug 19, 2014 at 11:21 AM, M G Berberich wrote:
> Hello,
>
> we are thinking about using BtrFS on standard hardware for a
> fileserver with about 50T (100T raw) of storage (25×4TByte).

I would recommend carefully reading this thread titled: "1 week to
rebuid 4x 3TB raid10 is a long time!"

http://comments.gmane.org/gmane.comp.file-systems.btrfs/36969

There are multiple methods for replacing a device in a Btrfs RAID array.
If I understand the conclusions of this thread, you might still expect
12-14 hours to rebuild after replacing a 4 TByte device, assuming you
use the optimal replace commands. With 25 devices, that leaves an
uncomfortable period of time where another device might fail.
Re: Questions on using BtrFS for fileserver
Hi,

On 08/19/2014 06:21 PM, M G Berberich wrote:
> · Are there any reports/papers/web-pages about BtrFS-systems this size
>   in use? Praises, complaints, performance-reviews, whatever…

I don't know about papers or benchmarks, but a few weeks ago someone
reported very long mount times on a btrfs filesystem of similar size:

https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36226.html

And I would not recommend 3TB disks. *I'm not a btrfs dev*, but as far
as I know there is quite a difference between rebuilding a disk on real
RAID and on btrfs RAID. The problem is that btrfs implements RAID at the
filesystem level, not at the hardware level, so there is more mechanical
overhead on the drives and a rebuild takes significantly longer than
with regular RAID.

--
b.
Re: [PATCH] Btrfs: fix crash on endio of reading corrupted block
On 8/19/14, 10:33 AM, Liu Bo wrote:
> The crash is
>
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[] [] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
>
> This is in fact a regression.

It'd be helpful to identify the commit, or at least kernel release,
which caused the regression.

> It is because we forgot to increase @offset properly in reading corrupted
> block, so that the @offset remains, and this leads to checksum errors while
> reading left blocks queued up in the same bio, and then ends up with hitting
> the above BUG_ON.

So does that mean that any checksum error on this path will crash the
kernel? That sounds like this bug has exposed a more fundamental
problem, no?

Thanks,
-Eric

> Reported-by: Chris Murphy
> Signed-off-by: Liu Bo
> ---
>  fs/btrfs/extent_io.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3af4966..be41e4d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>  					test_bit(BIO_UPTODATE, &bio->bi_flags);
>  				if (err)
>  					uptodate = 0;
> +				offset += len;
>  				continue;
>  			}
>  		}
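The one-line fix is easier to see in isolation. The following is a simplified sketch, not the kernel's end_bio_extent_readpage(): the struct, the function name and the advance_on_error switch are all hypothetical, and the sketch only models how a `continue` on the error path can leave a running offset behind for every later segment in the same bio.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a read bio. */
struct segment {
	uint32_t len;   /* bytes covered by this segment */
	int failed;     /* non-zero if reading this segment failed */
};

/*
 * Return the running offset that would be used for segment 'idx'.
 * With advance_on_error == 0 the error path skips the update, which
 * models the behaviour before the patch; with 1 it models the fix.
 */
uint64_t offset_of_segment(const struct segment *segs, size_t idx,
			   int advance_on_error)
{
	uint64_t offset = 0;
	size_t i;

	for (i = 0; i < idx; i++) {
		if (segs[i].failed) {
			if (advance_on_error)
				offset += segs[i].len;
			continue;       /* skips the normal update below */
		}
		offset += segs[i].len;
	}

	return offset;
}
```

With three 4096-byte segments where the second one fails, the third segment sees offset 4096 without the update and 8192 with it, so in the unpatched case its data would be checked against the checksum of the wrong block.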
Re: Questions on using BtrFS for fileserver
On Tue, 19 Aug 2014 18:21:52 +0200 M G Berberich wrote:
> · BtrFS with RAID1 is fairly stable.

Maybe, but it's not optimized for performance: reads are not balanced in
the most optimal way, and writes may end up being submitted sequentially
rather than in parallel to two devices, resulting in write performance
that's way less than that of a single device.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large diskarrays, and there is no option to limit the
>   number of disks per stripe so far.

AFAIK Btrfs RAID 5/6 is not yet ready to be used in a production
environment.

In your case I would recommend considering Btrfs on top of two 12-disk
mdadm RAID6 arrays, or three 8-disk ones, leaving one HDD as a shared
hot spare. To join the mdadm arrays into a larger block device you can
use either LVM, or Btrfs itself, with the "single" profile for data.

--
With respect,
Roman
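As a rough comparison of the layouts discussed in this thread, the arithmetic can be sketched as below. The helper names are made up for illustration, and the figures ignore filesystem metadata overhead and the TB/TiB distinction; the only assumptions are that RAID6 gives up two disks per array to parity and that Btrfs RAID1 stores every block exactly twice.

```c
#include <stdint.h>

/* Usable capacity of 'arrays' RAID6 arrays, each giving up two disks to parity. */
uint64_t raid6_usable_tb(uint64_t arrays, uint64_t disks_per_array,
			 uint64_t disk_tb)
{
	if (disks_per_array < 4)        /* RAID6 needs at least four disks */
		return 0;
	return arrays * (disks_per_array - 2) * disk_tb;
}

/* Usable capacity when every block is stored exactly twice. */
uint64_t raid1_usable_tb(uint64_t disks, uint64_t disk_tb)
{
	return disks * disk_tb / 2;
}
```

For 25 disks of 4 TB this gives 80 TB for two 12-disk RAID6 arrays plus a spare, 72 TB for three 8-disk arrays plus a spare, and 50 TB for Btrfs RAID1 across all 25 disks, which matches the "50T (100T raw)" figure in the original question.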
Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2
On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
> Hello list,
>
> I want to use an ARM kirkwood based NSA325v2 NAS (dubbed "Receiver") for
> receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
> running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
> red disk (having GPT label, partitions created with parted).
>
> But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
>   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File
>   too large
> ... and that stops reception/snapshot creation.

...

> Increasing the verbosity with "-v -v" for btrfs receive shows the
> following differences between receive operations on 'Receiver' and
> 'OtherHost', both of them using the identical inputfile
> /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
>
> * the chown and chmod operations are different -> resulting in
>   weird/wrong permissions and sizes on 'Receiver' side.
> * what's "stransid", this is the first line that differs

This is interesting, thanks for going to the trouble to show those
diffs.

That the commands and strings match up show us that the basic tlv header
chaining is working. But the u64 attribute values are sometimes messed
up. And messed up in a specific way. A variable number of low order
bytes are magically appearing.

(gdb) print/x 11709972488
$2 = 0x2b9f80008
(gdb) print/x 178680
$3 = 0x2b9f8

(gdb) print/x 588032
$6 = 0x8f900
(gdb) print/x 2297
$7 = 0x8f9

Some light googling makes me think that the Marvell Kirkwood is not
friendly at all to unaligned accesses.

The (biting tongue) send and receive code is playing some games with
casting aligned and unaligned pointers. Maybe that's upsetting the arm
toolchain/kirkwood.

Does this completely untested patch to btrfs-progs, to be run on the
receiver, do anything?

- z

diff --git a/send-stream.c b/send-stream.c
index 88e18e2..4f8dd83 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -204,7 +204,7 @@ out:
 		int __len; \
 		TLV_GET(s, attr, (void**)&__tmp, &__len); \
 		TLV_CHECK_LEN(sizeof(*__tmp), __len); \
-		*v = le##bits##_to_cpu(*__tmp); \
+		*v = get_unaligned_le##bits(__tmp); \
 	} while (0)
 
 #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)
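To make the idea behind the patch concrete, here is a small portable sketch of an unaligned little-endian read. It is not the btrfs-progs implementation of get_unaligned_le64(); the function name read_le64_unaligned is made up for this example. It assembles the value from single bytes, which have no alignment requirement, instead of dereferencing a possibly misaligned u64 pointer.

```c
#include <stdint.h>

/* Read a little-endian u64 from a buffer that may not be 8-byte aligned. */
uint64_t read_le64_unaligned(const void *p)
{
	const unsigned char *b = p;
	uint64_t v = 0;
	int i;

	/* Start at the most significant byte (index 7) and shift downwards. */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | b[i];

	return v;
}
```

Reading the bytes f8 b9 02 00 00 00 00 00 at an odd address this way yields 0x2b9f8, the expected value from the gdb session above, regardless of how the CPU treats misaligned word loads.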
Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2
On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:
> On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
> > Hello list,
> >
> > I want to use an ARM kirkwood based NSA325v2 NAS (dubbed "Receiver") for
> > receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
> > running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
> > red disk (having GPT label, partitions created with parted).
> >
> > But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
> >   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File
> >   too large
> > ... and that stops reception/snapshot creation.
>
> ...
>
> > Increasing the verbosity with "-v -v" for btrfs receive shows the
> > following differences between receive operations on 'Receiver' and
> > 'OtherHost', both of them using the identical inputfile
> > /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
> >
> > * the chown and chmod operations are different -> resulting in
> >   weird/wrong permissions and sizes on 'Receiver' side.
> > * what's "stransid", this is the first line that differs
>
> This is interesting, thanks for going to the trouble to show those
> diffs.
>
> That the commands and strings match up show us that the basic tlv header
> chaining is working. But the u64 attribute values are sometimes messed
> up. And messed up in a specific way. A variable number of low order
> bytes are magically appearing.
>
> (gdb) print/x 11709972488
> $2 = 0x2b9f80008
> (gdb) print/x 178680
> $3 = 0x2b9f8
>
> (gdb) print/x 588032
> $6 = 0x8f900
> (gdb) print/x 2297
> $7 = 0x8f9
>
> Some light googling makes me think that the Marvell Kirkwood is not
> friendly at all to unaligned accesses.

   ARM isn't in general -- it never has been, even 20 years ago in the
ARM3 days when I was writing code in ARM assembler. We've been bitten
by this before in btrfs (mkfs on ARM works, mounting it fails fast,
because userspace has a trap to fix unaligned accesses, and the kernel
doesn't).

> The (biting tongue) send and receive code is playing some games with
> casting aligned and unaligned pointers. Maybe that's upsetting the arm
> toolchain/kirkwood.

   Almost certainly the toolchain isn't identifying the unaligned
accesses, and thus building code that uses them causes stuff to break.

   There's a workaround for userspace that you can use to verify that
this is indeed the problem:

echo 2 >/proc/cpu/alignment

will tell the kernel to fix up unaligned accesses initiated in
userspace. It's a performance killer, but it should serve to identify
whether the problem is actually this.

   Hugo.

> Does this completely untested patch to btrfs-progs,
> to be run on the receiver, do anything?
>
> - z
>
> diff --git a/send-stream.c b/send-stream.c
> index 88e18e2..4f8dd83 100644
> --- a/send-stream.c
> +++ b/send-stream.c
> @@ -204,7 +204,7 @@ out:
>  		int __len; \
>  		TLV_GET(s, attr, (void**)&__tmp, &__len); \
>  		TLV_CHECK_LEN(sizeof(*__tmp), __len); \
> -		*v = le##bits##_to_cpu(*__tmp); \
> +		*v = get_unaligned_le##bits(__tmp); \
>  	} while (0)
>
>  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- "There's a Martian war machine outside -- they want to talk ---
         to you about a cure for the common cold."
Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
From: David Sterba
To: Qu Wenruo
Date: 2014-08-20 01:18

> On Thu, Aug 07, 2014 at 10:51:15AM +0800, Qu Wenruo wrote:
>> It seems that the patch is rejected in patchwork,
> It was not me :)
>
>> Could any one tell me the reason?
> I'd understand that the patch is no longer needed after the original
> problem went away, but it's not what you describe in your changelog.
> From that point the reason might not be compelling.
>
>> Above commit will cause disaster if someone try to mount a newly created
>> but later corrupted btrfs filesystem.
> The generation after mkfs is something like 4 or 5, this means that the
> corruption would have to happen in the first few transaction commits,
> this is unlikely and the filesystem will be probably fairly empty at
> that time.
>
> If the concern is about corrupted generation counter itself in the
> superblock, then yes this could hurt. It's still possible to compare the
> 1st superblock with the copies, the one at offset 64M is available in
> 99%, there are enough data to make a decision what's actually corrupted.
> This could catch more corruption than just the generation counter.
>
> From the output of btrfs-show-super:
>
> generation              56392
> chunk_root_generation   56392
> cache_generation        56392
> uuid_tree_generation    56392
>
> the generation is duplicated several times, so a minimal patch could be
> to do additional comparison with the others.

Thanks for the explanation.

But in fact, when investigating some bugs (not in the kernel bugzilla
but in a proprietary one), I found not only one but two disk images
whose superblock csum doesn't match and in which a lot of values go
crazy. For example, num_devices goes to 871878361089 and several bits
differ in dev_item.fsid and fsid.
BTW, the cache generation is also crazy.

Normally, such a superblock should not be mountable since the csum
doesn't match. But due to the mentioned commit, the generation (4) is
below 10 and the kernel just ignores the csum error, and finally a
kernel BUG is triggered. Since a lot of things go wrong, anything is
possible.

So I sent the patch and hope to avoid such problems.

Thanks,
Qu

>> And before btrfs entered mainline, btrfs-progs has already superblock
>> checksum. See btrfs-progs commit:
>> 5ccd1715fa2eaad0b26037bb53706779c8c93b5f (superblock duplication by Yan
>> Zheng).
> The superblock checksum was not calculated the same way as in kernel,
> but with the missing check this was not detected.
>
>> Before commit 5ccd17, mkfs.btrfs uses 16K as super offset, while current
>> btrfs uses 64K super offset, anyway old btrfs without super csum will
>> not be mountable due to the change of super offset. So backward
>> compatibility is not a problem.
> Superblocks at offset 16k are not supported anymore AFAICT.
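David's suggestion of cross-checking the duplicated generation values could look roughly like the sketch below. This is only an illustration: the struct is a hypothetical subset of fields named after the btrfs-show-super output quoted above, not the on-disk struct btrfs_super_block, and treating equality as the expected state is an assumption taken from that one sample (on a real filesystem some of these counters may legitimately differ).

```c
#include <stdint.h>

/* Hypothetical subset of superblock fields, named after btrfs-show-super output. */
struct sb_generations {
	uint64_t generation;
	uint64_t chunk_root_generation;
	uint64_t cache_generation;
	uint64_t uuid_tree_generation;
};

/*
 * Count how many of the duplicated counters agree with 'generation'.
 * Assumption for this sketch: a low count hints that 'generation'
 * itself may be the corrupted field and deserves a closer look.
 */
int generation_votes(const struct sb_generations *sb)
{
	int votes = 0;

	votes += (sb->chunk_root_generation == sb->generation);
	votes += (sb->cache_generation == sb->generation);
	votes += (sb->uuid_tree_generation == sb->generation);

	return votes;
}
```

A check of this kind would only be a heuristic next to the checksum itself; a caller might, for instance, refuse to skip the csum test when the count is zero.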
[PATCH 8/8 v2] btrfs: rename total_bytes to avoid confusion
We are assigning the number of devices to total_bytes, which is
confusing at first glance.

Signed-off-by: Anand Jain
---
v2: accepts David's comment, renames ret_sz to tmp

 fs/btrfs/volumes.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf99e82..718f734 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2253,7 +2253,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	struct list_head *devices;
 	struct super_block *sb = root->fs_info->sb;
 	struct rcu_string *name;
-	u64 total_bytes;
+	u64 tmp;
 	int seeding_dev = 0;
 	int ret = 0;
 
@@ -2356,13 +2356,13 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 	if (!blk_queue_nonrot(bdev_get_queue(bdev)))
 		root->fs_info->fs_devices->rotating = 1;
 
-	total_bytes = btrfs_super_total_bytes(root->fs_info->super_copy);
+	tmp = btrfs_super_total_bytes(root->fs_info->super_copy);
 	btrfs_set_super_total_bytes(root->fs_info->super_copy,
-				    total_bytes + device->total_bytes);
+				    tmp + device->total_bytes);
 
-	total_bytes = btrfs_super_num_devices(root->fs_info->super_copy);
+	tmp = btrfs_super_num_devices(root->fs_info->super_copy);
 	btrfs_set_super_num_devices(root->fs_info->super_copy,
-				    total_bytes + 1);
+				    tmp + 1);
 
 	/* add sysfs device entry */
 	btrfs_kobj_add_device(root->fs_info, device);
--
2.0.0.153.g79d
[PATCH 2/8 v2] btrfs: replace seed device followed by unmount causes kernel WARNING
reproducer:
  mount /dev/sdb /btrfs
  btrfs dev add /dev/sdc /btrfs
  btrfs rep start -B /dev/sdb /dev/sdd /btrfs
  umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::
__btrfs_close_devices()
::
WARN_ON(fs_devices->open_devices);

After the seed device has been replaced the new target device
is no more a seed device. So we need to update the device numbers
in the fs_devices as pointed by the fs_info.

Signed-off-by: Anand Jain
Reviewed-by: Miao Xie
---
v2: sorry had missed the Reviewed by tag, Thxs David

 fs/btrfs/volumes.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5fd0132..f098ae7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1964,7 +1964,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 
 	WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
-	fs_devices = fs_info->fs_devices;
+	/*
+	 * in case of fs with no seed, srcdev->fs_devices will point
+	 * to fs_devices of fs_info. However when the dev being replaced is
+	 * a seed dev it will point to the seed's local fs_devices. In short
+	 * srcdev will have its correct fs_devices in both the cases.
+	 */
+	fs_devices = srcdev->fs_devices;
 
 	list_del_rcu(&srcdev->dev_list);
 	list_del_rcu(&srcdev->dev_alloc_list);
--
2.0.0.153.g79d
Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't continue mounting when superblock csum mismatches even generation is less than 10.
From: Chris Mason
To: Qu Wenruo
Date: 2014-08-20 03:48

> On 08/06/2014 10:51 PM, Qu Wenruo wrote:
>> It seems that the patch is rejected in patchwork,
>>
>> Could any one tell me the reason?
> I had nack'd it because I was worried at the time about the super crc
> errors that Dave had found in the past. Sorry, I really thought I had
> sent email about it.
>
> But Dave has a great point in his reply about validating the super
> generation.

Thanks for the reason.
I'll search for Dave's mail and dig into it.

Although, as mentioned in the reply to David, the main problem is that I
found two disk images with crazy values in the superblock and a wrong
csum, but whose generation is still 4, and ignoring the csum error
caused a kernel BUG.

Thanks,
Qu

> -chris
Re: [PATCH 2/2] btrfs-progs: canonicalize dm device name before update kernel
On 15/08/2014 12:30, Eryu Guan wrote:
> On Fri, Aug 15, 2014 at 09:50:34AM +0800, Anand Jain wrote:
>>
>> Eryu,
>>
>> btrfs dev scan -d option is there for legacy reasons. The new
>> method is using libblkid to find btrfs devs.
>>
>> David/Zach, is it time to remove -d option ? or mention deprecated.
>>
>> But your test case show problem using btrfsck as well. thats nice!
>>
>> The fix for this is in the kernel, which would return busy if the
>> device path is being updated when the device is mounted.
>>
>> Can you try with Chris integration branch ? mainly the patch..
>>
>> -
>> commit 4e5c146442b23437d23a2bd81b95f13dfeaffe88
>> Author: Anand Jain
>> Date:   Thu Jul 3 18:22:05 2014 +0800
>>
>>     Btrfs: device_list_add() should not update list when mounted
>> -
>
> Thanks Anand, this patch fixed the issue, btrfsck reports "Device or
> resource busy" now.

Thanks for testing.

> [root@hp-dl388g8-01 ~]# btrfsck /dev/mapper/rhel_hp--dl388g8--01-btrfs--2
> ERROR: device scan failed '/dev/dm-3' - Device or resource busy
> Checking filesystem on /dev/mapper/rhel_hp--dl388g8--01-btrfs--2
> UUID: 1104d6d6-2653-496b-8d67-184d522dd632
> checking extents
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 114688 bytes used err is 0
> total csum bytes: 0
> total tree bytes: 114688
> total fs tree bytes: 32768
> total extent tree bytes: 16384
> btree space waste bytes: 108436
> file data blocks allocated: 0
>  referenced 0
> Btrfs v3.12
>
> But btrfsck is still scanning unrelated devices when checking a btrfs
> with multiple devices. In my case, I was checking btrfs on lv btrfs-2,
> but btrfs-1(dm-3) was scanned too.

That's expected, as of now the only way to find a partner device
is by scanning the available devices.

Anand

> Hope my first patch could fix this issue in an expected way.
>
> Thanks,
> Eryu
>
>> Thanks, Anand
>>
>> On 14/08/2014 19:40, Eryu Guan wrote:
>>> A btrfsck or btrfs device scan -d operation could change the device
>>> name of other mounted btrfs in kernel, if the other btrfs is on lvm
>>> device.
>>>
>>> Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+
>>>
>>> [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
>>> Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
>>> 	Total devices 1 FS bytes used 384.00KiB
>>> 	devid    1 size 15.00GiB used 2.04GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv1
>>>
>>> Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
>>> 	Total devices 2 FS bytes used 112.00KiB
>>> 	devid    1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
>>> 	devid    2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
>>>
>>> Btrfs v3.14.2
>>>
>>> And testlv1 was mounted at /mnt/btrfs
>>>
>>> [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
>>> Filesystem                                Type  1024-blocks Used Available Capacity Mounted on
>>> /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs    15728640  512  13602560       1% /mnt/btrfs
>>>
>>> Now run btrfsck on testlv2 or btrfs device scan -d, which will scan
>>> all btrfs devices and somehow change the device name.
>>>
>>> [root@hp-dl388eg8-01 btrfs-progs]# btrfsck /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1
>>>
>>> [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
>>> Filesystem     Type  1024-blocks Used Available Capacity Mounted on
>>> /dev/dm-3      btrfs    15728640  512  13602560       1% /mnt/btrfs
>>>
>>> [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
>>> Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
>>> 	Total devices 1 FS bytes used 384.00KiB
>>> 	devid    1 size 15.00GiB used 2.04GiB path /dev/dm-3
>>>
>>> Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
>>> 	Total devices 2 FS bytes used 112.00KiB
>>> 	devid    1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
>>> 	devid    2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
>>>
>>> Btrfs v3.14.2
>>>
>>> Now calling btrfs_register_one_device with canonicalized dm device name.
>>>
>>> Signed-off-by: Eryu Guan
>>> ---
>>> With patch 1 applied, btrfsck won't change the device name,
>>> but btrfs device scan -d still does.
>>>
>>>  utils.c | 44 +---
>>>  1 file changed, 41 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/utils.c b/utils.c
>>> index f54e749..3567094 100644
>>> --- a/utils.c
>>> +++ b/utils.c
>>> @@ -985,6 +985,32 @@ static int blk_file_in_dev_list(struct btrfs_fs_devices* fs_devices,
>>>  }
>>>
>>>  /*
>>> + * Convert dm-N device name to /dev/mapper/name
>>> + */
>>> +static void canonicalize_dm_name(char *devnode, char *path, int len)
>>> +{
>>> +	char *buf = NULL;
>>> +	FILE *sysfsp = NULL;
>>> +
>>> +	buf = malloc(PATH_MAX);
>>> +	if (!buf)
>>> +		return;
>>> +
>>> +	snprintf(buf, PATH_MAX, "/sys/block/%s/dm/name", devnode);
>>> +	sysfsp = fopen(buf, "r");
>>> +	if (!sysfsp)
>>> +		goto out;
>>> +
>>> +	if (fgets(buf, PATH_MAX, sysfsp)) {
>>> +		buf[strlen(buf) - 1] = '\0';
>>> +		snprintf(path, len - 1, "/dev/map
Re: [BUG] cannot mount subvolume with selinux context
On Tue, Aug 19, 2014 at 10:28:54AM -0700, Zach Brown wrote:
> On Tue, Aug 19, 2014 at 11:32:16AM +0800, Eryu Guan wrote:
> > Hi,
> >
> > Description of the problem:
> >
> > mount btrfs with selinux context, then create a subvolume, the new
> > subvolume cannot be mounted, even with the same context.
> >
> > mkfs -t btrfs /dev/sda5
> > mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
> > btrfs subvolume create /mnt/btrfs/subvol
> > mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/test
>
> Submit a xfstest?

Sure, will do.

Thanks,
Eryu

> > The security_sb_copy_data() takes out selinux context data to
> > "secdata", then mount_subvol() calls mount_fs() (via vfs_kern_mount())
> > again without selinux context, so mount_subvol() fails, which fails
> > the whole mount.
> >
> > Not sure what's the proper fix. Zach suggested that the fix will
> > probably be to rework the vfs functions a bit as he said in rh
> > bugzilla[1].
>
> Yeah, I have no idea what'd be preferred here:
>
> - rework the vfs _kern_ mount api to offer one that doesn't mess with
>   selinux mount options
> - add a flag to have the second _kern_ mount ignore selinux (but not
>   MS_KERNMOUNT?)
> - binary data and fs selinux handling? (like nfs)
>
> - z
Re: Questions on using BtrFS for fileserver
On Tue, Aug 19, 2014 at 06:21:52PM +0200, M G Berberich wrote:
> · incremental send/receive works.

Yes.

> · There is no support for hotspares (spare disks that automatically
>   replace a faulty disk).

Correct.

> · BtrFS with RAID1 is fairly stable.

From what I know.

> · RAID 5/6 spreads all data over all devices, leading to performance
>   problems on large diskarrays, and there is no option to limit the
>   number of disks per stripe so far.

Not sure about the performance issue, but either way, don't use RAID5/6
with btrfs for anything other than playing around. The code is not
finished.

> · If a disk fails, does BtrFS rebalance automatically? (This would
>   give a kind of hotspare behavior)

No, not for raid5/6.

> · Are there any reports/papers/web-pages about BtrFS-systems this size
>   in use? Praises, complaints, performance-reviews, whatever…

Use md-raid5, which is tried and true, and put btrfs on top.

And still have backups: be ready for btrfs to become unusable (speed
and/or deadlocks), get trashed, or hit some other problem.
It's not guaranteed to happen, but the odds are far from being 0 either,
so either your data is throwaway, or you have good backups.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2
Thank you Hugo! Amazing.

It almost works all the way.

According to some tests I did, echo 2 >/proc/cpu/alignment does in fact
allow btrfs receive to work in most cases.

For the tests, an x86_64 machine for send, an armv5tel machine for
receive and 2 subvolumes (one with just a few data and binary files and
the other a full root partition) were used.

The send blobs were checksummed with md5sum and verified to match on the
receive side.

The small blob was properly processed by btrfs receive (file sha1s and
metadata all matched).

The big blob with the root partition only partially succeeded, as it
ended abruptly with

ERROR: lsetxattr var/log/journal system.posix_acl_default=. failed.
Operation not supported.

I checked a few restored files and their sha1 and metadata matched.

Daniel

On 08/19/14 15:22, Hugo Mills wrote:
> On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:
>> On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:
>>> Hello list,
>>>
>>> I want to use an ARM kirkwood based NSA325v2 NAS (dubbed "Receiver") for
>>> receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
>>> running kubuntu 14.04 LTS (dubbed "Source"), storing them on a 3TB WD
>>> red disk (having GPT label, partitions created with parted).
>>>
>>> But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
>>>   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File
>>>   too large
>>> ... and that stops reception/snapshot creation.
>>
>> ...
>>
>>> Increasing the verbosity with "-v -v" for btrfs receive shows the
>>> following differences between receive operations on 'Receiver' and
>>> 'OtherHost', both of them using the identical inputfile
>>> /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send
>>>
>>> * the chown and chmod operations are different -> resulting in
>>>   weird/wrong permissions and sizes on 'Receiver' side.
>>> * what's "stransid", this is the first line that differs
>>
>> This is interesting, thanks for going to the trouble to show those
>> diffs.
>>
>> That the commands and strings match up show us that the basic tlv header
>> chaining is working. But the u64 attribute values are sometimes messed
>> up. And messed up in a specific way. A variable number of low order
>> bytes are magically appearing.
>>
>> (gdb) print/x 11709972488
>> $2 = 0x2b9f80008
>> (gdb) print/x 178680
>> $3 = 0x2b9f8
>>
>> (gdb) print/x 588032
>> $6 = 0x8f900
>> (gdb) print/x 2297
>> $7 = 0x8f9
>>
>> Some light googling makes me think that the Marvell Kirkwood is not
>> friendly at all to unaligned accesses.
>
>    ARM isn't in general -- it never has been, even 20 years ago in the
> ARM3 days when I was writing code in ARM assembler. We've been bitten
> by this before in btrfs (mkfs on ARM works, mounting it fails fast,
> because userspace has a trap to fix unaligned accesses, and the kernel
> doesn't).
>
>> The (biting tongue) send and receive code is playing some games with
>> casting aligned and unaligned pointers. Maybe that's upsetting the arm
>> toolchain/kirkwood.
>
>    Almost certainly the toolchain isn't identifying the unaligned
> accesses, and thus building code that uses them causes stuff to break.
>
>    There's a workaround for userspace that you can use to verify that
> this is indeed the problem:
>
> echo 2 >/proc/cpu/alignment
>
> will tell the kernel to fix up unaligned accesses initiated in
> userspace. It's a performance killer, but it should serve to identify
> whether the problem is actually this.
>
>    Hugo.
>
>> Does this completely untested patch to btrfs-progs,
>> to be run on the receiver, do anything?
>>
>> - z
>>
>> diff --git a/send-stream.c b/send-stream.c
>> index 88e18e2..4f8dd83 100644
>> --- a/send-stream.c
>> +++ b/send-stream.c
>> @@ -204,7 +204,7 @@ out:
>>  		int __len; \
>>  		TLV_GET(s, attr, (void**)&__tmp, &__len); \
>>  		TLV_CHECK_LEN(sizeof(*__tmp), __len); \
>> -		*v = le##bits##_to_cpu(*__tmp); \
>> +		*v = get_unaligned_le##bits(__tmp); \
>>  	} while (0)
>>
>>  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)