Re: [dm-devel] [PATCH next] Btrfs: fix comparison in __btrfs_map_block()
On Sun, Jul 17, 2016 at 03:51:03PM -0500, Mike Christie wrote:
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index a69203a..6ee1e36 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
> > 		}
> >
> > 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
> > -		if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
> > +		if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
> > 		    op == REQ_GET_READ_MIRRORS) {
> > 			num_stripes = map->num_stripes;
> > 		} else if (mirror_num) {
>
> Shoot. Dumb mistake by me. It is of course correct.

And while we're at it, we need to fix up that REQ_GET_READ_MIRRORS thing. Overloading the op locally in a fs is going to create problems sooner or later, as no one touching the generic values and/or the code marshalling them into different forms knows about it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
At 07/16/2016 07:17 PM, John Ettedgui wrote:

On Thu, Jul 14, 2016 at 10:54 PM John Ettedgui wrote:
On Thu, Jul 14, 2016 at 10:26 PM Qu Wenruo wrote:

> Would increasing the leaf size help as well? nodatacow seems unsafe

Nodatacow is not that unsafe, as btrfs will still do data cow if it's needed, like rewriting data of another subvolume/snapshot.

Alright. That would be one of the most obvious methods if you do a lot of rewrites.

> as for defrag, all my partitions are already on autodefrag, so I assume that should be good. Or is a manual one once in a while a good idea as well?

AFAIK autodefrag will only help if you're doing appending writes. A manual one will help more, but since btrfs has problems defragging extents shared by different subvolumes, I doubt the effect if you have a lot of subvolumes/snapshots.

I don't have any subvolume/snapshot for the big partitions; my usage there is fairly simple. I'll have to add a regular defrag job then.

Another method is to disable compression. For compression, the file extent size upper limit is 128K, while for the non-compressed case it's 128M. So the same 1G file would take 8K extents with compression, but only 8 extents without it.

Now that might be something important, as I do use LZO compression on all of them. Does this limit apply only to compressed files, or to any file if the fs is mounted with the compression option? Would mounting these partitions without the compression option and then defragmenting them reverse the compression?

I've tried this for the slowest-to-mount partition. I changed its mount option to compression=no, then ran defrag and balance. Not sure if the latter was needed but I thought to try... in the past it worked fine up to dusage=99, but with 100% I get a crash, oh well.
The result of defrag + nocompress (I don't know how much it actually decompressed, and whether it changed the limit Qu mentioned before) is about 26% less time spent mounting the partition, and it is no longer my slowest partition to mount!

Well, compression=no only affects writes done after mounting with that option. And balance won't help to convert compressed extents to non-compressed ones. But maybe the defrag will convert them to normal extents. The best method to de-compress them is to read them out and rewrite them under the compression=no mount option.

I'll try just defragmenting another partition but keeping the compression on, and see what difference the same changes make there.

I've tried the patch, which applied fine to my kernel (4.6.4), but I don't see any difference in mounting time; maybe I made a mistake or my issue is not really the same?

Pretty possible that there is another problem causing the slow mount. The best method to verify is to do an ftrace on the btrfs mount. Here is the script I used to test my patch:

--
#!/bin/bash
trace_dir=/sys/kernel/debug/tracing

init_trace () {
	echo 0 > $trace_dir/tracing_on
	echo > $trace_dir/trace
	echo function_graph > $trace_dir/current_tracer
	echo > $trace_dir/set_ftrace_filter
	echo open_ctree >> $trace_dir/set_ftrace_filter
	echo btrfs_read_chunk_tree >> $trace_dir/set_ftrace_filter
	echo btrfs_read_block_groups >> $trace_dir/set_ftrace_filter
	# This will generate tons of trace, better to comment it out
	echo find_block_group >> $trace_dir/set_ftrace_filter
	echo 1 > $trace_dir/tracing_on
}

end_trace () {
	cp $trace_dir/trace $(dirname $0)
	echo 0 > $trace_dir/tracing_on
	echo > $trace_dir/set_ftrace_filter
	echo > $trace_dir/trace
}

init_trace
echo start mounting
time mount /dev/sdb /mnt/test
echo mount done
end_trace
--

After executing the script, you get a file named "trace" in the same directory as the script.
The content will be like:
--
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1) $ 7670856 us  |  open_ctree [btrfs]();
 2) * 13533.45 us |  btrfs_read_chunk_tree [btrfs]();
 2) # 1320.981 us |  btrfs_init_space_info [btrfs]();
 2)               |  btrfs_read_block_groups [btrfs]() {
 2) * 10127.35 us |    find_block_group [btrfs]();
 2)   4.951 us    |    find_block_group [btrfs]();
 2) * 26225.17 us |    find_block_group [btrfs]();
 ...
 3) * 26450.28 us |    find_block_group [btrfs]();
 3) * 11590.29 us |    find_block_group [btrfs]();
 3) $ 7557210 us  |  } /* btrfs_read_block_groups [btrfs] */ <<<
--
And you can see that the open_ctree() function, the main part of btrfs mount, takes about 7.67 seconds to execute.
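Qu's suggested way to actually de-compress existing data (read each file out and rewrite it while the filesystem is mounted with compression disabled) can be scripted. This is only a sketch under that assumption -- the paths are illustrative, and it temporarily needs as much free space as the largest file being rewritten:

```shell
#!/bin/bash
# Sketch: rewrite every regular file under a directory so its data is
# written out fresh (and thus uncompressed, if the filesystem is now
# mounted with compression=no). Illustrative only -- stops on first error.
set -eu

rewrite_file () {
	local f="$1" tmp
	tmp="$(mktemp -- "$f.XXXXXX")"
	# cp reads the (possibly compressed) extents and writes new ones;
	# -p keeps ownership/permissions/timestamps. mv replaces the original.
	cp -p -- "$f" "$tmp"
	mv -- "$tmp" "$f"
}

rewrite_tree () {
	local f
	while IFS= read -r -d '' f; do
		rewrite_file "$f"
	done < <(find "$1" -type f -print0)
}
```

Usage would be `rewrite_tree /mnt/bigpartition` after remounting with `-o compress=no`; file contents are unchanged, only the on-disk extents are rewritten.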
Re: [PATCH] vfs: allow FILE_EXTENT_SAME (dedupe_file_range) on a file opened ro
On Mon, Jul 18, 2016 at 12:13:38AM +0200, Adam Borowski wrote:
> Instead of checking the mode of the file descriptor, let's check whether it could have been opened rw. This allows fixing intermittent exec failures when deduping a live system: anyone trying to exec a file currently being deduped gets ETXTBSY.
>
> Issuing this ioctl on a ro file was already allowed for root/cap.
>
> Tested on btrfs and not-yet-merged xfs, as only they implement this ioctl.

This is a resend of a patch I targeted at the wrong maintainer (the btrfs guys rather than Al Viro/vfs). Since then, I've tested it on xfs-devel (f0b34b677df10d9e3deffcd0b1c1f0234b80 atop 4.7-rc5 and -rc7).

Review so far: http://thread.gmane.org/gmane.comp.file-systems.btrfs/56563

An idea to relax the check and allow dedupe for everyone who can read the file was shot down because of concerns that in some edge cases it might be possible to clobber a targeted file. Thus, we're back to the original patch: requiring a ro descriptor but rw permission.

Meow!
--
An imaginary friend squared is a real enemy.
[PATCH] vfs: allow FILE_EXTENT_SAME (dedupe_file_range) on a file opened ro
Instead of checking the mode of the file descriptor, let's check whether it could have been opened rw. This allows fixing intermittent exec failures when deduping a live system: anyone trying to exec a file currently being deduped gets ETXTBSY.

Issuing this ioctl on a ro file was already allowed for root/cap.

Tested on btrfs and not-yet-merged xfs, as only they implement this ioctl.

Signed-off-by: Adam Borowski
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 933b53a..df59dc6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1723,7 +1723,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 		if (info->reserved) {
 			info->status = -EINVAL;
-		} else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
+		} else if (!(is_admin || !inode_permission(dst, MAY_WRITE))) {
 			info->status = -EINVAL;
 		} else if (file->f_path.mnt != dst_file->f_path.mnt) {
 			info->status = -EXDEV;
--
2.8.1
Re: Status of SMR with BTRFS
On 17.07.2016 at 22:10, Henk Slager wrote:
> What kernel (version) did you use? I hope it included:
> http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
> so >= 4.4, as without this patch it is quite problematic, if not impossible, to use this 8TB Seagate SMR drive with Linux without doing other patches or setting/module changes.

Thanks for that pointer; I tested kernels 3.18.28, 4.1.[17+19] and 4.5.0. I had seen task aborts on the drive when io-stressing it with kernels 3.18 and 4.1 (and ext4), but I never figured out the exact reason. Since I'm currently stuck at kernel 4.1.x, I did not research this any further (kernels >= 4.2 aren't usable in esxi-guests when using pass-through devices due to irq handling issues which lead to driver inits failing - I'm told vmware is still sitting on a fix).

> Since this patch, I have been using the drive for cold storage archiving, connected to a Baytrail SoC SATA port. I use bcache (writethrough or writearound) on an 8TB GPT partition that has a LUKS container that is Btrfs m-dup, d-single formatted and mounted compress=lzo,noatime,nossd. It is only powered on once a month for a day or so, and then it receives mostly incremental snapshots or some SSD or flash images of 10-50G. I have more or less kept all the snapshots so far, so chunks keep being added to previously unwritten space, so as sequential as possible.

Mhh, see, that would be one too many layers of complexity for my taste in such a setup - the Seagate SMR drives are fast enough to handle Gbit-LAN speeds if they are served mostly large sequential chunks by the file system, which f2fs actually manages to do (cold storage in my scenario too). Btrfs does too many scattered writes for this to work without bandages (i.e. caching or snapshotting), although I do see the advantage in having checksums for data which you write once and then read like once every year.
> If free space were heavily fragmented, files were heavily fragmented, and the disk were very full, adding new files or modifying them would be very slow. You then see for many seconds that the drive is active but no traffic on the SATA link. Also, then there is the risk that the default '/sys/block/$(kerneldevname)/device/timeout' of 30 secs is too low, and that the kernel might reset the SATA link. A SATA link reset still happened twice in the last half year; I haven't really looked at the details so far, just rebooted at some point in time later, but I will set the timeout at least higher, e.g. 180, and then see if ata errors/resets still occur. It might be FW crashes as well.

As far as I've tested, f2fs never backed the SMR drive into a corner, which is probably due to its sequential write pattern as a log-structured file system and its background garbage collection (i.e. defragmentation) - even in a full state. I imagine this will probably not work out for hot data though.

> At least this SMR drive is not advised for use in raid setups. As a not-so-active array it might work if you use the right timeouts and scterc etc, but I have seen how long the wait on the SATA link can be, and that makes me realize that the stamp 'Archive Drive' done by Seagate has a clear reason.

Agreed, these drives do need special handling. For archival workloads with cold data they can be used if the file system is kind enough. I wouldn't be comfortable using these drives in any scenario where they might be backed into a corner, in which case the wait times are far too unpredictable for my taste.

---
Matthias
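The timeout bump Henk mentions is a one-line write per disk; a sketch that raises it for every disk at once. SYSFS_ROOT is parameterized here only so the logic can be exercised against a fake directory tree -- on a real system it is just /sys, and since the setting does not survive a reboot it belongs in a boot script or udev rule:

```shell
#!/bin/bash
# Raise the kernel's SCSI command timeout (default 30s) for all disks,
# so that long internal SMR rewrites don't trigger SATA link resets.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
TIMEOUT_SECS="${TIMEOUT_SECS:-180}"

set_disk_timeouts () {
	local t
	for t in "$SYSFS_ROOT"/block/*/device/timeout; do
		[ -f "$t" ] || continue   # glob may match nothing
		echo "$TIMEOUT_SECS" > "$t"
	done
}
```

Calling `set_disk_timeouts` as root then applies 180 s to every `/sys/block/*/device/timeout` that exists.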
Re: [PATCH next] Btrfs: fix comparison in __btrfs_map_block()
On 07/15/2016 10:03 AM, Vincent Stehlé wrote:
> Add missing comparison to op in expression, which was forgotten when doing the REQ_OP transition.
>
> Fixes: b3d3fa519905 ("btrfs: update __btrfs_map_block for REQ_OP transition")
> Signed-off-by: Vincent Stehlé
> Cc: Mike Christie
> Cc: Jens Axboe
> ---
>
> Hi,
>
> I saw that issue in linux-next.
>
> Not sure if it is too late to squash the fix into commit b3d3fa519905 or not...
>
> Best regards,
>
> Vincent.
>
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a69203a..6ee1e36 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
> 		}
>
> 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
> -		if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
> +		if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
> 		    op == REQ_GET_READ_MIRRORS) {
> 			num_stripes = map->num_stripes;
> 		} else if (mirror_num) {

Shoot. Dumb mistake by me. It is of course correct.

Reviewed-by: Mike Christie
Re: Status of SMR with BTRFS
>>> It's a Seagate Expansion Desktop 5TB (USB3). It is probably a ST5000DM000.
>>
>> This is a TGMR disk, not SMR:
>> http://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100743772a.pdf
>> So it still conforms to the standard recording strategy ...
>
> I am not convinced. I had not heard of TGMR before. But I find TGMR as a technology for the head:
> https://pics.computerbase.de/4/0/3/4/4/29-1080.455720475.jpg
>
> In any case: the drive behaves like an SMR drive. I ran a benchmark on it with up to 200MB/s. When copying a file onto the drive in parallel, the rate in the benchmark dropped to 7MB/s, while that particular file was copied at 40MB/s.

It is very well possible that for a normal drive of 4TB or so you get this kind of behaviour. Suppose you have two tasks, one writing with 4k block size to a 1G file at the beginning of the disk, and the second writing with 4k block size to a 1G file at the end of the disk. At the beginning you get sustained ~150MB/s, at the end ~75MB/s. Between every 4k write (or read) you move the head(s), so ~4ms is lost. I was wondering how big the zones etc. are, and hopefully this is still true:
http://blog.schmorp.de/data/smr/fast15-paper-aghayev.pdf

> https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt
> And this does sound like improvements to BTRFS can be done for SMR in a generic, not vendor/device-specific manner.

Maybe have a look at recent patches from Hannes R from SUSE (to the 4.7 kernel AFAIK) and see what will be possible with Btrfs once this 'zone-handling' is all working on the lower layers. Currently there is nothing special in Btrfs for SMR drives in recent kernels, but in my experience it works if you keep device-managed SMR characteristics/limitations in mind. Maybe like a tape archive or DVD burner.
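Henk's back-of-the-envelope can be made explicit: if every 4 KiB transfer pays roughly a 4 ms head move, each interleaved stream is limited to about 1 MB/s regardless of the drive's sequential rate -- the same order of magnitude as the 7 MB/s Hendrik measured. A quick sketch with the figures from the thread:

```shell
# Per-stream throughput when two writers at opposite ends of the disk
# alternate 4 KiB transfers, with ~4 ms of head movement between each:
# one block lands per seek interval.
block_kib=4
seek_ms=4
kib_per_sec=$(( block_kib * 1000 / seek_ms ))
echo "~${kib_per_sec} KiB/s per stream"   # ~1000 KiB/s, i.e. ~1 MB/s
```

The gap up to the measured 7 MB/s is plausibly the drive coalescing several blocks per seek, but the model shows why the benchmark collapses by two orders of magnitude.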
Re: Status of SMR with BTRFS
On Sun, Jul 17, 2016 at 10:26 AM, Matthias Prager wrote:
> from my experience btrfs does work as badly with SMR drives (I only had the opportunity to test on an 8TB Seagate device-managed drive) as ext4. The initial performance is fine (for a few gigabytes / minutes), but drops off a cliff as soon as the internal buffer-region for non-sequential writes fills up (even though I tested large-file SMB transfers).

What kernel (version) did you use? I hope it included:
http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
so >= 4.4, as without this patch it is quite problematic, if not impossible, to use this 8TB Seagate SMR drive with Linux without doing other patches or setting/module changes.

Since this patch, I have been using the drive for cold storage archiving, connected to a Baytrail SoC SATA port. I use bcache (writethrough or writearound) on an 8TB GPT partition that has a LUKS container that is Btrfs m-dup, d-single formatted and mounted compress=lzo,noatime,nossd. It is only powered on once a month for a day or so, and then it receives mostly incremental snapshots or some SSD or flash images of 10-50G. I have more or less kept all the snapshots so far, so chunks keep being added to previously unwritten space, so as sequential as possible.

If free space were heavily fragmented, files were heavily fragmented, and the disk were very full, adding new files or modifying them would be very slow. You then see for many seconds that the drive is active but no traffic on the SATA link. Also, then there is the risk that the default '/sys/block/$(kerneldevname)/device/timeout' of 30 secs is too low, and that the kernel might reset the SATA link. A SATA link reset still happened twice in the last half year; I haven't really looked at the details so far, just rebooted at some point in time later, but I will set the timeout at least higher, e.g. 180, and then see if ata errors/resets still occur.
It might be FW crashes as well.

> The only file system that worked really well with the 8TB Seagate SMR drive was f2fs. I used 'mkfs.f2fs -o 0 -a 0 -s 9 /dev/sdx' to create one and mounted it with noatime. -o 0 means no additional over-provisioning (the 5% default is a lot of wasted space on an 8TB drive), -a 0 tells f2fs not to use separate areas of the disk at the same time (which does not perform well on hdds, only on ssds), and finally -s 9 tells f2fs to lay out the file system in 1GB chunks. I hammered this file system for some days (via SMB and via shred-script) and it worked really well (performance- and stability-wise).

Interesting that f2fs works well, although now, thinking about it a bit, I am not so surprised that it works better than ext4.

> I am considering using SMR drives for the next upgrades in my storage server in the basement - the only things missing in f2fs are checksums and raid1 support. But in my current setup (md-raid1+ext4) I don't get checksums either, so f2fs+smr is still on my road-map. Long term, I would really like to switch to btrfs with its built-in checksumming (which unfortunately does not work with NOCOW) and raid1. But some of the file systems are almost 100% filled and I'm not trusting btrfs's stability yet (and the manageability / handling of btrfs lags behind compared to, say, zfs).

At least this SMR drive is not advised for use in raid setups. As a not-so-active array it might work if you use the right timeouts and scterc etc, but I have seen how long the wait on the SATA link can be, and that makes me realize that the stamp 'Archive Drive' done by Seagate has a clear reason.
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5
On Sat, Jul 16, 2016 at 06:51:11PM +0300, Jarkko Lavinen wrote:
> The modified script behaves very much like the original dd version.

Not quite. The bad sector simulation works like old hard drives without error correction and bad block remapping. This changes the error behaviour.

My script now prints kernel messages once check_fs fails. The time range of the messages is from the adding of the bad sector device to the point when check_fs fails.

The parity test, which often passes with Goffredo's script, always fails with my bad sector version, and scrub says the error is uncorrectable. In the kernel messages there are two buffer I/O read errors but no write error, as if scrub quits before writing?

In the data2 test scrub again says the error is uncorrectable, but according to the kernel messages the bad sector is read 4 times and written twice during the scrub. In my bad sector script the data2 is still corrupted and the parity is ok, since the bad sector cannot be written and scrub likely quits earlier than in Goffredo's script. In his script the data2 gets fixed but the parity gets corrupted.
Jarkko Lavinen

$ bash h2.sh
--- test 1: corrupt parity
scrub started on mnt/., fsid 2625e2d0-420c-40b6-befa-97fc18eaed48 (pid=32490)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop0:61931520 (parity)
Data read ||, expected |0300 0303|

Kernel messages in the test:
First check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read
Scrub started
Second check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read

--- test 2: corrupt data2
scrub started on mnt/., fsid 8e506268-16c7-48fa-b176-0a8877f2a7aa (pid=434)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop2:81854464 (data2)
Data read ||, expected |bdbbb|

Kernel messages in the test:
First check_fs started
Buffer I/O error on dev dm-2, logical block 19984, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev /dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev /dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 3, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
Second check_fs started
BTRFS info (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
Buffer I/O error on dev dm-2, logical block 19984, async page read

--- test 3: corrupt data1
scrub started on mnt/., fsid f8a4ecca-2475-4e5e-9651-65d9478b56fe (pid=856)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop1:61931520 (data1)
Data read ||, expected |adaaa|

Kernel messages in the test:
First check_fs started
Buffer I/O error on dev dm-1, logical block 15120, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev /dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev /dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 on dev /dev/mapper/loop1
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 on dev /dev/mapper/loop1
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 3, flush 0, corrupt 0, gen 0
Second check_fs started
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
BTRFS info (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
Buffer I/O error on dev dm-1, logical block 15120, async page read

--- test 4: corrupt data2; read without scrub
*** Wrong data on disk:off /dev/mapper/loop2:81854464
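For reference, the kind of bad-sector device Jarkko describes can be built with device-mapper's `error` target: sandwich a short `error` extent between two `linear` extents, so exactly one 4 KiB region fails on both read and write, like an old drive with no remapping. This is only a sketch of the dm table involved (device name and sector numbers are illustrative, dm tables count in 512-byte sectors, and it needs root), not Jarkko's actual script:

```shell
#!/bin/sh
# Sketch: present /dev/loop2 with one permanently-bad 4 KiB region.
DEV=/dev/loop2
BAD_START=159872   # first bad 512-byte sector (number borrowed from the log above)
BAD_LEN=8          # 8 sectors = 4 KiB
TOTAL=$(blockdev --getsz "$DEV")

# dmsetup reads the table from stdin: linear up to the bad region,
# an error extent over it, then linear for the remainder.
dmsetup create loop2bad <<EOF
0 $BAD_START linear $DEV 0
$BAD_START $BAD_LEN error
$((BAD_START + BAD_LEN)) $((TOTAL - BAD_START - BAD_LEN)) linear $DEV $((BAD_START + BAD_LEN))
EOF
```

The filesystem is then built on /dev/mapper/loop2bad; unlike overwriting with dd, writes to the bad region fail too, which is what changes the scrub behaviour described above.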
Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems
On Friday, July 15, 2016 12:15:15 PM Omar Sandoval wrote:
> On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
>> On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
>>> On 07/14/2016 07:31 PM, Omar Sandoval wrote:
>>>> From: Omar Sandoval
>>>>
>>>> So it turns out that the free space tree bitmap handling has always been broken on big-endian systems. Totally my bad.
>>>>
>>>> Patch 1 fixes this. Technically, it's a disk format change for big-endian systems, but it never could have worked before, so I won't go through the trouble of any incompat bits. If you've somehow been using space_cache=v2 on a big-endian system (I doubt anyone is), you're going to want to mount with nospace_cache to clear it and wait for this to go in.
>>>>
>>>> Patch 2 fixes a similar error in the sanity tests (it's the same as the v2 I posted here [1]) and patch 3 expands the sanity tests to catch the oversight that patch 1 fixes.
>>>>
>>>> Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests pass on x86_64 and MIPS.
>>>
>>> Thanks for fixing this up Omar. Any big-endian friends want to try this out in extended testing and make sure we've nailed it down?
>>
>> Hi Omar & Chris,
>>
>> I will run fstests with this patchset applied on ppc64 BE and inform you about the results.
>
> Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:
>
>     mkfs.btrfs "$TEST_DEV"
>     mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
>     umount "$TEST_DEV"
>
> and adding
>
>     export MOUNT_OPTIONS="-o space_cache=v2"
>
> to local.config. btrfsck also needs the patch here [1].

Hi, I did execute the fstests test suite on ppc64 BE as per the above configuration, and there were no new regressions.

Also, I did execute fsx (via generic/127) thrice on the same filesystem instance:
1. With the unpatched kernel, and later
2.
With the patched kernel, and again
3. With the unpatched kernel

... and there were no new regressions when executing the above steps.

Tested-by: Chandan Rajendra

--
chandan
Re: Status of SMR with BTRFS
Hi Thomasz,

@Dave: I have added you to the conversation, as I refer to your notes (https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt)

Thanks for your reply!

>> It's a Seagate Expansion Desktop 5TB (USB3). It is probably a ST5000DM000.
>
> this is a TGMR disk, not SMR:
> http://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100743772a.pdf
> So it still conforms to the standard recording strategy ...

I am not convinced. I had not heard of TGMR before. But I find TGMR as a technology for the head:
https://pics.computerbase.de/4/0/3/4/4/29-1080.455720475.jpg

In any case: the drive behaves like an SMR drive. I ran a benchmark on it with up to 200MB/s. When copying a file onto the drive in parallel, the rate in the benchmark dropped to 7MB/s, while that particular file was copied at 40MB/s.

> There are two types:
> 1. SMR managed by device firmware. BTRFS sees that as a normal block device … problems you get are not related to BTRFS itself …

That for sure. But the way BTRFS uses/writes data could still cause problems in conjunction with these devices, no?

> I'm sorry but I'm confused now, what "magical way of using/writing data" do you actually mean? AFAIK btrfs sees the disk as a block device

Well, btrfs does write data very differently from many other file systems. On every write the data is copied to another place, even if just one bit is changed. That's special, and I am wondering whether that could cause problems.

> Now think slowly and thoroughly about it: who would write code (and maintain it) for a file system that accesses device-specific data for X vendors, each with Y model-specific configurations/caveats/firmwares/protocols ... S.M.A.R.T. emerged to give a unifying interface to device statistics ... that is how bad it was ...

Well, I'm no pro. But I found this:
https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt
And this does sound like improvements to BTRFS can be done for SMR in a generic, not vendor/device-specific manner.
And I am wondering:
a) whether it is advisable to use BTRFS on these drives before these improvements have been made
   i) if not: are there specific btrfs features that should be avoided, or btrfs in general?
b) whether these improvements have been made already

> care about your data, do some research ... if not ... maybe ReiserFS is for you :)

You are right, for sure. And that's what I do here. But I am far away from being able to judge this myself, so I rely on support.

Greetings,
Hendrik
Re: Status of SMR with BTRFS
Hello Hendrik,

from my experience btrfs does work as badly with SMR drives (I only had the opportunity to test on an 8TB Seagate device-managed drive) as ext4. The initial performance is fine (for a few gigabytes / minutes), but drops off a cliff as soon as the internal buffer-region for non-sequential writes fills up (even though I tested large-file SMB transfers).

The only file system that worked really well with the 8TB Seagate SMR drive was f2fs. I used 'mkfs.f2fs -o 0 -a 0 -s 9 /dev/sdx' to create one and mounted it with noatime. -o 0 means no additional over-provisioning (the 5% default is a lot of wasted space on an 8TB drive), -a 0 tells f2fs not to use separate areas of the disk at the same time (which does not perform well on hdds, only on ssds), and finally -s 9 tells f2fs to lay out the file system in 1GB chunks. I hammered this file system for some days (via SMB and via shred-script) and it worked really well (performance- and stability-wise).

I am considering using SMR drives for the next upgrades in my storage server in the basement - the only things missing in f2fs are checksums and raid1 support. But in my current setup (md-raid1+ext4) I don't get checksums either, so f2fs+smr is still on my road-map. Long term, I would really like to switch to btrfs with its built-in checksumming (which unfortunately does not work with NOCOW) and raid1. But some of the file systems are almost 100% filled, and I'm not trusting btrfs's stability yet (and the manageability / handling of btrfs lags behind compared to, say, zfs).

I realize this mail sounds very negative about btrfs; I'm sorry, that was not my intention. I'm actually a big fan of btrfs and already running it on my test-server, but I fear it still needs quite some time to mature. That's why I really appreciate all the hard work of the btrfs-devs!
Kind regards
Matthias