Re: [dm-devel] [PATCH next] Btrfs: fix comparison in __btrfs_map_block()

2016-07-17 Thread Christoph Hellwig
On Sun, Jul 17, 2016 at 03:51:03PM -0500, Mike Christie wrote:
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index a69203a..6ee1e36 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
> > }
> >  
> > } else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
> > -   if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
> > +   if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
> > op == REQ_GET_READ_MIRRORS) {
> > num_stripes = map->num_stripes;
> > } else if (mirror_num) {
> > 
> 
> 
> Shoot. Dumb mistake by me. It is of course correct.

And while we're at it, we need to fix up that REQ_GET_READ_MIRRORS thing.
Overloading the op locally in a fs is going to create problems sooner
or later, as no one touching the generic values and/or the code
marshalling it into different forms knows about it.


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-17 Thread Qu Wenruo



At 07/16/2016 07:17 PM, John Ettedgui wrote:

On Thu, Jul 14, 2016 at 10:54 PM John Ettedgui wrote:

On Thu, Jul 14, 2016 at 10:26 PM Qu Wenruo wrote:


> Would increasing the leaf size help as well?

> nodatacow seems unsafe


Nodatacow is not that unsafe, as btrfs will still do data CoW when it's
needed, for example when rewriting data shared with another
subvolume/snapshot.
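
(For reference, a minimal sketch of how nodatacow can be limited to specific
data instead of the whole mount; the paths below are only examples:)

# Hypothetical example: disable data CoW per directory instead of mounting
# with -o nodatacow; new files created below it inherit the flag.
mkdir /mnt/data/vm-images
chattr +C /mnt/data/vm-images
lsattr -d /mnt/data/vm-images    # should show the 'C' attribute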

Alright.

That would be one of the most obvious methods if you do a lot of
rewrites.

> as for defrag, all my partitions are already on autodefrag, so I assume
> that should be good. Or is manual once in a while a good idea as well?
AFAIK autodefrag will only help if you're doing appending writes.

A manual defrag will help more, but since btrfs has problems defragging
extents shared by different subvolumes, I doubt the effect if you have a
lot of subvolumes/snapshots.

I don't have any subvolume/snapshot for the big partitions, my usage
there is fairly simple. I'll have to add a regular defrag job then.
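
(A minimal sketch of such a regular defrag job, assuming a weekly cron
script; the mount point and threshold are only examples:)

#!/bin/sh
# Hypothetical /etc/cron.weekly/btrfs-defrag: recursively defragment the
# big partition; -t 32M considers only extents smaller than 32M.
# Fine here since there are no snapshots whose sharing defrag would break.
btrfs filesystem defragment -r -t 32M /mnt/bigpartition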


Another method is to disable compression.
With compression, the file extent size upper limit is 128K, while in the
non-compressed case it is 128M.

So the same 1G file would need about 8192 (8K) extents when compressed,
but only 8 extents without compression.
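
(A quick, rough way to check this on your own files, assuming filefrag from
e2fsprogs is installed; extent counts for compressed files are approximate:)

# Compare the extent count of a compressed file with an uncompressed copy.
filefrag /mnt/data/bigfile          # prints "...: N extents found"
filefrag /mnt/data/bigfile.nocomp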

Now that might be something important, I do use LZO compression on
all of them.
Does this limit apply to only compressed files, or any file if the
fs is mounted using the compression option?
Would mounting these partitions without compression option and then
defragmenting them reverse the compression?

I've tried this for the slowest-to-mount partition.
I changed its mount option to compression=no, then ran defrag and balance.
Not sure if the latter was needed, but I thought I'd try... in the past it
worked fine up to dusage=99, but with 100% I get a crash, oh well.
The result of defrag + no compression (I don't know how much it actually
decompressed, and whether it changed the limit Qu mentioned before) is about
26% less time spent mounting the partition, and it's no longer my slowest
partition to mount!


Well, compression=no only affects writes done after mounting with that option.
And balance won't help convert compressed extents to non-compressed ones.

But maybe the defrag will convert them to normal extents.

The best method to decompress them is to read them out and rewrite them
with the compression=no mount option.
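
(A minimal sketch of that read-out-and-rewrite step, assuming the partition
has already been remounted with compression=no; the path is an example:)

# Rewrite the file in full so its extents are written uncompressed.
# --reflink=never forces a real data copy instead of sharing extents.
cp --reflink=never bigfile bigfile.tmp && mv bigfile.tmp bigfile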




I'll try just defragmenting another partition but keeping the compression
on, and see what difference the same changes make there.

I've tried the patch, which applied fine to my kernel (4.6.4), but I don't
see any difference in mounting time; maybe I made a mistake, or my issue is
not really the same?


It's quite possible that there is another problem causing the slow mount.

The best way to verify is to run ftrace on the btrfs mount.
Here is the script I used to test my patch:

--
#!/bin/bash

trace_dir=/sys/kernel/debug/tracing

init_trace () {
    echo 0 > $trace_dir/tracing_on
    echo > $trace_dir/trace
    echo function_graph > $trace_dir/current_tracer
    echo > $trace_dir/set_ftrace_filter

    echo open_ctree >> $trace_dir/set_ftrace_filter
    echo btrfs_read_chunk_tree >> $trace_dir/set_ftrace_filter
    echo btrfs_read_block_groups >> $trace_dir/set_ftrace_filter

    # This will generate tons of trace, better to comment it out
    echo find_block_group >> $trace_dir/set_ftrace_filter

    echo 1 > $trace_dir/tracing_on
}

end_trace () {
    cp $trace_dir/trace $(dirname $0)
    echo 0 > $trace_dir/tracing_on
    echo > $trace_dir/set_ftrace_filter
    echo > $trace_dir/trace
}

init_trace
echo start mounting
time mount /dev/sdb /mnt/test
echo mount done
end_trace
--

After executing the script, you will get a file named "trace" in the same
directory as the script.


The content will look like this:
--
# tracer: function_graph
#
# CPU  DURATION  FUNCTION CALLS
# | |   | |   |   |   |
 1) $ 7670856 us  |  open_ctree [btrfs]();
 2) * 13533.45 us |btrfs_read_chunk_tree [btrfs]();
 2) # 1320.981 us |btrfs_init_space_info [btrfs]();
 2)   |btrfs_read_block_groups [btrfs]() {
 2) * 10127.35 us |  find_block_group [btrfs]();
 2)   4.951 us|  find_block_group [btrfs]();
 2) * 26225.17 us |  find_block_group [btrfs]();
..
 3) * 26450.28 us |  find_block_group [btrfs]();
 3) * 11590.29 us |  find_block_group [btrfs]();
 3) $ 7557210 us  |} /* btrfs_read_block_groups [btrfs] */ <<<
--

And you can see that the open_ctree() function, the main part of the btrfs
mount, takes about 7.67 seconds to execute, almost all of it inside
btrfs_read_block_groups() (about 7.56 seconds).

Re: [PATCH] vfs: allow FILE_EXTENT_SAME (dedupe_file_range) on a file opened ro

2016-07-17 Thread Adam Borowski
On Mon, Jul 18, 2016 at 12:13:38AM +0200, Adam Borowski wrote:
> Instead of checking the mode of the file descriptor, let's check whether it
> could have been opened rw.  This allows fixing intermittent exec failures
> when deduping a live system: anyone trying to exec a file currently being
> deduped gets ETXTBSY.
> 
> Issuing this ioctl on a ro file was already allowed for root/cap.
> 
> Tested on btrfs and not-yet-merged xfs, as only they implement this ioctl.

This is a resend of a patch I had targeted at the wrong maintainer (btrfs
guys rather than Al Viro/vfs).  Since then, I've tested it on xfs-devel
(f0b34b677df10d9e3deffcd0b1c1f0234b80 atop 4.7-rc5 and -rc7).

Review so far:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/56563

An idea to relax the check and allow dedupe for everyone who can read the
file was shot down because of concerns that in some edge cases it might be
possible to clobber a targeted file.  Thus, we're back to the original
patch, requiring a read-only descriptor but read-write permission.
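
(For illustration only, not part of the patch: the ETXTBSY failure mode
mentioned above is easy to reproduce with any read-write descriptor held
open on an executable, which is what a write-mode-only dedupe would need.)

cp /bin/true /tmp/demo && chmod +x /tmp/demo
exec 3<> /tmp/demo   # hold an rw descriptor on the binary
/tmp/demo            # fails with "Text file busy" while fd 3 is open
exec 3>&-            # close the descriptor
/tmp/demo            # works again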


Meow!
-- 
An imaginary friend squared is a real enemy.


[PATCH] vfs: allow FILE_EXTENT_SAME (dedupe_file_range) on a file opened ro

2016-07-17 Thread Adam Borowski
Instead of checking the mode of the file descriptor, let's check whether it
could have been opened rw.  This allows fixing intermittent exec failures
when deduping a live system: anyone trying to exec a file currently being
deduped gets ETXTBSY.

Issuing this ioctl on a ro file was already allowed for root/cap.

Tested on btrfs and not-yet-merged xfs, as only they implement this ioctl.

Signed-off-by: Adam Borowski 
---
 fs/read_write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 933b53a..df59dc6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1723,7 +1723,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 
if (info->reserved) {
info->status = -EINVAL;
-   } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE))) {
+   } else if (!(is_admin || !inode_permission(dst, MAY_WRITE))) {
info->status = -EINVAL;
} else if (file->f_path.mnt != dst_file->f_path.mnt) {
info->status = -EXDEV;
-- 
2.8.1



Re: Status of SMR with BTRFS

2016-07-17 Thread Matthias Prager
Am 17.07.2016 um 22:10 schrieb Henk Slager:
> What kernel (version) did you use ?
> I hope it included:
> http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
> 
> so >= 4.4, as without this patch, it is quite problematic, if not
> impossible, to use this 8TB Seagate SMR drive with linux without doing
> other patches or setting/module changes.
Thanks for that pointer. I tested kernels 3.18.28, 4.1.[17+19] and 4.5.0.
I had seen task aborts on the drive when io-stressing it with kernels
3.18 and 4.1 (and ext4), but I never figured out the exact reason. Since
I'm currently stuck at kernel 4.1.x, I did not research this any further
(kernels >=4.2 aren't usable in ESXi guests when using pass-through
devices due to irq handling issues which lead to driver inits failing -
I'm told VMware is still sitting on a fix).


> Since this patch, I have been using the drive for cold storage
> archiving, connected to a Baytrail SoC SATA port. I use bcache
> (writethrough or writearound) on an 8TB GPT partition that has a LUKS
> container that is Btrfs m-dup, d-single formatted and mounted
> compress=lzo,noatine,nossd. It is only powered on once a month for a
> day or so and then it receives incremental snapshots mostly or some
> SSD or flash images of 10-50G.
> I have more or less kept all the snapshots sofar, so chunks keep being
> added to previously unwritten space, so as sequential as possible.
Mhh, see, that would be one layer of complexity too many for my taste in
such a setup - the Seagate SMR drives are fast enough to handle Gbit-LAN
speeds if the file system serves them mostly large sequential chunks,
which f2fs actually manages to do (cold storage in my scenario too).
Btrfs does too many scattered writes for this to work without workarounds
(i.e. caching or snapshotting), although I do see the advantage of having
checksums for data which you write once and then read maybe once a year.


> If free space were heavily fragmented, files were heavily fragmented,
> and the disk were very full, adding new files or modifying them would be
> very slow. You then see many seconds during which the drive is active
> but there is no traffic on the SATA link. There is also the risk that
> the default '/sys/block/$(kerneldevname)/device/timeout' of 30 secs is
> too low, and that the kernel might reset the SATA link.
> A SATA link reset still happened twice in the last half year; I haven't
> really looked at the details so far, just rebooted at some point later,
> but I will set the timeout higher, e.g. 180, and then see if ata
> errors/resets still occur. It might be FW crashes as well.
As far as I've tested, f2fs never backed the SMR drive into a corner,
which is probably due to its sequential write pattern as a
log-structured file system and its background garbage collection (i.e.
defragmentation) - even in a full state. I imagine this will probably
not work out for hot data, though.


> 
> At least this SMR drive is not advised for use in raid setups. As a
> not-so-active array it might work if you use the right timeouts and
> scterc etc, but I have seen how long the wait on the SATA link can be,
> and that makes me realize that the stamp 'Archive Drive' applied by
> Seagate has a clear reason.
Agreed, these drives do need special handling. For archival workloads
with cold data they can be used if the file system is kind enough. I
wouldn't be comfortable using these drives in any scenario where they
might be backed into a corner, in which case the wait times are far too
unpredictable for my taste.


---
Matthias


Re: [PATCH next] Btrfs: fix comparison in __btrfs_map_block()

2016-07-17 Thread Mike Christie
On 07/15/2016 10:03 AM, Vincent Stehlé wrote:
> Add missing comparison to op in expression, which was forgotten when doing
> the REQ_OP transition.
> 
> Fixes: b3d3fa519905 ("btrfs: update __btrfs_map_block for REQ_OP transition")
> Signed-off-by: Vincent Stehlé 
> Cc: Mike Christie 
> Cc: Jens Axboe 
> ---
> 
> 
> Hi,
> 
> I saw that issue in linux next.
> 
> Not sure if it is too late to squash the fix with commit b3d3fa519905 or
> not...
> 
> Best regards,
> 
> Vincent.
> 
> 
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a69203a..6ee1e36 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
>   }
>  
>   } else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
> - if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
> + if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
>   op == REQ_GET_READ_MIRRORS) {
>   num_stripes = map->num_stripes;
>   } else if (mirror_num) {
> 


Shoot. Dumb mistake by me. It is of course correct.

Reviewed-by: Mike Christie 


Re: Status of SMR with BTRFS

2016-07-17 Thread Henk Slager
>>> It's a Seagate Expansion Desktop 5TB (USB3). It is probably a
>>> ST5000DM000.
>>
>>
>> this is TGMR not SMR disk:
>>
>> http://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100743772a.pdf
>> So it still conforms to the standard recording strategy ...
>
>
> I am not convinced. I had not heard of TGMR before. But I find TGMR as a
> technology for the head.
> https://pics.computerbase.de/4/0/3/4/4/29-1080.455720475.jpg
>
> In any case: the drive behaves like an SMR drive: I ran a benchmark on it
> with up to 200MB/s.
> When copying a file onto the drive in parallel the rate in the benchmark
> dropped to 7MB/s, while that particular file was copied at 40MB/s.

It is very well possible that for a normal drive of 4TB or so you get
this kind of behaviour. Suppose you have 2 tasks, one writing with 4k
blocksize to a 1G file at the beginning of the disk and the second with
4k blocksize to a 1G file at the end of the disk. At the beginning you
get a sustained ~150MB/s, at the end ~75MB/s. Between every 4k write (or
read) you move the head(s), so ~4ms are lost.

I was wondering how big the zones etc are and hopefully this is still true:
http://blog.schmorp.de/data/smr/fast15-paper-aghayev.pdf


> https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt
> And this does sound like improvements to BTRFS can be done for SMR in a
> generic, not vendor/device specific manner.

Maybe have a look at the recent patches from Hannes R from SUSE (for the
4.7 kernel AFAIK) and see what will be possible with Btrfs once this
'zone handling' is all working in the lower layers. Currently there
is nothing special in Btrfs for SMR drives in recent kernels, but in
my experience it works if you keep device-managed SMR
characteristics/limitations in mind - treat it maybe like a tape
archive or DVD burner.


Re: Status of SMR with BTRFS

2016-07-17 Thread Henk Slager
On Sun, Jul 17, 2016 at 10:26 AM, Matthias Prager
 wrote:

> from my experience btrfs does work as badly with SMR drives (I only had
> the opportunity to test on a 8TB Seagate device-managed drive) as ext4.
> The initial performance is fine (for a few gigabytes / minutes), but
> drops of a cliff as soon as the internal buffer-region for
> non-sequential writes fills up (even though I tested large file SMB
> transfers).

What kernel (version) did you use ?
I hope it included:
http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4

so >= 4.4, as without this patch, it is quite problematic, if not
impossible, to use this 8TB Seagate SMR drive with linux without doing
other patches or setting/module changes.

Since this patch, I have been using the drive for cold storage
archiving, connected to a Baytrail SoC SATA port. I use bcache
(writethrough or writearound) on an 8TB GPT partition that has a LUKS
container that is Btrfs m-dup, d-single formatted and mounted
compress=lzo,noatime,nossd. It is only powered on once a month for a
day or so and then it receives incremental snapshots mostly or some
SSD or flash images of 10-50G.
I have more or less kept all the snapshots sofar, so chunks keep being
added to previously unwritten space, so as sequential as possible.

If free space were heavily fragmented, files were heavily fragmented,
and the disk were very full, adding new files or modifying them would be
very slow. You then see many seconds during which the drive is active
but there is no traffic on the SATA link. There is also the risk that
the default '/sys/block/$(kerneldevname)/device/timeout' of 30 secs is
too low, and that the kernel might reset the SATA link.
A SATA link reset still happened twice in the last half year; I haven't
really looked at the details so far, just rebooted at some point later,
but I will set the timeout higher, e.g. 180, and then see if ata
errors/resets still occur. It might be FW crashes as well.
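
(For reference, a sketch of raising that timeout, assuming the archive
drive shows up as /dev/sdX; the setting does not persist across reboots:)

cat /sys/block/sdX/device/timeout     # default is 30 seconds
echo 180 > /sys/block/sdX/device/timeout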

> The only file system that worked really well with the 8TB Seagate SMR
> drive was f2fs. I used 'mkfs.f2fs -o 0 -a 0 -s 9 /dev/sdx' to create one
> and mounted it with noatime. -o 0 means no additional over-provisioning
> (the 5% default is a lot of wasted space on an 8TB drive), -a 0 tells
> f2fs not to use separate areas of the disk at the same time (which does
> not perform well on hdds, only on ssds) and finally -s 9 tells f2fs to
> lay out the file system in 1GB chunks.
> I hammered this file system for some days (via SMB and via a shred script)
> and it worked really well (performance- and stability-wise).

Interesting that f2fs works well, although now, thinking about it a bit,
I am not so surprised that it works better than ext4.

> I am considering using SMR drives for the next upgrades in my storage
> server in the basement - the only things missing in f2fs are checksums
> and raid1 support. But in my current setup (md-raid1+ext4) I don't get
> checksums either, so f2fs+smr is still on my road-map. Long term, I would
> really like to switch to btrfs with its built-in checksumming (which
> unfortunately does not work with NOCOW) and raid1. But some of the file
> systems are almost 100% filled and I'm not trusting btrfs's stability
> yet (and the manageability / handling of btrfs lags behind compared to,
> say, zfs).

At least this SMR drive is not advised for use in raid setups. As a
not-so-active array it might work if you use the right timeouts and
scterc etc, but I have seen how long the wait on the SATA link can be,
and that makes me realize that the stamp 'Archive Drive' applied by
Seagate has a clear reason.


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-07-17 Thread Jarkko Lavinen
On Sat, Jul 16, 2016 at 06:51:11PM +0300, Jarkko Lavinen wrote:
>  The modified script behaves very much like the original dd version.

Not quite. The bad sector simulation works like an old hard drive without
error correction and bad block remapping. This changes the error behaviour.
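
(The bad-sector device itself is not shown here, but such a device can be
sketched with a device-mapper table that maps one small range to the
'error' target; device name and sector numbers below are only placeholders:)

# Hypothetical sketch: build /dev/mapper/badsector on top of a loop device,
# with 8 sectors (4K) starting at $BAD failing every read and write.
DEV=/dev/loop0
TOTAL=$(blockdev --getsz $DEV)   # device size in 512-byte sectors
BAD=120960
dmsetup create badsector <<EOF
0 $BAD linear $DEV 0
$BAD 8 error
$((BAD + 8)) $((TOTAL - BAD - 8)) linear $DEV $((BAD + 8))
EOF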

My script now prints kernel messages once check_fs fails. The time range of
the messages runs from the adding of the bad sector device to the point when
check_fs fails.

The parity test, which often passes with Goffredo's script, always fails
with my bad-sector version, and scrub says the error is uncorrectable. In the
kernel messages there are two buffer I/O read errors but no write error, as if
scrub quits before writing?

In the data2 test scrub again says the error is uncorrectable, but according
to the kernel messages the bad sector is read 4 times and written twice during
the scrub. In my bad-sector script data2 is still corrupted and the parity is
ok, since the bad sector cannot be written and scrub likely quits earlier than
in Goffredo's script. In his script data2 gets fixed but the parity gets
corrupted.

Jarkko Lavinen

$ bash h2.sh
--- test 1: corrupt parity
scrub started on mnt/., fsid 2625e2d0-420c-40b6-befa-97fc18eaed48 (pid=32490)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop0:61931520 (parity)
Data read ||, expected |0300 0303|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read
Scrub started
Second Check_fs started
Buffer I/O error on dev dm-0, logical block 15120, async page read

--- test 2: corrupt data2
scrub started on mnt/., fsid 8e506268-16c7-48fa-b176-0a8877f2a7aa (pid=434)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop2:81854464 (data2)
Data read ||, expected |bdbbb|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-2, logical block 19984, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev 
/dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, 
links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 0, rd 1, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 1, flush 0, 
corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142802944 on dev 
/dev/mapper/loop2, sector 159872, root 5, inode 257, offset 65536, length 4096, 
links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 1, rd 2, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 
on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 2, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142802944 
on dev /dev/mapper/loop2
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 3, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, 
corrupt 0, gen 0
Second Check_fs started
BTRFS info (device dm-0): bdev /dev/mapper/loop2 errs: wr 2, rd 4, flush 0, 
corrupt 0, gen 0
Buffer I/O error on dev dm-2, logical block 19984, async page read

--- test 3: corrupt data1
scrub started on mnt/., fsid f8a4ecca-2475-4e5e-9651-65d9478b56fe (pid=856)
ERROR: there are uncorrectable errors
*** Wrong data on disk:off /dev/mapper/loop1:61931520 (data1)
Data read ||, expected |adaaa|

Kernel messages in the test
First Check_fs started
Buffer I/O error on dev dm-1, logical block 15120, async page read
Scrub started
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev 
/dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, 
links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 1, flush 0, 
corrupt 0, gen 0
BTRFS warning (device dm-0): i/o error at logical 142737408 on dev 
/dev/mapper/loop1, sector 120960, root 5, inode 257, offset 0, length 4096, 
links 1 (path: out.txt)
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 0, rd 2, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 2, flush 0, 
corrupt 0, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 
on dev /dev/mapper/loop1
BTRFS error (device dm-0): unable to fixup (regular) error at logical 142737408 
on dev /dev/mapper/loop1
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 3, flush 0, 
corrupt 0, gen 0
Second Check_fs started
BTRFS error (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, 
corrupt 0, gen 0
BTRFS info (device dm-0): bdev /dev/mapper/loop1 errs: wr 1, rd 4, flush 0, 
corrupt 0, gen 0
Buffer I/O error on dev dm-1, logical block 15120, async page read

--- test 4: corrupt data2; read without scrub
*** Wrong data on disk:off /dev/mapper/loop2:81854464 

Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-17 Thread Chandan Rajendra
On Friday, July 15, 2016 12:15:15 PM Omar Sandoval wrote:
> On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
> > On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> > > On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > > > From: Omar Sandoval 
> > > >
> > > > So it turns out that the free space tree bitmap handling has always been
> > > > broken on big-endian systems. Totally my bad.
> > > >
> > > > Patch 1 fixes this. Technically, it's a disk format change for
> > > > big-endian systems, but it never could have worked before, so I won't go
> > > > through the trouble of any incompat bits. If you've somehow been using
> > > > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > > > to want to mount with nospace_cache to clear it and wait for this to go
> > > > in.
> > > >
> > > > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > > > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > > > oversight that patch 1 fixes.
> > > >
> > > > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > > > pass on x86_64 and MIPS.
> > > 
> > > Thanks for fixing this up Omar.  Any big endian friends want to try this 
> > > out in extended testing and make sure we've nailed it down?
> > >
> > 
> > Hi Omar & Chris,
> > 
> > I will run fstests with this patchset applied on ppc64 BE and inform you 
> > about
> > the results.
> > 
> 
> Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:
> 
> mkfs.btrfs "$TEST_DEV"
> mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
> umount "$TEST_DEV"
> 
> and adding
> 
> export MOUNT_OPTIONS="-o space_cache=v2"
> 
> to local.config. btrfsck also needs the patch here [1].
> 
> 

Hi,

I executed the fstests test suite on ppc64 BE as per the above configuration
and there were no new regressions. Also, I executed fsx (via generic/127)
thrice on the same filesystem instance,
1. With the unpatched kernel and later
2. With the patched kernel and again
3. With the unpatched kernel
... and there were no new regressions when executing the above steps.

Tested-by: Chandan Rajendra 

-- 
chandan



Re: Status of SMR with BTRFS

2016-07-17 Thread Hendrik Friedel

Hi Thomasz,

@Dave I have added you to the conversation, as I refer to your notes 
(https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt)


thanks for your reply!


It's a Seagate Expansion Desktop 5TB (USB3). It is probably a ST5000DM000.


this is a TGMR, not an SMR, disk:
http://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100743772a.pdf
So it still conforms to the standard recording strategy ...


I am not convinced. I had not heard of TGMR before. But I find TGMR as a
technology for the head.

https://pics.computerbase.de/4/0/3/4/4/29-1080.455720475.jpg

In any case: the drive behaves like an SMR drive: I ran a benchmark on it
with up to 200MB/s.
When copying a file onto the drive in parallel, the rate in the benchmark
dropped to 7MB/s, while that particular file was copied at 40MB/s.





There are two types:
1. SMR managed by device firmware. BTRFS sees that as a normal block
device … problems you get are not related to BTRFS itself …


That for sure. But the way BTRFS uses/writes data could cause problems in
conjunction with these devices still, no?

I'm sorry, but I'm confused now: what "magical way of using/writing
data" do you actually mean? AFAIK btrfs sees the disk as a block device


Well, btrfs writes data very differently from many other file systems.
On every write the changed data is copied to another place, even if just
one bit is changed. That's special, and I am wondering whether that could
cause problems.



Now think slowly and thoroughly about it: who would write the code (and
maintain it) for a file system that accesses device-specific data for X
number of vendors, each with Y model-specific
configurations/caveats/firmwares/protocols ... S.M.A.R.T. emerged to
give a unifying interface to device statistics ... that is how bad it
was ...


Well, I'm no pro. But I found this:
https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt
And this does sound like improvements to BTRFS can be done for SMR in a 
generic, not vendor/device specific manner.


And I am wondering:
a) whether it is advisable to use BTRFS on these drives before these
improvements have been made
  i) if not: are there specific btrfs features that should be avoided,
or btrfs in general?

b) whether these improvements have already been made


care about your data, do some research ... if not ... maybe ReiserFS
is for you :)


You are right, for sure. And that's what I do here. But I am far from
being able to judge myself, so I rely on support.


Greetings,
Hendrik





Re: Status of SMR with BTRFS

2016-07-17 Thread Matthias Prager
Hello Hendrik,

from my experience btrfs works as badly with SMR drives (I only had
the opportunity to test on an 8TB Seagate device-managed drive) as ext4.
The initial performance is fine (for a few gigabytes / minutes), but
drops off a cliff as soon as the internal buffer region for
non-sequential writes fills up (even though I tested large-file SMB
transfers).

The only file system that worked really well with the 8TB Seagate SMR
drive was f2fs. I used 'mkfs.f2fs -o 0 -a 0 -s 9 /dev/sdx' to create one
and mounted it with noatime. -o 0 means no additional over-provisioning
(the 5% default is a lot of wasted space on an 8TB drive), -a 0 tells
f2fs not to use separate areas of the disk at the same time (which does
not perform well on hdds, only on ssds) and finally -s 9 tells f2fs to
lay out the file system in 1GB chunks.
I hammered this file system for some days (via SMB and via a shred script)
and it worked really well (performance- and stability-wise).

I am considering using SMR drives for the next upgrades in my storage
server in the basement - the only things missing in f2fs are checksums
and raid1 support. But in my current setup (md-raid1+ext4) I don't get
checksums either, so f2fs+smr is still on my road-map. Long term, I would
really like to switch to btrfs with its built-in checksumming (which
unfortunately does not work with NOCOW) and raid1. But some of the file
systems are almost 100% filled and I'm not trusting btrfs's stability
yet (and the manageability / handling of btrfs lags behind compared to,
say, zfs).


I realize this mail sounds very negative about btrfs; I'm sorry, that was
not my intention. I'm actually a big fan of btrfs and already run it
on my test server, but I fear it still needs quite some time to mature.
That's why I really appreciate all the hard work of the btrfs devs!


Kind regards
Matthias