Re: Regression in btrfs: properly set the termination value of ctx->pos in readdir
Hi,

the patch "btrfs: properly set the termination value of ctx->pos in readdir" introduces a regression for me.

A lot of stuff runs in "endless" or long-running loops.

An example strace looks like this:

msgsnd(0, {1, "\3\0\0\0\247\r\0\0g8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0g8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0h8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0h8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0i8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0i8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0j8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0j8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0k8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0k8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0l8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0l8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0m8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0m8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0n8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0n8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0o8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0o8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0p8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0p8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0q8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0q8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0r8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0) = 0
msgrcv(32769, {1, "\3\0\0\0\247\r\0\0r8\0\0\0\0\0\0\0\0\0\0\345<\1\0\0\0\0\0\35\0\0\0"...}, 56, 0, 0) = 56
semop(98307, {{0, 1, SEM_UNDO}}, 1) = 0
newfstatat(AT_FDCWD, "changelog", {st_mode=S_IFREG|0644, st_size=148, ...}, AT_SYMLINK_NOFOLLOW) = 0
semop(98307, {{0, -1, SEM_UNDO}}, 1)= 0
msgsnd(0, {1, "\3\0\0\0\247\r\0\0s8\0\0\0\0\0\0\0\0\0\0
Re: Regression in btrfs: properly set the termination value of ctx->pos in readdir
On Wed, Nov 11, 2015 at 12:57 PM, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> the patch btrfs: properly set the termination value of ctx->pos in
> readdir introduces a regression to me.
>
> A lot of stuff runs in "endless" or long running loops.

Just tested this and can confirm something is off.

In a directory with several files, create a new directory and move all files into the new subdir. An immediately following ls will hang. The problem goes away after a manual sync.

-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
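The reproduction steps in the message above could be scripted roughly as follows. This is only a sketch: the function name, the file count, and the 10-second timeout are assumptions made for illustration, not something taken from the thread, and the directory passed in has to live on the filesystem under test.

```shell
#!/bin/bash
# Hypothetical reproducer sketch: populate a directory, move everything
# into a freshly created subdirectory, then list the parent right away
# (no sync in between). Names, file count and timeout are arbitrary.

repro_readdir_hang() {
    local dir="$1" i

    mkdir -p "$dir" || return 1

    # A directory with several files.
    for i in 1 2 3 4 5 6 7 8; do
        : > "$dir/file$i" || return 1
    done

    # Create a new directory and move all files into it.
    mkdir "$dir/sub" || return 1
    mv "$dir"/file* "$dir/sub/" || return 1

    # List immediately. timeout(1) turns an endless readdir loop into a
    # visible failure instead of a stuck shell.
    if timeout 10 ls -la "$dir" > /dev/null; then
        echo "ls returned"
    else
        echo "ls hung or failed"
        return 1
    fi
}

# Example invocation, from a directory on the filesystem under test:
#   repro_readdir_hang "$(mktemp -d -p .)/t"
```

On an unaffected kernel the function should print "ls returned"; on an affected one the expectation, going by the report, is that the timeout fires.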
Re: Process is blocked for more than 120 seconds
On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>> Hi everyone,
>>
>> I have noticed the following in the log. The system continues to run,
>> but I am not sure for how long it will be stable. Should I start
>> worrying? Thanks in advance for the opinion.
>>
> This just means that a process was stuck in the D state (uninterruptible
> I/O sleep) for more than 120 seconds. Depending on a number of factors,
> this happening could mean:
> 1. Absolutely nothing (if you have low-powered or older hardware, for
> example, I get these regularly on a first generation Raspberry Pi if I
> don't increase the timeout significantly)
> 2. The program is doing a very large chunk of I/O (usually with the
> O_DIRECT flag, although this probably isn't the case here)
> 3. There's a bug in the blocked program (this is rarely the case when
> this type of thing happens)
> 4. There's a bug in the kernel (which is why this dumps a stack trace)
> 5. The filesystem itself is messed up somehow, and the kernel isn't
> handling it properly (technically a bug, but a more specific case of it).
> 6. Your hardware is misbehaving, failing, or experienced a transient
> error.
>
> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
> most likely cause, as all of the listed programs (I'm assuming that
> 'master' is from postfix) are relatively well audited, and all of them
> hit this at the same time.
>
> For what it's worth, if you want you can do:
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> like the message says to stop these from appearing in the future, or use
> some arbitrary number to change the timeout before these messages appear
> (I usually use at least 150 on production systems, and more often 300,
> although on something like a Raspberry Pi I often use timeouts as high
> as 1800 seconds).

Thanks for the comments, Austin.

The system is a "normal" PC running an Intel Core 2 Duo Mobile @1.66GHz. "master" is indeed a postfix process. I hadn't seen anything like that on the 3.16 kernel, but after upgrading to 4.2.3 I caught that message. I/O and CPU load are usually low, but it could be (6) from your list, as the system is generally very old (5+ years).

As the problem has appeared only once in the past 15 days, I think it is just a transient error. Thanks for clarifying the possible reasons.

--
With best regards,
Dmitry
Re: [RFCv3 00/12] xfstests: test the btrfs/xfs reflink/dedupe ioctls
On Mon, Nov 09, 2015 at 10:49:13AM -0800, Darrick J. Wong wrote:
> I found a few more bugs in the kernel-side implementation, which might explain
> that. I'm about to start working on making CoW less crappy, but I'll push all
> the patches out to github. (I wasn't planning on patchbombing again until
> December.)

Yes, please push your WIP code out as often as possible!

Btw, here is another additional fixup, as two tests were wrongly tagged as needing reflink instead of dedupe:

diff --git a/tests/generic/827 b/tests/generic/827
index cc1bc52..c8ce7cf 100755
--- a/tests/generic/827
+++ b/tests/generic/827
@@ -47,7 +47,7 @@ _supported_fs generic
 _supported_os Linux
 _require_scratch
-_require_xfs_io_command "reflink"
+_require_xfs_io_command "dedupe"
 echo "Format and mount"
 _scratch_mkfs > $seqres.full 2>&1
diff --git a/tests/generic/828 b/tests/generic/828
index 1757985..f5b7298 100755
--- a/tests/generic/828
+++ b/tests/generic/828
@@ -47,7 +47,7 @@ _supported_fs generic
 _supported_os Linux
 _require_scratch
-_require_xfs_io_command "reflink"
+_require_xfs_io_command "dedupe"
 echo "Format and mount"
 _scratch_mkfs > $seqres.full 2>&1
Re: [PATCH v9 0/4] VFS: In-kernel copy system call
On 11/10/2015 10:38 PM, Al Viro wrote:
> On Tue, Nov 10, 2015 at 04:53:29PM -0500, Anna Schumaker wrote:
>> Copy system calls came up during Plumbers a while ago, mostly because several
>> filesystems (including NFS and XFS) are currently working on copy acceleration
>> implementations. We haven't heard from Zach Brown in a while, so I volunteered
>> to push his patches upstream so individual filesystems don't need to keep
>> writing their own ioctls.
>
> OK, taken for the next cycle. FWIW, I'm going to toss COMPAT_SYSCALL_DEFINE
> counterpart in as well - right now it's doing only native, AFAICS.
>

Thanks, Al!
Re: How to properly and efficiently balance RAID6 after more drives are added?
Sorry for the late reply to this list regarding this topic ...

On 09/04/2015 01:04 PM, Duncan wrote:
> And of course, only with 4.1 (nominally 3.19 but there were initial
> problems) was raid6 mode fully code-complete and functional -- before
> that, runtime worked, it calculated and wrote the parity stripes as it
> should, but the code to recover from problems wasn't complete, so you
> were effectively running a slow raid0 in terms of recovery ability, but
> one that got "magically" updated to raid6 once the recovery code was
> actually there and working.

Like others who write to this ML, I run into crashes when trying to balance my filesystem. I have moved through different kernel and btrfs-tools versions and am currently running kernel 4.3 with 4.3rc1 of the tools, but still, after about an hour of balancing (and actually moving chunks), the machine crashes horribly without giving any good stack trace or anything in the kernel log that I could report here :(

Any ideas on how I could proceed to get some usable debug info for the devs to look at?

> So I'm guessing you have some 8-strip-stripe chunks at say 20% full or
> some such. There's 19.19 data TiB used of 22.85 TiB allocated, a spread
> of over 3 TiB. A full nominal-size data stripe allocation, given 12
> devices in raid6, will be 10x1GiB data plus 2x1GiB parity, so there's
> about 3.5 TiB / 10 GiB extra stripes worth of chunks, 350 stripes or so,
> that should be freeable, roughly (the fact that you probably have
> 8-strip, 12-strip, and 4-strip stripes, on the same filesystem, will of
> course change that a bit, as will the fact that four devices are much
> smaller than the other eight).

The new devices have been in place for a while (> 2 months) now and are barely used. Why is there not more data being put onto the new disks? Even without a balance, new data should spread evenly across all devices, right? From the IOPS I can see that only the 8 disks which have always been in the box are doing any heavy lifting, and the new disks are mostly idle.

Is there anything I could do to narrow down where a certain file is stored across the devices?

Regards
Christian
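The back-of-the-envelope estimate in the quoted text (roughly 3.5 TiB of allocated-but-unused space divided by 10 GiB of data per full raid6 stripe across 12 devices) can be written down as a short calculation. This is only an illustrative sketch: the function name is made up, and it assumes 1 GiB strips and a uniform stripe width, which the quoted text itself says does not hold exactly on this filesystem.

```shell
#!/bin/bash
# Hypothetical helper: estimate how many full-width raid6 stripes are
# covered by a given amount of allocated-but-unused space.
# Assumptions: each strip is 1 GiB and every stripe spans all devices,
# so one stripe carries (devices - 2) GiB of data plus 2 GiB of parity.

stripes_freeable() {
    local spread_gib="$1" devices="$2"

    if [ "$devices" -le 2 ]; then
        echo "raid6 needs more than 2 devices" >&2
        return 1
    fi

    # Data capacity of one full stripe, in GiB.
    local data_per_stripe_gib=$(( devices - 2 ))

    # Integer division: only whole stripes count.
    echo $(( spread_gib / data_per_stripe_gib ))
}

# 3.5 TiB = 3584 GiB of spread on 12 devices:
stripes_freeable 3584 12    # prints 358
```

The result, 358, is in line with the "350 stripes or so" figure quoted above; mixed stripe widths and unequal device sizes would shift it.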
Re: [PATCH v9 0/4] VFS: In-kernel copy system call
On Tue, Nov 10, 2015 at 04:53:30PM -0500, Anna Schumaker wrote:
> out:
>     fdput(f_in);
> out1:
>     fdput(f_out);

The fdput()s are in the wrong order. fdget(f_in) is first at the beginning, so fdput(f_in) needs to be last at the end.

> /* this could be relaxed once a method supports cross-fs copies */
> if (inode_in->i_sb != inode_out->i_sb)
>     return -EXDEV;

This allows the same superblock but different mounts --- is that intentional? The commit message says otherwise: it says the vfs entry point requires the same superblock and mount.

Was there a decision made on FMODE_PREAD and FMODE_PWRITE? To me it seems logical that if the user explicitly specifies an offset, then the corresponding mode should be checked. That would check whether the file is seekable or not, I believe. Note that e.g. sys_sendfile() does the same thing.
Potential to lose data in case of disk failure
Hi all,

What am I missing or misunderstanding?

I have a newly purchased laptop that I want/need to multi-boot different OSs on. As a result, after partitioning I have ended up with two partitions on each of the two internal drives (sda3, sda8, sdb3 and sdb8). FWIW, sda3 and sdb3 are the same size, and sda8 and sdb8 are the same size.

As an end result I want one btrfs raid1 filesystem. For lack of better terms: sda3 and sda8 "concatenated" together, sdb3 and sdb8 "concatenated" together, and then "sda" mirrored to "sdb", using only btrfs. So far I have found no use case that covers this.

If I create a raid1 btrfs volume using all 4 "devices", then as I understand it I would lose data if I were to lose a drive, because two mirror possibilities would be:

sda3 mirrored to sda8
sdb3 mirrored to sdb8

Is what I want to do possible without using MD-RAID and/or LVM? If so, would someone point me to the documentation I missed?

For whatever reason, I don't want to believe that this can't be done. I want to believe that the code in btrfs is smart enough to know that sda3 and sda8 are on the same drive and would not try to mirror data between them except in a test setup. I hope I just missed some documentation somewhere.

Thanks in advance for your help. And last but not least, thanks to all for your work on btrfs.

Jim
[RFCv3.1 00/11] xfstests: test the nfs/cifs/btrfs/xfs reflink/dedupe ioctls
Hi all,

This is part of the third revision of an RFC for adding support to XFS for tracking reverse-mappings of physical blocks to file and metadata; and support for mapping multiple file logical blocks to the same physical block, more commonly known as reflinking.

This patchset aims to make xfstests perform more rigorous testing of the NFS/CIFS/ocfs2/btrfs/XFS file clone, reflink, and dedupe ioctls. There are now tests of the basic functionality of the three ioctls; tests to ensure that the filesystem exhibits the expected copy on write semantics; tests to try to suss out race conditions in the new write paths; tests to ensure that the ioctls perform basic disk accounting correctly; tests of the interaction between reflink and the various fallocate verbs (allocate, punch, collapse, insert zeroes); and some attempts to test the upper limits of reflinking and ENOSPC behavior.

Since the last posting, each test tries to reflink (or dedupe) on the test or scratch FS to decide if they're going to run, instead of guessing based on FS type. Per Dave's suggestion, I also converted the basic functionality tests to use fixed sizes so that I can use md5sum in the golden output to check that the file contents match exactly.

Issues:

* I think the race checks for dedupe could be a little sharper at finding mistakes.
* The realtime reflink test (xfs/804) crashes XFS before we even get to the reflink attempt.
* I started the numbering really high to prevent the tests from colliding with whatever new tests might arrive; this will require some intervention to fix.

If you're going to start using this mess, you probably ought to just pull from my github trees for kernel[1], xfsprogs[2], xfstests[3], and xfs-docs[4]. They should just work with the btrfs that's in 4.3... and somewhat buggily with the 4.3 XFS patched with [1].

Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/for-dave
[2] https://github.com/djwong/xfsprogs/tree/for-dave
[3] https://github.com/djwong/xfstests/tree/for-dave
[4] https://github.com/djwong/xfs-documentation/tree/for-dave
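The copy-on-write expectation and the md5sum-based content check described in the cover letter can be illustrated outside of xfstests with a small standalone sketch. Everything here is an assumption made for illustration: the function name, the file names, and the sizes are invented, and it uses plain GNU coreutils (cp --reflink=auto, which falls back to an ordinary copy where reflink is unsupported) rather than the xfstests helpers or the ioctls the patchset actually exercises.

```shell
#!/bin/bash
# Hypothetical standalone sketch, not part of the patchset: clone a
# file, overwrite part of the clone, and use md5sum to confirm that the
# original is unchanged. Fixed sizes keep the checksums stable.

check_clone_cow() {
    local dir="$1"
    local orig="$dir/original" clone="$dir/clone"
    local sum_before sum_after sum_clone

    # 64 KiB of the byte 0x61 ('a').
    head -c 65536 /dev/zero | tr '\0' 'a' > "$orig" || return 1
    sum_before="$(md5sum < "$orig")"

    # Share blocks where the filesystem supports it; otherwise this is
    # a plain copy and the check below still holds.
    cp --reflink=auto "$orig" "$clone" || return 1

    # Overwrite the first 4 KiB of the clone in place with 'b'.
    head -c 4096 /dev/zero | tr '\0' 'b' |
        dd of="$clone" conv=notrunc status=none || return 1

    sum_after="$(md5sum < "$orig")"
    sum_clone="$(md5sum < "$clone")"

    if [ "$sum_before" = "$sum_after" ] && [ "$sum_clone" != "$sum_before" ]; then
        echo "original unchanged, clone diverged"
    else
        echo "unexpected contents"
        return 1
    fi
}

# Example invocation on the filesystem of interest:
#   check_clone_cow "$(mktemp -d -p .)"
```

Because of the fallback to a regular copy, a passing run only shows that writes to the clone did not leak into the original; it does not prove that blocks were ever shared.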
[PATCH 08/11] reflink: test error conditions due to bad inputs
Check that we can feed bad inputs to reflink and it'll reject them.

Signed-off-by: Darrick J. Wong
---
 tests/generic/839     | 106 +
 tests/generic/839.out |  19 +
 tests/generic/846     | 106 +
 tests/generic/846.out |  19 +
 tests/generic/group   |   2 +
 5 files changed, 252 insertions(+)
 create mode 100755 tests/generic/839
 create mode 100644 tests/generic/839.out
 create mode 100755 tests/generic/846
 create mode 100644 tests/generic/846.out

diff --git a/tests/generic/839 b/tests/generic/839
new file mode 100755
index 000..0b754b1
--- /dev/null
+++ b/tests/generic/839
@@ -0,0 +1,106 @@
+#! /bin/bash
+# FS QA Test No. 839
+#
+# Check that various invalid reflink scenarios are rejected
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR1"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+_require_scratch_reflink
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+TESTDIR1="$TEST_DIR/test-$seq"
+rm -rf "$TESTDIR1"
+mkdir "$TESTDIR1"
+
+TESTDIR2=$SCRATCH_MNT/test-$seq
+rm -rf "$TESTDIR2"
+mkdir "$TESTDIR2"
+
+echo "Create the original files"
+BLKSZ="$(stat -f $TESTDIR1 -c '%S')"
+BLKS=1000
+MARGIN=50
+SZ=$((BLKSZ * BLKS))
+FREE_BLOCKS0=$(stat -f $TESTDIR1 -c '%f')
+NR=4
+_pwrite_byte 0x61 0 $SZ "$TESTDIR1/file1" >> "$seqres.full"
+_pwrite_byte 0x61 0 $SZ "$TESTDIR1/file2" >> "$seqres.full"
+_pwrite_byte 0x61 0 $SZ "$TESTDIR2/file1" >> "$seqres.full"
+_pwrite_byte 0x61 0 $SZ "$TESTDIR2/file2" >> "$seqres.full"
+sync
+
+echo "Try cross-device reflink"
+_reflink_range "$TESTDIR1/file1" 0 "$TESTDIR2/file1" 0 $BLKSZ
+
+echo "Try unaligned reflink"
+_reflink_range "$TESTDIR1/file1" 37 "$TESTDIR1/file1" 59 23
+
+echo "Try overlapping reflink"
+_reflink_range "$TESTDIR1/file1" 0 "$TESTDIR1/file1" 1 $((BLKSZ * 2))
+
+echo "Try reflink past EOF"
+_reflink_range "$TESTDIR1/file1" $(( (BLKS + 10) * BLKSZ)) "$TESTDIR1/file1" 0 $BLKSZ
+
+chattr +i $TESTDIR1/file1 $TESTDIR1/file2
+echo "Try reflink on immutable files"
+_reflink_range "$TESTDIR1/file1" 0 "$TESTDIR1/file2" 0 $BLKSZ 2>&1 | _filter_test_dir
+chattr -i $TESTDIR1/file1 $TESTDIR1/file2
+
+echo "Reflink two files"
+_reflink_range "$TESTDIR1/file1" 0 "$TESTDIR1/file2" 0 $BLKSZ >> "$seqres.full"
+_reflink_range "$TESTDIR2/file1" 0 "$TESTDIR2/file2" 0 $BLKSZ >> "$seqres.full"
+
+lsattr -l $TESTDIR1/ | _filter_test_dir
+lsattr -l $TESTDIR2/ | _filter_scratch
+
+echo "Check scratch fs"
+umount $SCRATCH_MNT
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/839.out b/tests/generic/839.out
new file mode 100644
index 000..d8703b1
--- /dev/null
+++ b/tests/generic/839.out
@@ -0,0 +1,19 @@
+QA output created by 839
+Format and mount
+Create the original files
+Try cross-device reflink
+XFS_IOC_CLONE_RANGE: Invalid cross-device link
+Try unaligned reflink
+XFS_IOC_CLONE_RANGE: Invalid argument
+Try overlapping reflink
+XFS_IOC_CLONE_RANGE: Invalid argument
+Try reflink past EOF
+XFS_IOC_CLONE_RANGE: Invalid argument
+Try reflink on immutable files
+TEST_DIR/test-839/file2: Permission denied
+Reflink two files
+TEST_DIR/test-839/file1 ---
+TEST_DIR/test-839/file2 ---
+SCRATCH_MNT/test-839/file1 ---
+SCRATCH_MNT/test-839/file2 ---
+Check scratch fs
diff --git a/tests/generic/846 b/tests/generic/846
new file mode 100755
index 000..e425fbd
--- /dev/null
+++ b/tests/generic/846
@@ -0,0 +1,106 @@
+#! /bin/bash
+# FS QA Test No. 846
+#
+# Check that various invalid dedupe scenarios are rejected
+#
+#---
+# Copyright (c) 2015, Oracle and/or
[PATCH 10/11] reflink: test what happens when we hit resource limits
Add a few horrible opt-in stress tests to see what happens if we try to reflink the same block billions of times, and what happens if we run out of space while reflinking a file.

Signed-off-by: Darrick J. Wong
---
 tests/generic/840     | 99 +
 tests/generic/840.out |  0
 tests/generic/841     | 81
 tests/generic/841.out |  5 ++
 tests/generic/group   |  2 +
 5 files changed, 187 insertions(+)
 create mode 100755 tests/generic/840
 create mode 100644 tests/generic/840.out
 create mode 100755 tests/generic/841
 create mode 100644 tests/generic/841.out

diff --git a/tests/generic/840 b/tests/generic/840
new file mode 100755
index 000..137cc3d
--- /dev/null
+++ b/tests/generic/840
@@ -0,0 +1,99 @@
+#! /bin/bash
+# FS QA Test No. 840
+#
+# Try to hit the maximum reference count (eek!)
+#
+# This test runs extremely slowly, so it's not automatically run anywhere.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR1"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+TESTDIR="$SCRATCH_MNT/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+# Well let's hope the maximum reflink count is (less than (ha!)) 2^32...
+
+echo "Create a one block file"
+BLKSZ="$(stat -f "$TESTDIR" -c '%S')"
+_pwrite_byte 0x61 0 $BLKSZ "$TESTDIR/file1" >> "$seqres.full"
+_pwrite_byte 0x62 0 $BLKSZ "$TESTDIR/file2" >> "$seqres.full"
+_cp_reflink "$TESTDIR/file1" "$TESTDIR/file2" >> "$seqres.full"
+
+nr=32
+fnr=32
+for i in $(seq 0 $fnr); do
+    echo " ++ Reflink size $i, $(( (2 ** i) * BLKSZ)) bytes" | tee -a "$seqres.full"
+    n=$(( (2 ** i) * BLKSZ))
+    _reflink_range "$TESTDIR/file1" 0 "$TESTDIR/file1" $n $n >> "$seqres.full" || break
+done
+
+nrf=$((nr - fnr))
+echo "Clone $((2 ** nrf)) files"
+seq 0 $((2 ** nrf)) | while read i; do
+    _cp_reflink "$TESTDIR/file1" "$TESTDIR/file1-$i"
+done
+
+echo "Check scratch fs"
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+echo "Remove big file and recheck"
+_scratch_mount >> "$seqres.full" 2>&1
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+echo "Remove all files and recheck"
+_scratch_mount >> "$seqres.full" 2>&1
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/840.out b/tests/generic/840.out
new file mode 100644
index 000..e69de29
diff --git a/tests/generic/841 b/tests/generic/841
new file mode 100755
index 000..b89dba4
--- /dev/null
+++ b/tests/generic/841
@@ -0,0 +1,81 @@
+#! /bin/bash
+# FS QA Test No. 841
+#
+# Try to run out of space while cloning?
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup;
[PATCH 09/11] xfs: test xfs-specific reflink pieces
Check that growfs and xfs_fsr still work properly on reflinked fses.

Signed-off-by: Darrick J. Wong
---
 tests/xfs/800     |  79
 tests/xfs/800.out |   6 ++
 tests/xfs/801     | 148 +
 tests/xfs/801.out |  27 ++
 tests/xfs/802     |  87 +++
 tests/xfs/802.out |   6 ++
 tests/xfs/803     | 109 +++
 tests/xfs/803.out |  13 +
 tests/xfs/804     |  77
 tests/xfs/804.out |   0
 tests/xfs/group   |   5 ++
 11 files changed, 557 insertions(+)
 create mode 100755 tests/xfs/800
 create mode 100644 tests/xfs/800.out
 create mode 100755 tests/xfs/801
 create mode 100644 tests/xfs/801.out
 create mode 100755 tests/xfs/802
 create mode 100644 tests/xfs/802.out
 create mode 100755 tests/xfs/803
 create mode 100644 tests/xfs/803.out
 create mode 100755 tests/xfs/804
 create mode 100644 tests/xfs/804.out

diff --git a/tests/xfs/800 b/tests/xfs/800
new file mode 100755
index 000..d9fe5b4
--- /dev/null
+++ b/tests/xfs/800
@@ -0,0 +1,79 @@
+#! /bin/bash
+# FS QA Test No. 800
+#
+# Tests xfs_growfs on a reflinked filesystem
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs xfs
+_require_scratch_reflink
+_require_cp_reflink
+
+echo "Format and mount"
+_scratch_mkfs -d size=$((2 * 4096 * 4096)) -l size=4194304 > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+TESTDIR="$SCRATCH_MNT/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create the original file and reflink to copy1, copy2"
+BLKSZ="$(stat -f "$TESTDIR" -c '%S')"
+_pwrite_byte 0x61 0 $((BLKSZ * 14 + 71)) "$TESTDIR/original" >> "$seqres.full"
+_cp_reflink "$TESTDIR/original" "$TESTDIR/copy1"
+_cp_reflink "$TESTDIR/copy1" "$TESTDIR/copy2"
+
+echo "Grow fs"
+"$XFS_GROWFS_PROG" "$SCRATCH_MNT" 2>&1 | _filter_growfs >> "$seqres.full"
+_scratch_remount
+
+echo "Create more reflink copies"
+_cp_reflink "$TESTDIR/original" "$TESTDIR/copy3"
+
+xfs_info "$SCRATCH_MNT" >> "$seqres.full"
+
+echo "Check scratch fs"
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/800.out b/tests/xfs/800.out
new file mode 100644
index 000..73d4f71
--- /dev/null
+++ b/tests/xfs/800.out
@@ -0,0 +1,6 @@
+QA output created by 800
+Format and mount
+Create the original file and reflink to copy1, copy2
+Grow fs
+Create more reflink copies
+Check scratch fs
diff --git a/tests/xfs/801 b/tests/xfs/801
new file mode 100755
index 000..172fb29
--- /dev/null
+++ b/tests/xfs/801
@@ -0,0 +1,148 @@
+#! /bin/bash
+# FS QA Test No. 801
+#
+# Ensure that xfs_fsr un-reflinks files while defragmenting
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$st
[PATCH 11/11] reflink: test that CoW writes fail when we're out of space
Ensure that copy-on-writing a reflinked file when there's no free disk
space reflects the desired ENOSPC back to userspace during the write
call. Tests the buffered IO, direct IO, and mmap write paths.

Signed-off-by: Darrick J. Wong
---
 common/rc             |    2 -
 tests/generic/842     |  107
 tests/generic/842.out |   10
 tests/generic/843     |  107
 tests/generic/843.out |   10
 tests/generic/844     |  109 +
 tests/generic/844.out |    8
 tests/generic/845     |  107
 tests/generic/845.out |   10
 tests/generic/group   |    4 ++
 10 files changed, 473 insertions(+), 1 deletion(-)
 create mode 100755 tests/generic/842
 create mode 100644 tests/generic/842.out
 create mode 100755 tests/generic/843
 create mode 100644 tests/generic/843.out
 create mode 100755 tests/generic/844
 create mode 100644 tests/generic/844.out
 create mode 100755 tests/generic/845
 create mode 100644 tests/generic/845.out

diff --git a/common/rc b/common/rc
index c016673..474fa84 100644
--- a/common/rc
+++ b/common/rc
@@ -101,7 +101,7 @@ _mwrite_byte() {
 	mmap_len="$4"
 	file="$5"
 
-	"$XFS_IO_PROG" -f -c "mmap -rw 0 $mmap_len" -c "pwrite -S $pattern $offset $len" "$file"
+	"$XFS_IO_PROG" -f -c "mmap -rw 0 $mmap_len" -c "mwrite -S $pattern $offset $len" "$file"
 }
 
 # ls -l w/ selinux sometimes puts a dot at the end:
diff --git a/tests/generic/842 b/tests/generic/842
new file mode 100755
index 000..5eda56b
--- /dev/null
+++ b/tests/generic/842
@@ -0,0 +1,107 @@
+#! /bin/bash
+# FS QA Test No. 842
+#
+# Reflink a file, use up the rest of the space, then try to observe ENOSPC
+# while copy-on-writing the file via the page cache.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR1"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+TESTDIR="$SCRATCH_MNT/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Reformat with appropriate size"
+BLKSZ="$(stat -f "$TESTDIR" -c '%S')"
+NR_BLKS=10240
+umount "$SCRATCH_MNT"
+SZ_BYTES=$((NR_BLKS * 8 * BLKSZ))
+if [ $SZ_BYTES -lt $((32 * 1048576)) ]; then
+	SZ_BYTES=$((32 * 1048576))
+fi
+_scratch_mkfs_sized $SZ_BYTES >> "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create a big file and reflink it"
+_pwrite_byte 0x61 0 $((BLKSZ * NR_BLKS)) "$TESTDIR/bigfile" >> "$seqres.full" 2>&1
+_cp_reflink "$TESTDIR/bigfile" "$TESTDIR/clonefile"
+sync
+
+echo "Allocate the rest of the space"
+NR_FREE="$(stat -f -c '%f' "$TESTDIR")"
+touch "$TESTDIR/file0" "$TESTDIR/file1"
+_pwrite_byte 0x61 0 $((BLKSZ * NR_FREE)) "$TESTDIR/eat_my_space" >> "$seqres.full" 2>&1
+sync
+
+echo "CoW the big file"
+out="$(_pwrite_byte 0x62 0 $((BLKSZ * NR_BLKS)) "$TESTDIR/bigfile" 2>&1)"
+echo "${out}" | grep -q "No space left on device" || echo "CoW should have failed with ENOSPC"
+echo "${out}" >> "$seqres.full" 2>&1
+echo "${out}"
+
+echo "Remount and try CoW again"
+_scratch_remount
+
+out="$(_pwrite_byte 0x62 0 $((BLKSZ * NR_BLKS)) "$TESTDIR/bigfile" 2>&1)"
+echo "${out}" | grep -q "No space left on device" || echo "CoW should have failed with ENOSPC"
+echo "${out}" >> "$seqres.full" 2>&1
+echo "${out}"
+
+#filefrag -v $TESTDIR/bigfile
+#filefrag -v $TESTDIR/clonefile
+
+echo "Check scratch fs"
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/842.out b/tests/generic/842.out
new file mode 100644
index
[PATCH 01/11] btrfs: move btrfs reflink tests to generic
Move the cp --reflink tests from btrfs/ to generic/ since xfs now
supports that ioctl.

Signed-off-by: Darrick J. Wong
---
 tests/btrfs/026       |   92 -
 tests/btrfs/026.out   |   16 ---
 tests/btrfs/027       |  109 -
 tests/btrfs/027.out   |   25 ---
 tests/btrfs/028       |   83 -
 tests/btrfs/028.out   |    7 ---
 tests/btrfs/group     |    3 -
 tests/generic/800     |   92 +
 tests/generic/800.out |   16 +++
 tests/generic/801     |  109 +
 tests/generic/801.out |   25 +++
 tests/generic/802     |   83 +
 tests/generic/802.out |    7 +++
 tests/generic/group   |    3 +
 14 files changed, 335 insertions(+), 335 deletions(-)
 delete mode 100755 tests/btrfs/026
 delete mode 100644 tests/btrfs/026.out
 delete mode 100755 tests/btrfs/027
 delete mode 100644 tests/btrfs/027.out
 delete mode 100755 tests/btrfs/028
 delete mode 100644 tests/btrfs/028.out
 create mode 100755 tests/generic/800
 create mode 100644 tests/generic/800.out
 create mode 100755 tests/generic/801
 create mode 100644 tests/generic/801.out
 create mode 100755 tests/generic/802
 create mode 100644 tests/generic/802.out

diff --git a/tests/btrfs/026 b/tests/btrfs/026
deleted file mode 100755
index 7559ca2..000
--- a/tests/btrfs/026
+++ /dev/null
@@ -1,92 +0,0 @@
-#! /bin/bash
-# FS QA Test No. 026
-#
-# Tests file clone functionality of btrfs ("reflinks"):
-# - Reflink a file
-# - Reflink the reflinked file
-# - Modify the original file
-# - Modify the reflinked file
-#
-#---
-# Copyright (c) 2014, Oracle and/or its affiliates. All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write the Free Software Foundation,
-# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-#---
-#
-
-seq=`basename $0`
-seqres=$RESULT_DIR/$seq
-echo "QA output created by $seq"
-
-here=`pwd`
-tmp=/tmp/$$
-status=1	# failure is the default!
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-_cleanup()
-{
-    cd /
-    rm -f $tmp.*
-}
-
-# get standard environment, filters and checks
-. common/rc
-. common/filter
-
-# real QA test starts here
-_supported_fs btrfs
-_supported_os Linux
-
-_require_xfs_io_command "fiemap"
-_require_cp_reflink
-_require_test
-
-TESTDIR1=$TEST_DIR/test-$seq
-rm -rf $TESTDIR1
-mkdir $TESTDIR1
-
-_checksum_files() {
-    for F in original copy1 copy2
-    do
-        md5sum $TESTDIR1/$F | _filter_test_dir
-    done
-}
-
-rm -f $seqres.full
-
-echo "Create the original file and reflink to copy1, copy2"
-$XFS_IO_PROG -f -c 'pwrite -S 0x61 0 9000' $TESTDIR1/original \
-    >> $seqres.full 2>&1
-cp --reflink $TESTDIR1/original $TESTDIR1/copy1
-cp --reflink $TESTDIR1/copy1 $TESTDIR1/copy2
-_verify_reflink $TESTDIR1/original $TESTDIR1/copy1
-_verify_reflink $TESTDIR1/original $TESTDIR1/copy2
-echo "Original md5sums:"
-_checksum_files
-
-echo "Overwrite original file with new data"
-$XFS_IO_PROG -c 'pwrite -S 0x62 0 9000' $TESTDIR1/original \
-    >> $seqres.full 2>&1
-echo "md5sums after overwriting original:"
-_checksum_files
-
-echo "Overwrite copy1 with different new data"
-$XFS_IO_PROG -c 'pwrite -S 0x63 0 9000' $TESTDIR1/copy1 \
-    >> $seqres.full 2>&1
-echo "md5sums after overwriting copy1:"
-_checksum_files
-
-# success, all done
-status=0
-exit
diff --git a/tests/btrfs/026.out b/tests/btrfs/026.out
deleted file mode 100644
index 3b90ff0..000
--- a/tests/btrfs/026.out
+++ /dev/null
@@ -1,16 +0,0 @@
-QA output created by 026
-Create the original file and reflink to copy1, copy2
-Original md5sums:
-42d69d1a6d333a7ebdf64792a555e392 TEST_DIR/test-026/original
-42d69d1a6d333a7ebdf64792a555e392 TEST_DIR/test-026/copy1
-42d69d1a6d333a7ebdf64792a555e392 TEST_DIR/test-026/copy2
-Overwrite original file with new data
-md5sums after overwriting original:
-4a847a25439532bf48b68c9e9536ed5b TEST_DIR/test-026/original
-42d69d1a6d333a7ebdf64792a555e392 TEST_DIR/test-026/copy1
-42d69d1a6d333a7ebdf64792a555e392 TEST_DIR/test-026/copy2
-Overwrite copy1 with different new data
-md5sums after overwriting copy1:
-4a847a25439532bf48b68c9e9536ed5b TEST_DIR/test-026/original
-e271cd47d9f62ebc96cb4e67ae4d16db TEST_DIR/test-
[PATCH 04/11] reflink: test CoW behaviors of reflinked files
Ensure that CoW happens correctly with buffered, directio, and mmap
writes.

Signed-off-by: Darrick J. Wong
---
 tests/generic/808     |  152 +
 tests/generic/808.out |   19 ++
 tests/generic/809     |  151 +
 tests/generic/809.out |   19 ++
 tests/generic/810     |  152 +
 tests/generic/810.out |   19 ++
 tests/generic/837     |   91 +
 tests/generic/837.out |    8 +++
 tests/generic/838     |   91 +
 tests/generic/838.out |    8 +++
 tests/generic/group   |    5 ++
 11 files changed, 715 insertions(+)
 create mode 100755 tests/generic/808
 create mode 100644 tests/generic/808.out
 create mode 100755 tests/generic/809
 create mode 100644 tests/generic/809.out
 create mode 100755 tests/generic/810
 create mode 100644 tests/generic/810.out
 create mode 100755 tests/generic/837
 create mode 100644 tests/generic/837.out
 create mode 100755 tests/generic/838
 create mode 100644 tests/generic/838.out

diff --git a/tests/generic/808 b/tests/generic/808
new file mode 100755
index 000..3a3ec58
--- /dev/null
+++ b/tests/generic/808
@@ -0,0 +1,152 @@
+#! /bin/bash
+# FS QA Test No. 808
+#
+# Ensuring that copy on write through the page cache works:
+# - Reflink two files together
+# - Write to the beginning, middle, and end
+# - Check that the files are now different where we say they're different.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+TESTDIR="$TEST_DIR/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create the original files"
+BLKSZ=65536
+_pwrite_byte 0x61 0 $((BLKSZ * 48 - 3)) "$TESTDIR/file1" >> "$seqres.full"
+_cp_reflink "$TESTDIR/file1" "$TESTDIR/file2" >> "$seqres.full"
+_pwrite_byte 0x61 0 $((BLKSZ * 48 - 3)) "$TESTDIR/file3" >> "$seqres.full"
+_test_remount
+
+echo "Compare files"
+md5sum "$TESTDIR/file1" | _filter_test_dir
+md5sum "$TESTDIR/file2" | _filter_test_dir
+md5sum "$TESTDIR/file3" | _filter_test_dir
+
+cmp -s "$TESTDIR/file1" "$TESTDIR/file2" || echo "Files 1-2 do not match"
+cmp -s "$TESTDIR/file1" "$TESTDIR/file3" || echo "Files 1-3 do not match"
+cmp -s "$TESTDIR/file2" "$TESTDIR/file3" || echo "Files 2-3 do not match"
+
+echo "pagecache CoW the second file"
+_pwrite_byte 0x62 0 17 "$TESTDIR/file2" >> "$seqres.full"
+_pwrite_byte 0x62 0 17 "$TESTDIR/file3" >> "$seqres.full"
+
+_pwrite_byte 0x62 $((BLKSZ * 16 - 34)) 17 "$TESTDIR/file2" >> "$seqres.full"
+_pwrite_byte 0x62 $((BLKSZ * 16 - 34)) 17 "$TESTDIR/file3" >> "$seqres.full"
+
+_pwrite_byte 0x62 $((BLKSZ * 48 - 8)) 17 "$TESTDIR/file2" >> "$seqres.full"
+_pwrite_byte 0x62 $((BLKSZ * 48 - 8)) 17 "$TESTDIR/file3" >> "$seqres.full"
+_test_remount
+
+echo "Compare files"
+md5sum "$TESTDIR/file1" | _filter_test_dir
+md5sum "$TESTDIR/file2" | _filter_test_dir
+md5sum "$TESTDIR/file3" | _filter_test_dir
+
+cmp -s "$TESTDIR/file1" "$TESTDIR/file2" || echo "Files 1-2 do not match (intentional)"
+cmp -s "$TESTDIR/file1" "$TESTDIR/file3" || echo "Files 1-3 do not match (intentional)"
+cmp -s "$TESTDIR/file2" "$TESTDIR/file3" || echo "Files 2-3 do not match"
+
+echo "Compare the CoW'd section to the before file"
+_compare_range "$TESTDIR/file1" 0 "$TESTDIR/file2" 0 17 \
+	|| echo "Start sections do not match (intentional)"
+
+_compare_range "$TESTDIR/file1" $((BLKSZ * 16 - 34)) \
+	"$TESTDIR/file2" $((BLKSZ * 16 - 34)) 17 \
+	|| echo "Middle sections do not match (intentional)"
+
+_compare_range "$TESTDIR/file1" $((BLKSZ * 48 - 8)) \
+	"$TESTDIR/file2" $((BLKSZ * 48 - 8)) 17 \
+	|| echo "
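
The range checks in generic/808 above rely on `_compare_range` from common/reflink, whose body is not shown in this posting. As a rough sketch of the idea only, the following stand-in (a hypothetical `compare_range`, assuming GNU `cmp` with its `-i SKIP1:SKIP2` and `-n LIMIT` options) compares a byte range of two files and reproduces the "intentional mismatch" reporting on two tiny temporary files:

```shell
#!/bin/bash
# Hypothetical stand-in for a range comparison; the real _compare_range
# lives in common/reflink and may be implemented differently.
# Usage: compare_range FILE1 OFF1 FILE2 OFF2 LEN   (assumes GNU cmp)
compare_range() {
    cmp -s -i "$2:$4" -n "$5" "$1" "$3"
}

dir="$(mktemp -d)"

# Two identical 64-byte files filled with 'a'.
head -c 64 /dev/zero | tr '\0' 'a' > "$dir/file1"
cp "$dir/file1" "$dir/file2"

# Overwrite the first 17 bytes of the copy with 'b', leaving the rest intact.
head -c 17 /dev/zero | tr '\0' 'b' \
    | dd of="$dir/file2" conv=notrunc status=none

# The rewritten range should differ; the untouched tail should not.
compare_range "$dir/file1" 0 "$dir/file2" 0 17 \
    || echo "Start sections do not match (intentional)"
compare_range "$dir/file1" 17 "$dir/file2" 17 47 \
    || echo "Tail sections do not match"

rm -rf "$dir"
```

Only the first message should appear, mirroring how the test treats a mismatch in the rewritten range as expected and a mismatch anywhere else as a failure.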
[PATCH 03/11] reflink: basic tests of the reflink and dedupe ioctls
Test the operation of the btrfs (and now xfs) reflink and dedupe ioctls
at various file offsets and with matching and nonmatching files.

Signed-off-by: Darrick J. Wong
---
 tests/generic/803     |   92 +++
 tests/generic/803.out |    8 ++
 tests/generic/804     |   93 +++
 tests/generic/804.out |   11 +++
 tests/generic/805     |  170 +
 tests/generic/805.out |   30 +
 tests/generic/806     |   92 +++
 tests/generic/806.out |    8 ++
 tests/generic/807     |   92 +++
 tests/generic/807.out |   12 +++
 tests/generic/817     |  128 +
 tests/generic/817.out |   16 +
 tests/generic/818     |  128 +
 tests/generic/818.out |   17 +
 tests/generic/819     |  131 ++
 tests/generic/819.out |    8 ++
 tests/generic/group   |    8 ++
 17 files changed, 1044 insertions(+)
 create mode 100755 tests/generic/803
 create mode 100644 tests/generic/803.out
 create mode 100755 tests/generic/804
 create mode 100644 tests/generic/804.out
 create mode 100755 tests/generic/805
 create mode 100644 tests/generic/805.out
 create mode 100755 tests/generic/806
 create mode 100644 tests/generic/806.out
 create mode 100755 tests/generic/807
 create mode 100644 tests/generic/807.out
 create mode 100755 tests/generic/817
 create mode 100644 tests/generic/817.out
 create mode 100755 tests/generic/818
 create mode 100644 tests/generic/818.out
 create mode 100755 tests/generic/819
 create mode 100644 tests/generic/819.out

diff --git a/tests/generic/803 b/tests/generic/803
new file mode 100755
index 000..14c9e98
--- /dev/null
+++ b/tests/generic/803
@@ -0,0 +1,92 @@
+#! /bin/bash
+# FS QA Test No. 803
+#
+# Ensure that we can reflink parts of two identical files:
+# - Reflink identical parts of two identical files
+# - Check that we still have identical contents
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+
+rm -f "$seqres.full"
+
+TESTDIR="$TEST_DIR/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create the original files"
+BLKSZ=65536
+_pwrite_byte 0x61 $((BLKSZ * 2)) $((BLKSZ * 6)) "$TESTDIR/file1" >> "$seqres.full"
+_pwrite_byte 0x61 $((BLKSZ * 2)) $((BLKSZ * 6)) "$TESTDIR/file2" >> "$seqres.full"
+_test_remount
+
+md5sum "$TESTDIR/file1" | _filter_test_dir
+md5sum "$TESTDIR/file2" | _filter_test_dir
+
+_compare_range "$TESTDIR/file1" 0 "$TESTDIR/file2" 0 $((BLKSZ * 8)) \
+	|| echo "Files do not match"
+
+echo "Reflink the middle blocks together"
+free_before="$(stat -f -c '%a' "$TESTDIR")"
+_reflink_range "$TESTDIR/file1" $((BLKSZ * 4)) "$TESTDIR/file2" \
+	$((BLKSZ * 4)) $((BLKSZ * 2)) >> "$seqres.full"
+_test_remount
+free_after="$(stat -f -c '%a' "$TESTDIR")"
+echo "freesp changed by $free_before -> $free_after" >> "$seqres.full"
+
+echo "Compare sections"
+md5sum "$TESTDIR/file1" | _filter_test_dir
+md5sum "$TESTDIR/file2" | _filter_test_dir
+
+_compare_range "$TESTDIR/file1" 0 "$TESTDIR/file2" 0 $((BLKSZ * 4)) \
+	|| echo "Start sections do not match"
+
+_compare_range "$TESTDIR/file1" $((BLKSZ * 4)) "$TESTDIR/file2" \
+	$((BLKSZ * 4)) $((BLKSZ * 2)) \
+	|| echo "Middle sections do not match"
+
+_compare_range "$TESTDIR/file1" $((BLKSZ * 6)) "$TESTDIR/file2" \
+	$((BLKSZ * 6)) $((BLKSZ * 2)) \
+	|| echo "End sections do not match"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/803.out b/tests/generic/803.out
new file mode 100644
index 000..09099aa
--- /dev/null
+++ b/tests/generic/803.out
@@ -0,0 +1,8 @@
+QA output created by 8
[PATCH 06/11] reflink: concurrent operations tests
Make sure that running reflink ops while other IO is ongoing doesn't
break the filesystem.

Signed-off-by: Darrick J. Wong
---
 tests/generic/821     |   97 +
 tests/generic/821.out |    6 +++
 tests/generic/822     |   97 +
 tests/generic/822.out |    6 +++
 tests/generic/823     |   93 +++
 tests/generic/823.out |    6 +++
 tests/generic/824     |   93 +++
 tests/generic/824.out |    6 +++
 tests/generic/825     |  105 +
 tests/generic/825.out |    7 +++
 tests/generic/826     |  105 +
 tests/generic/826.out |    7 +++
 tests/generic/827     |   95
 tests/generic/827.out |    7 +++
 tests/generic/828     |   95
 tests/generic/828.out |    7 +++
 tests/generic/829     |   79 +
 tests/generic/829.out |    6 +++
 tests/generic/group   |    9
 19 files changed, 926 insertions(+)
 create mode 100755 tests/generic/821
 create mode 100644 tests/generic/821.out
 create mode 100755 tests/generic/822
 create mode 100644 tests/generic/822.out
 create mode 100755 tests/generic/823
 create mode 100644 tests/generic/823.out
 create mode 100755 tests/generic/824
 create mode 100644 tests/generic/824.out
 create mode 100755 tests/generic/825
 create mode 100644 tests/generic/825.out
 create mode 100755 tests/generic/826
 create mode 100644 tests/generic/826.out
 create mode 100755 tests/generic/827
 create mode 100644 tests/generic/827.out
 create mode 100755 tests/generic/828
 create mode 100644 tests/generic/828.out
 create mode 100755 tests/generic/829
 create mode 100644 tests/generic/829.out

diff --git a/tests/generic/821 b/tests/generic/821
new file mode 100755
index 000..d38eff7
--- /dev/null
+++ b/tests/generic/821
@@ -0,0 +1,97 @@
+#! /bin/bash
+# FS QA Test No. 821
+#
+# Test for races or FS corruption when DIO writing to a file that's also
+# the target of a reflink operation.
+#
+#---
+# Copyright (c) 2015 Oracle, Inc. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".*
+    wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+TESTDIR="$SCRATCH_MNT/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+loops=1024
+nr_loops=$((loops - 1))
+BLKSZ=65536
+
+echo "Initialize files"
+echo > "$seqres.full"
+_pwrite_byte 0x61 0 $((loops * BLKSZ)) "$TESTDIR/file1" >> "$seqres.full"
+_pwrite_byte 0x62 0 $((loops * BLKSZ)) "$TESTDIR/file2" >> "$seqres.full"
+_scratch_remount
+
+# Direct I/O overwriter...
+overwrite() {
+	while [ ! -e "$TESTDIR/finished" ]; do
+		seq $nr_loops -1 0 | while read i; do
+			_pwrite_byte 0x63 $((i * BLKSZ)) $BLKSZ -d "$TESTDIR/file2" >> "$seqres.full"
+		done
+	done
+}
+
+echo "Reflink and dio write the target"
+overwrite &
+seq 1 10 | while read j; do
+	seq 0 $nr_loops | while read i; do
+		_reflink_range "$TESTDIR/file1" $((i * BLKSZ)) \
+			"$TESTDIR/file2" $((i * BLKSZ)) $BLKSZ >> "$seqres.full"
+		[ $? -ne 0 ] && exit
+	done
+done
+touch "$TESTDIR/finished"
+wait
+
+echo "Check for damage"
+umount "$SCRATCH_MNT"
+_check_scratch_fs
+
+echo "Done"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/821.out b/tests/generic/821.out
new file mode 100644
index 000..ca6bc53
--- /dev/null
+++ b/tests/generic/821.out
@@ -0,0 +1,6 @@
+QA output created by 821
+Format and mount
+Initialize files
+Reflink and dio write the target
+Check for dama
[PATCH 05/11] reflink: test the various fallocate modes
Check that the variants of fallocate (allocate, punch, zero range,
collapse range, insert range) do the right thing when they're run
against a range of reflinked blocks.

Signed-off-by: Darrick J. Wong
---
 tests/generic/811     |  142 +
 tests/generic/811.out |   16 ++
 tests/generic/812     |  142 +
 tests/generic/812.out |   19 +++
 tests/generic/813     |  136 +++
 tests/generic/813.out |   19 +++
 tests/generic/814     |  139
 tests/generic/814.out |   19 +++
 tests/generic/815     |  116
 tests/generic/815.out |   15 +
 tests/generic/816     |  136 +++
 tests/generic/816.out |   19 +++
 tests/generic/group   |    6 ++
 13 files changed, 924 insertions(+)
 create mode 100755 tests/generic/811
 create mode 100644 tests/generic/811.out
 create mode 100755 tests/generic/812
 create mode 100644 tests/generic/812.out
 create mode 100755 tests/generic/813
 create mode 100644 tests/generic/813.out
 create mode 100755 tests/generic/814
 create mode 100644 tests/generic/814.out
 create mode 100755 tests/generic/815
 create mode 100644 tests/generic/815.out
 create mode 100755 tests/generic/816
 create mode 100644 tests/generic/816.out

diff --git a/tests/generic/811 b/tests/generic/811
new file mode 100755
index 000..7b09c05
--- /dev/null
+++ b/tests/generic/811
@@ -0,0 +1,142 @@
+#! /bin/bash
+# FS QA Test No. 811
+#
+# Ensure that fallocate steps around reflinked ranges:
+# - Reflink parts of two files together
+# - Fallocate all the other sparse space.
+# - Check that the reflinked areas are still there.
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+_require_cp_reflink
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "truncate"
+
+rm -f "$seqres.full"
+
+TESTDIR="$TEST_DIR/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create the original files"
+BLKSZ=65536
+_pwrite_byte 0x61 0 $((BLKSZ * 5 + 37)) "$TESTDIR/file1" >> "$seqres.full"
+
+_reflink_range "$TESTDIR/file1" $BLKSZ "$TESTDIR/file2" $BLKSZ \
+	$((BLKSZ * 4 + 37)) >> "$seqres.full"
+
+"$XFS_IO_PROG" -f -c "truncate $((BLKSZ * 5 + 37))" "$TESTDIR/file3" >> "$seqres.full"
+_reflink_range "$TESTDIR/file1" 0 "$TESTDIR/file3" 0 $BLKSZ >> "$seqres.full"
+
+"$XFS_IO_PROG" -f -c "truncate $((BLKSZ * 5 + 37))" "$TESTDIR/file4" >> "$seqres.full"
+_reflink_range "$TESTDIR/file1" $BLKSZ "$TESTDIR/file4" $BLKSZ $BLKSZ >> "$seqres.full"
+_reflink_range "$TESTDIR/file1" $((BLKSZ * 3)) "$TESTDIR/file4" $((BLKSZ * 3)) \
+	$BLKSZ >> "$seqres.full"
+
+_cp_reflink "$TESTDIR/file1" "$TESTDIR/file5"
+_test_remount
+
+echo "Compare sections"
+md5sum "$TESTDIR/file1" | _filter_test_dir
+md5sum "$TESTDIR/file2" | _filter_test_dir
+md5sum "$TESTDIR/file3" | _filter_test_dir
+md5sum "$TESTDIR/file4" | _filter_test_dir
+md5sum "$TESTDIR/file5" | _filter_test_dir
+
+_compare_range "$TESTDIR/file1" $BLKSZ "$TESTDIR/file2" $BLKSZ \
+	$((BLKSZ * 4 + 37)) \
+	|| echo "shared parts of files 1-2 changed"
+
+_compare_range "$TESTDIR/file1" 0 "$TESTDIR/file3" 0 $BLKSZ \
+	|| echo "shared parts of files 1-3 changed"
+
+_compare_range "$TESTDIR/file1" $BLKSZ "$TESTDIR/file4" $BLKSZ $BLKSZ \
+	|| echo "shared parts of files 1-4 changed"
+
+_compare_range "$TESTDIR/file1" 0 "$TESTDIR/file5" 0 $((BLKSZ * 5 + 37)) \
+	|| echo "shared parts of files 1-5 changed"
+
+echo "Compare files"
+C1="$(_md5_checksum "$TESTDIR/file1")"
+C2="$(_md5_checksum "$TESTDIR/file2")"
+C3="$
[PATCH 07/11] reflink: test accuracy of free block counts
Check that the free block counts seem to be handled correctly in the
reflink operation and subsequent attempts to rewrite reflinked copies.

Signed-off-by: Darrick J. Wong
---
 tests/generic/830     |   78 ++
 tests/generic/830.out |    4 ++
 tests/generic/831     |   98 +
 tests/generic/831.out |    8 +++
 tests/generic/832     |  103 +++
 tests/generic/832.out |    8 +++
 tests/generic/833     |  102 +++
 tests/generic/833.out |    8 +++
 tests/generic/834     |  110 ++
 tests/generic/834.out |   11
 tests/generic/835     |  114 +++
 tests/generic/835.out |   11
 tests/generic/836     |  129 +
 tests/generic/836.out |   32
 tests/generic/group   |    7 +++
 15 files changed, 823 insertions(+)
 create mode 100755 tests/generic/830
 create mode 100644 tests/generic/830.out
 create mode 100755 tests/generic/831
 create mode 100644 tests/generic/831.out
 create mode 100755 tests/generic/832
 create mode 100644 tests/generic/832.out
 create mode 100755 tests/generic/833
 create mode 100644 tests/generic/833.out
 create mode 100755 tests/generic/834
 create mode 100644 tests/generic/834.out
 create mode 100755 tests/generic/835
 create mode 100644 tests/generic/835.out
 create mode 100755 tests/generic/836
 create mode 100644 tests/generic/836.out

diff --git a/tests/generic/830 b/tests/generic/830
new file mode 100755
index 000..d85bacf
--- /dev/null
+++ b/tests/generic/830
@@ -0,0 +1,78 @@
+#! /bin/bash
+# FS QA Test No. 830
+#
+# Ensure that reflinking a file N times doesn't eat a lot of blocks
+# - Create a file and record fs block usage
+# - Create some reflink copies
+# - Compare fs block usage to before
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -rf "$tmp".* "$TESTDIR"
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_test_reflink
+_require_cp_reflink
+
+rm -f "$seqres.full"
+
+TESTDIR="$TEST_DIR/test-$seq"
+rm -rf "$TESTDIR"
+mkdir "$TESTDIR"
+
+echo "Create the original file blocks"
+BLKSZ="$(stat -f $TESTDIR -c '%S')"
+BLKS=2000
+MARGIN=100
+SZ=$((BLKSZ * BLKS))
+NR=7
+_pwrite_byte 0x61 0 $SZ "$TESTDIR/file1" >> "$seqres.full"
+sync
+FREE_BLOCKS0=$(stat -f "$TESTDIR" -c '%f')
+
+echo "Create the reflink copies"
+for i in `seq 2 $NR`; do
+	_cp_reflink "$TESTDIR/file1" "$TESTDIR/file.$i"
+done
+_test_remount
+FREE_BLOCKS1=$(stat -f "$TESTDIR" -c '%f')
+
+_within_tolerance "free blocks after reflink" $FREE_BLOCKS1 $FREE_BLOCKS0 $MARGIN -v
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/830.out b/tests/generic/830.out
new file mode 100644
index 000..76e2f1d
--- /dev/null
+++ b/tests/generic/830.out
@@ -0,0 +1,4 @@
+QA output created by 830
+Create the original file blocks
+Create the reflink copies
+free blocks after reflink is in range
diff --git a/tests/generic/831 b/tests/generic/831
new file mode 100755
index 000..d729769
--- /dev/null
+++ b/tests/generic/831
@@ -0,0 +1,98 @@
+#! /bin/bash
+# FS QA Test No. 831
+#
+# Ensure that deleting all copies of a file reflinked N times releases the blocks
+# - Record fs block usage (0)
+# - Create a file and some reflink copies
+# - Record fs block usage (1)
+# - Delete some copies of the file
+# - Record fs block usage (2)
+# - Delete all copies of the file
+# - Compare fs block usage to (2), (1), and (0)
+#
+#---
+# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Pub
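
generic/830 above passes the two free-block counts and `MARGIN` to the existing xfstests helper `_within_tolerance`. The underlying idea is an absolute-difference check, which might be sketched as follows. `within_margin` is a hypothetical name for illustration, not the xfstests implementation, and the block counts are made up:

```shell
#!/bin/bash
# Hypothetical helper, NOT the xfstests _within_tolerance implementation.
# Prints "in range" when |actual - expected| <= margin, else "out of range".
within_margin() {
    local actual="$1" expected="$2" margin="$3"
    local diff=$((actual - expected))

    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi

    if [ "$diff" -le "$margin" ]; then
        echo "in range"
    else
        echo "out of range"
    fi
}

# Made-up numbers: a few reflink copies cost 12 blocks of metadata,
# well under a 100-block margin.
within_margin 499988 500000 100
```

With a margin of 100 blocks and a 2000-block file, a filesystem that actually copied the data for each reflink would land far outside the range, which is what the test is designed to catch.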
[PATCH 02/11] generic/80[0-2]: support xfs in addition to btrfs
Modify the reflink tests to support xfs.

Signed-off-by: Darrick J. Wong
---
 common/rc         |   42 ++--
 common/reflink    |  179 +
 tests/btrfs/029   |    1
 tests/btrfs/031   |    1
 tests/btrfs/108   |    1
 tests/btrfs/109   |    1
 tests/generic/800 |    3 +
 tests/generic/801 |    3 +
 tests/generic/802 |    3 +
 9 files changed, 210 insertions(+), 24 deletions(-)
 create mode 100644 common/reflink

diff --git a/common/rc b/common/rc
index adf1edf..c016673 100644
--- a/common/rc
+++ b/common/rc
@@ -82,6 +82,27 @@ _md5_checksum()
 	md5sum $1 | cut -d ' ' -f1
 }
 
+# Write a byte into a range of a file
+_pwrite_byte() {
+	pattern="$1"
+	offset="$2"
+	len="$3"
+	file="$4"
+	xfs_io_args="$5"
+
+	"$XFS_IO_PROG" $xfs_io_args -f -c "pwrite -S $pattern $offset $len" "$file"
+}
+
+# mmap-write a byte into a range of a file
+_mwrite_byte() {
+	pattern="$1"
+	offset="$2"
+	len="$3"
+	mmap_len="$4"
+	file="$5"
+
+	"$XFS_IO_PROG" -f -c "mmap -rw 0 $mmap_len" -c "pwrite -S $pattern $offset $len" "$file"
+}
 
 # ls -l w/ selinux sometimes puts a dot at the end:
 # -rwxrw-r--. id1 id2 file1
@@ -2569,12 +2590,6 @@ _require_ugid_map()
 	fi
 }
 
-_require_cp_reflink()
-{
-	cp --help | grep -q reflink || \
-		_notrun "This test requires a cp with --reflink support."
-}
-
 _require_fssum()
 {
 	FSSUM_PROG=$here/src/fssum
@@ -2588,21 +2603,6 @@ _require_cloner()
 	_notrun "cloner binary not present at $CLONER_PROG"
 }
 
-# Given 2 files, verify that they have the same mapping but different
-# inodes - i.e. an undisturbed reflink
-# Silent if so, make noise if not
-_verify_reflink()
-{
-	# not a hard link or symlink?
-	cmp -s <(stat -c '%i' $1) <(stat -c '%i' $2) \
-		&& echo "$1 and $2 are not reflinks: same inode number"
-
-	# same mapping?
-	diff -u <($XFS_IO_PROG -c "fiemap" $1 | grep -v $1) \
-		<($XFS_IO_PROG -c "fiemap" $2 | grep -v $2) \
-		|| echo "$1 and $2 are not reflinks: different extents"
-}
-
 _require_atime()
 {
 	if [ "$FSTYP" == "nfs" ]; then
diff --git a/common/reflink b/common/reflink
new file mode 100644
index 000..d65816b
--- /dev/null
+++ b/common/reflink
@@ -0,0 +1,179 @@
+##/bin/bash
+# Routines for reflinking, deduping, and comparing parts of files.
+#---
+# Copyright (c) 2015 Oracle. All Rights Reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
+# USA
+#
+# Contact information: Oracle Corporation, 500 Oracle Parkway,
+# Redwood Shores, CA 94065, USA, or: http://www.oracle.com/
+#---
+
+# Check that cp has a reflink argument
+_require_cp_reflink()
+{
+	cp --help | grep -q reflink || \
+		_notrun "This test requires a cp with --reflink support."
+}
+
+# Given 2 files, verify that they have the same mapping but different
+# inodes - i.e. an undisturbed reflink
+# Silent if so, make noise if not
+_verify_reflink()
+{
+	# not a hard link or symlink?
+	cmp -s <(stat -c '%i' $1) <(stat -c '%i' $2) \
+		&& echo "$1 and $2 are not reflinks: same inode number"
+
+	# same mapping?
+	diff -u <($XFS_IO_PROG -c "fiemap" $1 | grep -v $1) \
+		<($XFS_IO_PROG -c "fiemap" $2 | grep -v $2) \
+		|| echo "$1 and $2 are not reflinks: different extents"
+}
+
+# New reflink/dedupe helpers
+
+# this test requires the test fs support reflink...
+_require_test_reflink()
+{
+	_require_test
+	_require_xfs_io_command "reflink"
+
+	rm -rf "$TEST_DIR/file1" "$TEST_DIR/file2"
+	"$XFS_IO_PROG" -f -c "pwrite -S 0x61 0 65536" "$TEST_DIR/file1" > /dev/null
+	"$XFS_IO_PROG" -f -c "reflink $TEST_DIR/file1 0 0 65536" "$TEST_DIR/file2" > /dev/null
+	if [ ! -s "$TEST_DIR/file2" ]; then
+		rm -rf "$TEST_DIR/file1" "$TEST_DIR/file2"
+		_notrun "Reflink not supported by test filesystem type: $FSTYP"
+	fi
+	rm -rf "$TEST_DIR/file1" "$TEST_DIR/file2"
+}
+
+# this test requires the scratch fs support reflink...
+_requir
Re: Potential to loose data in case of disk failure
On Wed, Nov 11, 2015 at 11:30:57AM -0600, Jim Murphy wrote: > Hi all, > > What am I missing or misunderstanding? I have a newly > purchased laptop I want/need to multi boot different OSs > on. As a result after partitioning I have ended up with two > partitions on each of the two internal drives(sda3, sda8, > sdb3 and sdb8). FWIW, sda3 and sdb3 are the same size > and sda8 and sdb8 are the same size. As an end result > I want one btrfs raid1 filesystem. For lack of better terms, > sda3 and sda8 "concatenated" together, sdb3 and sdb8 > "concatenated" together and then mirroring "sda" to "sdb" > using only btrfs. So far have found no use-case to cover > this. > > If I create a raid1 btrfs volume using all 4 "devices" as I > understand it I would loose data if I were to loose a drive > because two mirror possibilities would be: > > sda3 mirrored to sda8 > sdb3 mirrored to sdb8 > > Is what I want to do possible without using MD-RAID and/or > LVM? If so would someone point me to the documentation > I missed. For whatever reason, I don't want to believe that > this can't be done. I want to believe that the code in btrfs > is smart enough to know that sda3 and sda8 are on the same > drive and would not try to mirror data between them except in > a test setup. I hope I just missed some documentation, > somewhere. > > Thanks in advance for your help. And last but not least, > thanks to all for your work on btrfs. > > Jim That's a pretty unusual setup, so I'm not surprised there's no quick and easy answer. The best solution in my opinion would be to shuffle your partitions around and combine sda3 and sda8 into a single partition. There's generally no reason to present btrfs with two different partitions on the same disk. If there's something that prevents you from doing that, you may be able to use RAID10 or RAID6 somehow. I'm not really sure, though, so I'll defer to others on the list for implementation details. 
--Sean -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
illegal snapshot, cannot be deleted
Hello,

I use OpenSuse 13.2 on my Toshiba Satellite laptop. I noticed that I was running out of disk space, checked the documentation, and realized that there were many snapshots. I used Yast Snapper to delete snapshots, but one snapshot, number 748, could not be deleted. I opened a terminal and ran:

    snapper -c root delete 748

and got the message "Illegal snapshot". I would like to delete it since it is an old one.

Please find details about my system below, as requested on your wiki page.

uname -a
Linux linux-jjcc.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23 00:46:04 UTC 2015 (6be6a97) i686 i686 i386 GNU/Linux

btrfs --version
btrfs-progs v4.0+20150429

btrfs fi show
Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
    Total devices 1 FS bytes used 10.98GiB
    devid 1 size 18.71GiB used 18.71GiB path /dev/sda6

btrfs fi df /
Data, single: total=15.19GiB, used=10.37GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.75GiB, used=622.53MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=208.00MiB, used=0.00B

Please find attached dmesg.log as requested. Please advise what I have to do in order to delete the snapshot that is reported to be illegal.
Thanks Vedran Vucic [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.16.7-29-desktop (geeko@buildhost) (gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) ) #1 SMP PREEMPT Fri Oct 23 00:46:04 UTC 2015 (6be6a97) [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable [0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000dc000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0x5fe6] usable [0.00] BIOS-e820: [mem 0x5fe7-0x5fef] ACPI NVS [0.00] BIOS-e820: [mem 0x5ff0-0x5fff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec0] reserved [0.00] BIOS-e820: [mem 0xfed0-0xfed003ff] reserved [0.00] BIOS-e820: [mem 0xfed14000-0xfed19fff] reserved [0.00] BIOS-e820: [mem 0xfed1c000-0xfed8] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00] BIOS-e820: [mem 0xff00-0x] reserved [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.4 present. 
[0.00] DMI: TOSHIBA Satellite A200/CAPELL VALLEY(NAPA) CRB, BIOS 1.2003/24/2007 [0.00] e820: update [mem 0x-0x0fff] usable ==> reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x5fe70 max_arch_pfn = 0x100 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-C write-protect [0.00] D-D uncachable [0.00] E-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 0 mask FC000 write-back [0.00] 1 base 04000 mask FE000 write-back [0.00] 2 base 05FF0 mask 0 uncachable [0.00] 3 disabled [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 [0.00] found SMP MP-table at [mem 0x000f71d0-0x000f71df] mapped at [c00f71d0] [0.00] Scanning 1 areas for low memory corruption [0.00] initial memory mapped: [mem 0x-0x011f] [0.00] Base memory trampoline at [c009b000] 9b000 size 16384 [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] [mem 0x-0x000f] page 4k [0.00] init_memory_mapping: [mem 0x36a0-0x36bf] [0.00] [mem 0x36a0-0x36bf] page 2M [0.00] init_memory_mapping: [mem 0x3400-0x369f] [0.00] [mem 0x3400-0x369f] page 2M [0.00] init_memory_mapping: [mem 0x0010-0x33ff] [0.00] [mem 0x0010-0x001f] page 4k [0.00] [mem 0x0020-0x33ff] page 2M [0.00] init_memory_mapping: [mem 0x36c0-0x36dfdfff] [0.00] [mem 0x36c0-0x36dfdfff] page 4k [0.00] BRK [0x00da2000, 0x00da2fff] PGTABLE [0.00] BRK [0x00da3000, 0x00da3fff] PGTABLE [0.00] BRK [0x00da4000, 0x00da4fff] PGTABLE [0.00] BRK [0x00da5000, 0x00da5fff] PGTABLE [0.00] BRK [0x00da6000, 0x00da6fff] PGTABLE [0.00] BRK [0x00da7000, 0x00da7fff] PGTABLE [0.00] RAMDISK: [mem 0x
Re: Potential to loose data in case of disk failure
On Wed, Nov 11, 2015 at 12:30 PM, Jim Murphy wrote: > Hi all, > > What am I missing or misunderstanding? I have a newly > purchased laptop I want/need to multi boot different OSs > on. As a result after partitioning I have ended up with two > partitions on each of the two internal drives(sda3, sda8, > sdb3 and sdb8). FWIW, sda3 and sdb3 are the same size > and sda8 and sdb8 are the same size. As an end result > I want one btrfs raid1 filesystem. For lack of better terms, > sda3 and sda8 "concatenated" together, sdb3 and sdb8 > "concatenated" together and then mirroring "sda" to "sdb" > using only btrfs. So far have found no use-case to cover > this. I'm going to assume that mkfs.btrfs -mraid1 -draid1 command is pointed at the two resulting /dev/mapper/X devices resulting from the linear concat. > > If I create a raid1 btrfs volume using all 4 "devices" as I > understand it I would loose data if I were to loose a drive > because two mirror possibilities would be: > > sda3 mirrored to sda8 > sdb3 mirrored to sdb8 Well you don't actually know how the mirroring will allocate is the problem with the arrangement. But yes, it's possible some chunks on sda3 will be mirrored to sda8, which is not what you'd want so the linear concat idea is fine using either the md driver or lvm. > Is what I want to do possible without using MD-RAID and/or > LVM? Yes, either are suitable for this purpose. The decision comes down to the user space tools, use the tool that you're most comfortable with. > If so would someone point me to the documentation > I missed. For whatever reason, I don't want to believe that > this can't be done. I want to believe that the code in btrfs > is smart enough to know that sda3 and sda8 are on the same > drive and would not try to mirror data between them except in > a test setup. I hope I just missed some documentation, > somewhere. 
As far as I know right now btrfs works strictly at the block device level, and considers different partitions different block devices, it doesn't grok the underlying physical device relationship. -- Chris Murphy
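To make the concern above concrete, here is a minimal sketch in Python that lists every two-device pairing a raid1 chunk could land on when all four partitions are handed to btrfs directly. It only enumerates the possibilities; it does not model the real btrfs chunk allocator, and the PARTITION_DISK table and the classify_pairs() helper are hypothetical names made up for this illustration.

```python
from itertools import combinations

# Hypothetical table: which physical disk each partition lives on.
PARTITION_DISK = {
    "sda3": "sda",
    "sda8": "sda",
    "sdb3": "sdb",
    "sdb8": "sdb",
}


def classify_pairs(partition_disk):
    """Split every possible two-partition mirror pair into safe and unsafe.

    A pair is unsafe when both partitions sit on the same physical disk,
    because a single disk failure would then take out both copies.
    """
    safe, unsafe = [], []
    for a, b in combinations(sorted(partition_disk), 2):
        if partition_disk[a] == partition_disk[b]:
            unsafe.append((a, b))
        else:
            safe.append((a, b))
    return safe, unsafe


safe_pairs, unsafe_pairs = classify_pairs(PARTITION_DISK)
print("pairs that survive a single disk failure:", safe_pairs)
print("pairs lost with a single disk failure:", unsafe_pairs)
```

Of the six possible pairings, the two that stay on one disk are exactly the ones Jim listed. Since btrfs only sees four block devices, nothing in this picture rules those two out.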
Re: [RFCv3 00/12] xfstests: test the btrfs/xfs reflink/dedupe ioctls
On Mon, Nov 09, 2015 at 10:49:13AM -0800, Darrick J. Wong wrote:
> On Sun, Nov 08, 2015 at 11:59:26PM -0800, Christoph Hellwig wrote:
> > On Tue, Oct 06, 2015 at 10:12:57PM -0700, Darrick J. Wong wrote:
> > > * I don't have any interesting NFS/CIFS setups for test. :(
> >
> > I have a branch with client and server support for NFSv4.2 CLONE
> > support:
> >
> > http://git.infradead.org/users/hch/pnfs.git/shortlog/refs/heads/reflink+clone
> >
> > For now you want to use btrfs on the server, as using reflinks on XFS
> > seems to be a little unstable over NFS.
>
> I found a few more bugs in the kernel-side implementation, which might explain
> that. I'm about to start working on making CoW less crappy, but I'll push all
> the patches out to github. (I wasn't planning on patchbombing again until
> December.)
>
> > > If you're going to start using this mess, you probably ought to just
> > > pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> > > They should just work with the btrfs that's in 4.3.
> > >
> > > Comments and questions are, as always, welcome.
> >
> > Any reason the groups are called clone? I don't really have an opinion
> > on clone vs reflink but given that the xfs_io command is reflink I'd
> > rather be consistent.
>
> The existing btrfs reflink tests were tagged in the 'clone' group prior to my
> patchset.
>
> > Otherwise I'd say get it merged ASAP, we can still fix up various
> > details later.
>
> I'll merge your patch and repost the whole pile of tests. I'm almost ready to
> send a pile of updates for the XFS on-disk structure document which add stuff
> about the v5 format, rmapbt, and reflink.

Darrick, can you renumber the xfstests against what is currently at the head of the repo? If both you and Christoph need them working, you may as well both patch against the main xfstests repo...

Cheers,

Dave.
-- Dave Chinner da...@fromorbit.com
Re: [GIT PULL] Fujitsu for 4.4
On Mon, Nov 09, 2015 at 06:27:26PM +0800, Zhao Lei wrote:
> Hi Chris,
>
> This is collection of some bug fix and cleanup from fujitsu against btrfs in
> v4.3,
> the main patch is these 2:
> Fix lost-data-profile caused by auto removing bg
> Fix lost-data-profile caused by balance bg
> It can solve the problem of 'lost all data profile by balance and
> auto-remove-bg',
> discussed in btrfs maillist.
>
> Plus some enhancement and cleanup in scrub.
>
> Would you please consider merging the following fixes to integration-4.4
> branch?

Thanks, I have these all queued up. I verified the patches and cherry-picked them, since Linus doesn't like to see github pulls, but I do appreciate having them all in one place.

-chris
Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
Hi Anand,

Nice work. But I have some small questions about it.

Anand Jain wrote on 2015/11/09 18:56 +0800:
These set of patches provides btrfs hot spare and auto replace support for you review and comments.

First, here below are the simple example steps to configure the same:

Add a spare device:
    btrfs spare add /dev/sde -f

I'm sorry but I didn't quite see the benefit of a spare device.

Let's take the following example:
1) 2 RAID1 + 1 spare: (A + B) + C
2) 3 RAID1: (A + B + C)

Let's assume they are all 12G in size, and there are 3 raid1 chunks, each 3G in size.

In my understanding, in the normal operation case:

For case 1), all raid1 chunks should only be allocated onto the 2 RAID disks, and the spare one should contain no raid1 chunks.

      A        B        C
    ------   ------   ------
    |free|   |free|   |free|
    ------   ------   |    |
    |3Ga1|   |3Ga2|   |    |
    ------   ------   |    |
    |3Gb1|   |3Gb2|   |    |
    ------   ------   |    |
    |3Gc1|   |3Gc2|   |    |
    ------   ------   ------

For case 2), all raid1 chunks will be allocated across all 3 disks, making the allocation more fair.

      A        B        C
    ------   ------   ------
    |free|   |free|   |free|
    ------   ------   ------
    |free|   |free|   |free|
    ------   ------   ------
    |3Gb2|   |3Ga1|   |3Ga2|
    ------   ------   ------
    |3Gc1|   |3Gc2|   |3Gb1|
    ------   ------   ------

At least in the normal operation case, case 1) makes device C useless, and reduces the total usable space.

In the disk B failure case:

For case 1), we can auto replace B with C. It will copy all data chunks from A to C, so it needs to copy 9G of data. After the replace:

      A        B        C
    ------   ------   ------
    |free|   | X  |   |free|
    ------   ------   ------
    |3Ga1|   | X  |-> |3Ga2|
    ------   ------   ------
    |3Gb1|   | X  |-> |3Gb2|
    ------   ------   ------
    |3Gc1|   | X  |-> |3Gc2|
    ------   ------   ------

For case 2), we can just relocate and recover the bad chunks in B. It should only need to copy 6G of data. After the "recovery", it should be much the same as case 1):

      A        B        C
    ------   ------   ------
    |free|   | X  |   |free|
    ------   ------   ------
    |3Ga1|<\ | X  | />|3Gc1|
    ------   ------   ------
    |3Gb2| | | X  | / |3Ga2|
    ------   ------   ------
    |3Gc1|  \| X  |   |3Gb1|
    ------   ------   ------

IIRC, the only benefit of a spare device is that we can ensure there is enough space for a device replace (if the failing one is no larger than the spare).
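The copy volumes in the two cases above can be checked with a small Python sketch. The layouts are copied from the diagrams; CHUNK_GIB, copy_needed_gib() and raid1_usable_gib() are hypothetical names for this illustration only. The usable-space formula is the usual rule of thumb for two-copy raid1 and is an assumption here, not something stated in the patchset.

```python
CHUNK_GIB = 3  # size of one raid1 chunk copy, as in the example


def copy_needed_gib(layout, failed):
    """GiB to re-create after a failure: one copy per chunk on the failed device."""
    return CHUNK_GIB * len(layout[failed])


def raid1_usable_gib(device_sizes):
    """Rule-of-thumb usable space for two-copy raid1 (assumption, not from the thread)."""
    total = sum(device_sizes)
    return min(total / 2, total - max(device_sizes))


# Case 1: raid1 on A + B, C is an idle spare.
case1 = {"A": ["a1", "b1", "c1"], "B": ["a2", "b2", "c2"], "C": []}
# Case 2: raid1 spread over A + B + C.
case2 = {"A": ["b2", "c1"], "B": ["a1", "c2"], "C": ["a2", "b1"]}

print("case 1 copy after B fails:", copy_needed_gib(case1, "B"), "GiB")
print("case 2 copy after B fails:", copy_needed_gib(case2, "B"), "GiB")
print("case 1 usable space:", raid1_usable_gib([12, 12]), "GiB")
print("case 2 usable space:", raid1_usable_gib([12, 12, 12]), "GiB")
```

Under these assumptions the sketch gives 9 GiB versus 6 GiB to copy, and 12 GiB versus 18 GiB of usable space, which matches the figures in the example.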
But the cost is, increase in replace data copy and unfair chunk allocation. So I am not sure if the cost is good enough for the case. At least, enhancing the chunk relocation to fulfill the case 2) will bring a much smaller code base. Thanks, Qu OR if there is a spare device which is already added before the, just run btrfs dev scan [/dev/sde] this will register the spare device to the kernel. btrfs fi show Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091 Total devices 2 FS bytes used 112.00KiB devid1 size 2.00GiB used 417.50MiB path /dev/sdc devid2 size 2.00GiB used 417.50MiB path /dev/sdd Global spare device size 3.00GiB path /dev/sde Thats it. Auto replace: Replace happens automatically, that is when there is any write failed or flush failed, the device will be marked as failed, which will stop any further IO attempt to that device. And in the next commit thread cycle the auto replace will pick the spare device (/dev/sde is above example) to replace the failed device. And so the btrfs volume is back to a healthy state. Its btrfs Global spare: as of now only global hot spare is supported, that is hot spare(s) are for all the btrfs FS in the system. No spare when device failed: It would scan for spare device at the rate of transaction commit and will trigger the auto replace when ever spare device is added. Priority: In some future work there can be some chronological order to pick a spare and the failed device. Patches: Kernel: First, it needs, Qu's per chunk missing device patchset, which is part of the set here and also there is a light optimization (patch 5/15) which was required as part of this enhancement. Next patches 7,8/15 brings in support, to manage the transition of devices from online (no state) to offline OR failed state dynamically. On top of static device state like the current "missing" state. Patch 9/15 fixes a bug where in we should have blocked the incompatible feature at the device scan/add level instead/also at in the mount level. 
This is because we don't have to bring a device into the device list, if it is incompatible. Next patches 10,11,12,13/15 adds support for Spare device. For the details on how to add a spare device kindly see further below. For kernel with out spare feature supported the spare device is kept away. And when the kernel supports the spare device, it will inhibit from mounting it. Further these patch set provides helper function to pick a spare device and release a spare device back to the spare device pool. Patch 14/15 provides functio
Re: How to properly and efficiently balance RAID6 after more drives are added?
Christian Rohmann posted on Wed, 11 Nov 2015 15:17:19 +0100 as excerpted: > Sorry for the late reply to this list regarding this topic ... > > On 09/04/2015 01:04 PM, Duncan wrote: >> And of course, only with 4.1 (nominally 3.19 but there were initial >> problems) was raid6 mode fully code-complete and functional -- before >> that, runtime worked, it calculated and wrote the parity stripes as it >> should, but the code to recover from problems wasn't complete, so you >> were effectively running a slow raid0 in terms of recovery ability, but >> one that got "magically" updated to raid6 once the recovery code was >> actually there and working. > > As other who write to this ML, I run into crashes when trying to do a > balance of my filesystem. > I moved through the different kernel versions and btrfs-tools and am > currently running Kernel 4.3 + 4.3rc1 of the tools but still after like > an hour of balancing (and actually moving chunks) the machine crashes > horribly without giving any good stack trace or anything in the kernel > log which I could report here :( > > Any ideas on how I could proceed to get some usable debug info for the > devs to look at? I'm not a dev so my view into the real deep technical side is limited, but what I can say is this... Generally, crashes during balance indicate not so much bugs in the way the kernel handles existing balance (tho those sometimes occur as well, but the chances are relatively lower), but rather, a filesystem screwed up in a way that balance hasn't been taught to deal with yet. Of course there's two immediate points that can be made from that: 1) Newer kernels have been taught to deal with more bugs, so if you're not on current (which you are now), consider upgrading to current at least long enough to see if it already knows how to deal with it. 
2) If a balance is crashing with a particular kernel, it's unlikely the problem will simply go away on its own, without either a kernel upgrade to one that knows how to deal with that problem, or in some cases, a filesystem change that unpins whatever was bad and lets it be deleted. Filesystem changes likely to do that sort of thing are removing your oldest snapshots, thereby freeing anything that had changed in newer snapshots and the working version, that was still being pinned by the old snapshots, or in the absence of snapshot pinning, removal of whatever often large possibly repeatedly edited file happened to be locking down whatever balance was choking on. Another point (based on a different factor) can be added in addition: 3) Raid56 mode is still relatively new, and it seems a number of users of the raid56 mode feature seem to be reporting what appears to me at least (considering my read of tracedumps is extremely limited) to be the same sort of balance bug, often with the same couldn't-get-a-trace pattern. This very likely indicates a remaining bug embedded deeply enough in the raid56 code that it has taken until now to trigger enough times to even begin to appear on the radar. Of course the fact that it so often no- traces doesn't help finding it, but the reports are getting common enough that at least to the informed non-dev list regular like me, there does seem to be a pattern emerging. This is a bit worrying, but it's /exactly/ the reason that I had suggested that people wait for at least two entirely "clean" kernel cycles without raid56 bugs before considering it as stable as is the rest of btrfs, and predicted that would likely be at least five kernel cycles (a year) after initial nominal-full-code release, putting it at 4.4 at the earliest. Since the last big raid56 bug was fixed fairly early in the 4.1 cycle, two clean series would be 4.2 and 4.3, which would again point to 4.4. 
But we now have this late-appearing bug just coming up on the radar, which if it does indeed end up raid56 related, both validates my earlier caution, and at least conservatively speaking, should reset that two-clean-kernel-cycles clock. However, given that the feature in general has been maturing in the mean time, I'd say reset it with only one clean kernel cycle this time, so again assuming the problem is indeed found to be in raid56 and that it's fixed before 4.4 release, I'd want 4.5 to be raid56 uneventful, and would then consider 4.6 raid56 maturity/ stability-comparable to btrfs in general, assuming no further raid56 bugs have appeared by its release. As to ideas for getting a trace, the best I can do is repeat what I've seen others suggest here, that will obviously take a bit more resources than some have available but that apparently has the best chance of working if it can be done in such instances, that being... Configure the test machine with a network-attached tty, and set it as your system console, so debug traces will dump to it. The kernel will try its best to dump traces to system-console as it considers that safe even after it considers itself too sc
Re: Potential to loose data in case of disk failure
Chris Murphy posted on Wed, 11 Nov 2015 18:13:22 -0500 as excerpted: > On Wed, Nov 11, 2015 at 12:30 PM, Jim Murphy > wrote: >> Hi all, >> >> What am I missing or misunderstanding? I have a newly purchased laptop >> I want/need to multi boot different OSs on. As a result after >> partitioning I have ended up with two partitions on each of the two >> internal drives(sda3, sda8, sdb3 and sdb8). FWIW, sda3 and sdb3 are >> the same size and sda8 and sdb8 are the same size. As an end result I >> want one btrfs raid1 filesystem. For lack of better terms, >> sda3 and sda8 "concatenated" together, sdb3 and sdb8 "concatenated" >> together and then mirroring "sda" to "sdb" using only btrfs. So far >> have found no use-case to cover this. There isn't any... using ONLY btrfs (as the OP specified). You need either mdraid or lvm to concatenate the two logical devices (partitions) on a single physical device into one, so then btrfs will see only two devices and put a raid1 copy on each. This is because (reordering a bit of your quote from further below)... > btrfs works strictly at the block device level, and considers different > partitions different block devices, it doesn't grok the underlying > physical device relationship. [end of reordered bit] > I'm going to assume that mkfs.btrfs -mraid1 -draid1 command is pointed > at the two resulting /dev/mapper/X devices resulting from the linear > concat. Except, under the conditions he specified, there will be no such linear concat mapper device available. >> If I create a raid1 btrfs volume using all 4 "devices" as I understand >> it I would loose data if I were to loose a drive because two mirror >> possibilities would be: >> >> sda3 mirrored to sda8 sdb3 mirrored to sdb8 > > Well you don't actually know how the mirroring will allocate is the > problem with the arrangement. 
But yes, it's possible some chunks on sda3 > will be mirrored to sda8, which is not what you'd want so the linear > concat idea is fine using either the md driver or lvm. > >> Is what I want to do possible without using MD-RAID and/or LVM? > > Yes, either are suitable for this purpose. The decision comes down to > the user space tools, use the tool that you're most comfortable with. It's not possible /without/ using them, no. Using them, yes, but that's not the question that was asked. As for the explanation, that's the part that I reordered above, btrfs only sees block devices. It doesn't know nor care what they're from. One workaround would be as Sean Greenslade's, using either a partitioning tool that can safely move partitions around without destroying the data in them, or simply copying off to backup, deleting the partitions and recreating them in a more workable layout, then restoring from backup to the new partitions, combining the two partitions on each physical device into one. Another workaround would be putting btrfs on top of an mdraid or lvm setup as above, thereby using software to overcome the hardware layout limitations. Yet another, the one I'd almost certainly use here unless the use-case made it too inconvenient or not even possible (one big file too big for either one alone), would be to simply create two entirely separate btrfs raid1 filesystems, each one composed of the two partitions of similar size. I'm already a strong booster of using partitioning to avoid putting all my data eggs in one basket, and already use multiple separate btrfs raid1s on partitions of the same two physical devices, here, so this wouldn't even be beyond what I'm already doing, here. And if the OP's already dealing with that many partitions, it sounds like he'd be able to handle it fairly well too. 
After all there's always symlinks and bind-mounts available to make parts of one filesystem available in arbitrary locations on another, if the location of the directories themselves is hard-coded to such an extent that you can't simply move them and point everything that was pointed at the old location to the new one, instead. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
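As a rough illustration of the last workaround (two separate btrfs raid1 filesystems, each built from the two same-size partitions), here is a Python sketch that pairs partitions by size and only accepts a pair when its members are on different physical disks. The sizes are made up, since the thread only says sda3 matches sdb3 and sda8 matches sdb8, and plan_raid1_filesystems() is a hypothetical helper, not part of any btrfs tool.

```python
from collections import defaultdict

# (partition, physical disk, size in GiB); the sizes are hypothetical.
PARTITIONS = [
    ("sda3", "sda", 100),
    ("sda8", "sda", 40),
    ("sdb3", "sdb", 100),
    ("sdb8", "sdb", 40),
]


def plan_raid1_filesystems(partitions):
    """Group same-size partitions into two-device raid1 candidates.

    A group is kept only if it has exactly two members on two different
    physical disks, so each filesystem survives the loss of one disk.
    """
    by_size = defaultdict(list)
    for name, disk, size in partitions:
        by_size[size].append((name, disk))

    plans = []
    for size, members in sorted(by_size.items(), key=lambda kv: kv[0], reverse=True):
        disks = {disk for _, disk in members}
        if len(members) == 2 and len(disks) == 2:
            plans.append(tuple(sorted(name for name, _ in members)))
    return plans


for devices in plan_raid1_filesystems(PARTITIONS):
    print("one raid1 filesystem on:", devices)
```

Each resulting pair would then get its own mkfs.btrfs run with raid1 profiles for data and metadata, as mentioned earlier in the thread. Every mirror then spans both disks by construction, at the price of managing two filesystems instead of one.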
Re: [PATCH 00/15] btrfs: Hot spare and Auto replace
Qu Wenruo posted on Thu, 12 Nov 2015 10:15:09 +0800 as excerpted: > Anand Jain wrote on 2015/11/09 18:56 +0800: >> These set of patches provides btrfs hot spare and auto replace support >> for you review and comments. >> >> First, here below are the simple example steps to configure the same: >> >> Add a spare device: >> btrfs spare add /dev/sde -f > > I'm sorry but I didn't quite see the benefit of a spare device. You could ask the mdraid folks much the same question about spares there, and the answer would I think be very much the same... I'll just present a couple points of the several that can be made. Perhaps the biggest point for this particular case... What you're forgetting is that the work here introduces the _global_ spare -- one spare device (or pool of devices) for the whole set of btrfs, no matter how many independent btrfs there happen to be on a machine. Your example used just one filesystem, in which case this point is null and void, but what of the case where there's two? You can't have the same device be part of *both* filesystems. What if the device is part of btrfs A, but btrfs b is the one that loses a device? In your example, you're out of luck. But as a global spare, the "extra" device doesn't become attached to a specific btrfs until one of the existing devices goes bad. With working global spares, the first btrfs to have a bad device will see the spare and be able to grab it, no matter which of the two (or 10 or 100) separate btrfs it happens to be, as it's a _global_ spare, not actually attached to a specific btrfs until it is needed as a replacement. By extension, there's the spare _pool_. Suppose you have three separate btrfs and three separate "extra" devices. You can attach one to each btrfs and be fine... if the existing devices all play nice and a second one doesn't go out on any of them until all three have had one device go out. 
But what happens if one btrfs gets some real heavy unexpected use and loses three devices before the other two btrfs lose any? With global spares, the unlucky btrfs can call for one at a time, and assuming there's time for it to fully integrate before the next one dies, it can call for the next and the next, and get all three, one at a time, without the admin having to worry about manually device deleting the second and third devices from their other btrfs, to attach to the unlucky/greedy one. And that three btrfs, three-device global-spare-pool scenario, with an unlucky/greedy btrfs ending up getting all three spares, brings up a second point... In that scenario without global hot-spares, say you've added one more device to what ends up the unlucky btrfs than it'd need, so with auto- repair it can detect a failing device and automatically device-delete it down to its device-minimum (either due to raid level or due to capacity). Now another device fails. Oops! Can't auto-repair now! But in the global hot-spare-pool scenario, with one repair done, there's still two spares in the pool, so at the second device failure, it can automatically pull a second from the pool (where given the pool it can be instead of already attached to one of the other btrfs') and complete the second repair, still without admin intervention. Same again for the third. So an admin who doesn't want to have to intervene when he's supposedly on vacation can setup a queue of spares, and sure, if he's a good admin, when a device goes bad and a spare is automatically pulled in to replace it, he'll be notified, and he'll probably login to check logs and see what happened, but no problem, there's still others in the queue. In fact, since the common folk wisdom says this sort of bad event (someone you know getting a disease like cancer or dying, devices in your machines going bad, friends having their significant others leave them... 
at least here in the US, folk wisdom says it always happens in threes, so particularly once two happen, people start wondering who/where the third one is going to occur) happens in threes, a somewhat superstitious admin could ensure he had four, well, he's cautious too, so make it five, global spares setup, just in case. Then it wouldn't matter if the three devices going bad were all on the same btrfs, or one each on the three, or two on one and a third elsewhere, he'd still have two additional devices in the pool, just to cover his a** if the three /did/ go out. Now about time he loses a fourth, he better be on the phone confirming a ticket home, but even then, he still has the one still in the pool, as he was cautious, too, hopefully giving him time to actually /make/ it home before two more go out leaving the pool empty and a btrfs a device down. And if he's /that/ unlucky, well, maybe he better make a call to his lawyer confirming his last will and testament before he steps on that plane, too. =:^( Just a short mention of a third point, too. Devices in the pool p