RE: trim not working and irreparable errors from btrfsck
-Original Message- From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs- ow...@vger.kernel.org] On Behalf Of Marc Joliet Sent: Friday, 14 August 2015 6:06 PM To: linux-btrfs@vger.kernel.org Subject: Re: trim not working and irreparable errors from btrfsck Am Thu, 13 Aug 2015 17:14:36 -0600 schrieb Chris Murphy li...@colorremedies.com: Right now I think there's no status because a.) no bug report and b.) not enough information. I was mainly asking because apparently there *is* a patch that helps some people affected by this, but nobody ever commented on it. Perhaps there's a reason for that, but I found it curious. (I see now that it was submitted in early January, in the thread [PATCH V2] Btrfs: really fix trim 0 bytes after a device delete.) I can open a bug (I mean, that's part of being a user of btrfs at this stage), I'm just surprised that nobody else has. I have to use that patch on one of my systems. I just assumed it was never merged because it wasn't quite ready yet. It seems to work fine for me though. Paul. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/3] xfstests: btrfs: add functions to create dm-error device
On Fri, Aug 14, 2015 at 06:47:02PM +0800, Anand Jain wrote: From: Anand Jain anand.j...@oracle.com Controlled EIO from the device is achieved using the dm device. Helper functions are at common/dmerror. Broadly steps will include calling _init_dmerror(). _init_dmerror() will use SCRATCH_DEV to create dm linear device and assign DMERROR_DEV to /dev/mapper/error-test. When test script is ready to get EIO, the test cases can call _load_dmerror_table() which then it will load the dm error. so that reading DMERROR_DEV will cause EIO. After the test case is complete, cleanup must be done by calling _cleanup_dmerror(). Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v4-v5: No Change. keep up with the patch set v3-v4: rebase on latest xfstests code v2.1-v3: accepts Filipe Manana's review comments, thanks v2-v2.1: fixed missed typo error fixup in the commit. v1-v2: accepts Dave Chinner's review comments, thanks common/dmerror | 69 ++ common/rc | 9 2 files changed, 78 insertions(+) create mode 100644 common/dmerror diff --git a/common/dmerror b/common/dmerror new file mode 100644 index 000..f895d90 --- /dev/null +++ b/common/dmerror @@ -0,0 +1,69 @@ +##/bin/bash +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# +# common functions for setting up and tearing down a dmerror device + +_init_dmerror() +{ + $DMSETUP_PROG remove error-test /dev/null 21 + + local BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV` + + DMERROR_DEV='/dev/mapper/error-test' + + DMLINEAR_TABLE=0 $BLK_DEV_SIZE linear $SCRATCH_DEV 0 + + $DMSETUP_PROG create error-test --table $DMLINEAR_TABLE || \ + _fatal failed to create dm linear device + + DMERROR_TABLE=0 $BLK_DEV_SIZE error $SCRATCH_DEV 0 +} + +_scratch_mkfs_dmerror() +{ + $MKFS_BTRFS_PROG $* $DMERROR_DEV $seqres.full 21 || \ + _fatal failed to create mkfs.btrfs $* $DMERROR_DEV I didn't follow previous reviews, please correct me if I miss anything. Is dmerror only for btrfs testing? I saw $MKFS_BTRFS_PROG here. And do we need $MKFS_OPTIONS too? +} + +_mount_dmerror() +{ + mount -t $FSTYP $MOUNT_OPTIONS $DMERROR_DEV $SCRATCH_MNT $MOUNT_PROG ? +} + +_unmount_dmerror() +{ + $UMOUNT_PROGS $SCRATCH_MNT $UMOUNT_PROG, no S at the end. Thanks, Eryu +} + +_cleanup_dmerror() +{ + $UMOUNT_PROG $SCRATCH_MNT /dev/null 21 + $DMSETUP_PROG remove error-test /dev/null 21 +} + +_load_dmerror_table() +{ + $DMSETUP_PROG suspend error-test + [ $? -ne 0 ] _fatal failed to suspend error-test + + $DMSETUP_PROG load error-test --table $DMERROR_TABLE + [ $? -ne 0 ] _fatal failed to load error table error-test + + $DMSETUP_PROG resume error-test + [ $? -ne 0 ] _fatal failed to resume error-test +} diff --git a/common/rc b/common/rc index 70d2fa8..8d4da0e 100644 --- a/common/rc +++ b/common/rc @@ -1337,6 +1337,15 @@ _require_sane_bdev_flush() fi } +# this test requires the device mapper error target +# +_require_dmerror() +{ + _require_command $DMSETUP_PROG dmsetup + $DMSETUP_PROG targets | grep error /dev/null 21 + [ $? -ne 0 ] _notrun This test requires dm error support +} + # this test requires the device mapper flakey target # _require_dm_flakey() -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe fstests in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/3] xfstests: btrfs: test device replace, with EIO on the src dev
On Fri, Aug 14, 2015 at 06:47:03PM +0800, Anand Jain wrote: From: Anand Jain anand.j...@oracle.com This test case will test to confirm the replace works with the failed (EIO) replacing source device. EIO condition is achieved using the DM device. Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v4-v5: rebase on latest xfstests code and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks tests/btrfs/098 | 81 + tests/btrfs/098.out | 11 tests/btrfs/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out diff --git a/tests/btrfs/098 b/tests/btrfs/098 new file mode 100755 index 000..afb41d1 --- /dev/null +++ b/tests/btrfs/098 @@ -0,0 +1,81 @@ +#! /bin/bash +# FS QA Test No. btrfs/098 +# +#test device replace works when the source device has EIO Nitpick here, need a space after # :) +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp should be rm -f $tmp.* as many functions in common/rc and check create tmp files like $tmp.xxx Thanks, Eryu +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 /dev/null + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog replace start -B $error_devid $dev2 $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device replace completed + +status=0; exit diff --git a/tests/btrfs/098.out b/tests/btrfs/098.out new file mode 100644 index 000..eb2f87f --- /dev/null +++ b/tests/btrfs/098.out @@ -0,0 +1,11 @@ +QA output created by 098 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device replace completed diff --git a/tests/btrfs/group b/tests/btrfs/group index e13865a..c8a53b5 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -100,3 +100,4 @@ 095 auto quick metadata 096 auto quick clone 097 auto quick send clone +098 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe fstests in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 3/3] xfstests: btrfs: test device delete with EIO on src dev
On Fri, Aug 14, 2015 at 06:47:04PM +0800, Anand Jain wrote: From: Anand Jain anand.j...@oracle.com This test case tests if the device delete works with the failed (EIO) source device. EIO errors are achieved usign the DM device. This test would need following btrfs-progs and btrfs kernel patch btrfs-progs: device delete to accept devid Btrfs: device delete by devid However when btrfs-progs patch is not found this test will not run, and when kernel patch is not found btrfs-progs will fail gracefully and thus the test script. Signed-off-by: Anand Jain anand.j...@oracle.com --- v4-v5: rebase on latest xfstests code, and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks common/rc | 7 + tests/btrfs/099 | 82 + tests/btrfs/099.out | 11 +++ tests/btrfs/group | 1 + 4 files changed, 101 insertions(+) create mode 100755 tests/btrfs/099 create mode 100644 tests/btrfs/099.out diff --git a/common/rc b/common/rc index 8d4da0e..31a0328 100644 --- a/common/rc +++ b/common/rc @@ -2737,6 +2737,13 @@ _require_meta_uuid() umount $SCRATCH_MNT } +_require_btrfs_dev_del_by_devid() +{ + $BTRFS_UTIL_PROG device delete --help | egrep devid /dev/null 21 + [ $? -eq 0 ] || _notrun $BTRFS_UTIL_PROG too old \ + (must support 'btrfs device delete devid /mnt') +} + _get_total_inode() { if [ -z $1 ]; then diff --git a/tests/btrfs/099 b/tests/btrfs/099 new file mode 100755 index 000..4464e24 --- /dev/null +++ b/tests/btrfs/099 @@ -0,0 +1,82 @@ +#! /bin/bash +# FS QA Test No. btrfs/099 +# +# test device delete when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp And here too, rm -f $tmp.* Thanks, Eryu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/3] xfstests: btrfs: test device replace, with EIO on the src dev
On Fri, Aug 14, 2015 at 11:47 AM, Anand Jain anand.j...@oracle.com wrote: From: Anand Jain anand.j...@oracle.com This test case will test to confirm the replace works with the failed (EIO) replacing source device. EIO condition is achieved using the DM device. Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v4-v5: rebase on latest xfstests code and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks tests/btrfs/098 | 81 + tests/btrfs/098.out | 11 tests/btrfs/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out diff --git a/tests/btrfs/098 b/tests/btrfs/098 new file mode 100755 index 000..afb41d1 --- /dev/null +++ b/tests/btrfs/098 @@ -0,0 +1,81 @@ +#! /bin/bash +# FS QA Test No. btrfs/098 +# +#test device replace works when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 /dev/null Sorry missed this before, but you don't need to redirect stdout/stderr to /dev/null. run_check redirects them to $seqres.full where it's actually useful - when we have the test failing, we can check $seqres.full to see what seed fsstress used (fsstress prints it to stdout/stderr). That's for the case where it's failing only for some seeds of course. Same observation applies to the other test/patch. Thanks. + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog replace start -B $error_devid $dev2 $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device replace completed + +status=0; exit diff --git a/tests/btrfs/098.out b/tests/btrfs/098.out new file mode 100644 index 000..eb2f87f --- /dev/null +++ b/tests/btrfs/098.out @@ -0,0 +1,11 @@ +QA output created by 098 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device replace completed diff --git a/tests/btrfs/group b/tests/btrfs/group index e13865a..c8a53b5 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -100,3 +100,4 @@ 095 auto quick metadata 096 auto quick clone 097 auto quick send clone +098 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe fstests in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, Reasonable men adapt themselves to the
Re: RAID0 wrong (raw) device?
On 2015-08-13 19:29, Gareth Pye wrote: I would have been surprised if any generic file system copes well with being mounted in several locations at once, DRBD appears to fight really hard to avoid that happening :) And yeah I'm doing the second thing, I've successfully switched which of the servers is active a few times with no ill effect (I would expect scrub to give me some significant warnings if one of the disks was a couple of months out of date) so I'm presuming that DRBD copes reasonably well or I've been very lucky. Either that luck is very deterministic, DRBD copes correctly, or I've been very very lucky. Very very lucky doesn't sound likely. Yeah, I'd be willing to bet that DRBD does cope well with direct writes to the backing store (either that or it prevents the kernel from doing that, which would be even better and would not surprise me at all). In my experience it's one of the most resilient shared storage options out there. smime.p7s Description: S/MIME Cryptographic Signature
Re: RAID0 wrong (raw) device?
On Fri 2015-08-14 (00:24), Anand Jain wrote: root@toy02:~# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 106.51GiB devid3 size 1.82TiB used 82.03GiB path /dev/drbd2 devid4 size 1.82TiB used 82.03GiB path /dev/drbd3 And now, after a reboot: root@toy02:~/bin# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 119.82GiB devid3 size 1.82TiB used 82.03GiB path /dev/drbd2 devid4 size 1.82TiB used 82.03GiB path /dev/sde GRMPF! pls use 'btrfs fi show -m' and just ignore no option or -d if fs is mounted, as -m reads from the kernel. There is now a new behaviour: after the btrfs mount, I can see shortly the wrong raw device /dev/sde and a few seconds later there is the correct /dev/drbd3 : root@toy02:/etc# umount /data root@toy02:/etc# mount /data root@toy02:/etc# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/sde Btrfs v3.12 root@toy02:/etc# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/drbd3 Btrfs v3.12 root@toy02:/etc# btrfs filesystem show -m Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/drbd3 Btrfs v3.12 Still, the kernel sees 3 instead of (really) 2 HGST drives: root@toy02:/etc# hdparm -I /dev/sdb | grep Number: Model Number: HGST HUS724020ALA640 Serial Number: PN2134P5G2P2AX root@toy02:/etc# hdparm -I /dev/sde | grep Number: Model Number: HGST HUS724020ALA640 Serial Number: PN2134P5G2P2AX root@toy02:/etc# hdparm -I /dev/sdd | grep Number: Model Number: HGST HUS724020ALA640 Serial Number: PN2134P5G2P2XX -- Ullrich Horlacher Informationssysteme und Serverbetrieb IZUS/TIK E-Mail: horlac...@rus.uni-stuttgart.de Universitaet Stuttgart Tel:++49-711-68565868 Allmandring 30aFax:++49-711-682357 70550 Stuttgart (Germany) WWW:http://www.tik.uni-stuttgart.de/ REF:55ccc4ab.2080...@oracle.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
re: Btrfs: don't start the log transaction if the log tree init fails
Hello Miao Xie, The patch e87ac1368700: Btrfs: don't start the log transaction if the log tree init fails from Feb 20, 2014, leads to the following static checker warning: fs/btrfs/tree-log.c:178 start_log_trans() warn: we tested 'root-log_root' before and it was 'false' fs/btrfs/tree-log.c 147 if (root-log_root) { We test root-log_root here. 148 if (btrfs_need_log_full_commit(root-fs_info, trans)) { 149 ret = -EAGAIN; 150 goto out; 151 } 152 if (!root-log_start_pid) { 153 root-log_start_pid = current-pid; 154 clear_bit(BTRFS_ROOT_MULTI_LOG_TASKS, root-state); 155 } else if (root-log_start_pid != current-pid) { 156 set_bit(BTRFS_ROOT_MULTI_LOG_TASKS, root-state); 157 } 158 159 atomic_inc(root-log_batch); 160 atomic_inc(root-log_writers); 161 if (ctx) { 162 index = root-log_transid % 2; 163 list_add_tail(ctx-list, root-log_ctxs[index]); 164 ctx-log_transid = root-log_transid; 165 } 166 mutex_unlock(root-log_mutex); 167 return 0; 168 } 169 170 ret = 0; 171 mutex_lock(root-fs_info-tree_log_mutex); 172 if (!root-fs_info-log_root_tree) 173 ret = btrfs_init_log_root_tree(trans, root-fs_info); 174 mutex_unlock(root-fs_info-tree_log_mutex); 175 if (ret) 176 goto out; 177 178 if (!root-log_root) { Couldn't we just remove this condition here? This is a new Smatch thing I am working on and I am investigating false positives. 179 ret = btrfs_add_log_tree(trans, root); 180 if (ret) 181 goto out; 182 } regards, dan carpenter -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: trim not working and irreparable errors from btrfsck
Am Fri, 14 Aug 2015 10:05:55 +0200 schrieb Marc Joliet mar...@gmx.de: (I mean, that's part of being a user of btrfs at this stage) I meant *being prepared* to file a bug report, not that one constantly has to file bug reports :) . -- Marc Joliet -- People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup pgpK308M7lnAT.pgp Description: Digitale Signatur von OpenPGP
Re: trim not working and irreparable errors from btrfsck
Am Thu, 13 Aug 2015 17:14:36 -0600 schrieb Chris Murphy li...@colorremedies.com: On Thu, Aug 13, 2015 at 3:23 AM, Marc Joliet mar...@gmx.de wrote: Speaking as a user, since fstrim -av still always outputs 0 bytes trimmed on my system: what's the status of this? Did anybody ever file a bug report? Since I'm not having this problem with my SSD, I'm not in a position to provide any meaningful information for such a report. The bug should whether this problem is reproducible with ext4 and XFS on the same device, and the complete details of the stacking (if this is not the full device or partition of it; e.g. if LVM, md, or encryption is between fs and physical device). And also the bug should include full dmesg as attachment, and strace of the fstrim command that results in 0 bytes trimmed. And probably separate bugs for each make/model of SSD, with the bug including make/model and firmware version. Right now I think there's no status because a.) no bug report and b.) not enough information. I was mainly asking because apparently there *is* a patch that helps some people affected by this, but nobody ever commented on it. Perhaps there's a reason for that, but I found it curious. (I see now that it was submitted in early January, in the thread [PATCH V2] Btrfs: really fix trim 0 bytes after a device delete.) I can open a bug (I mean, that's part of being a user of btrfs at this stage), I'm just surprised that nobody else has. BTW, is there a way to tell if the discard mount option does anything? I'm curious about whether it could behave differently. -- Marc Joliet -- People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup pgp3RKB19fH0i.pgp Description: Digitale Signatur von OpenPGP
Re: trim not working and irreparable errors from btrfsck
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 6/18/15 1:25 AM, Duncan wrote: Austin S Hemmelgarn posted on Wed, 17 Jun 2015 13:17:22 -0400 as excerpted: On 2015-06-17 11:40, Christian wrote: On 06/17/2015 11:28 AM, Chris Murphy wrote: However, fstrim still gives me 0 B (0 bytes) trimmed, so that may be another problem. Is there a way to check if trim works? That sounds like maybe your SSD is blacklisted for trim, is all I can think of. So trim shouldn't be the cause of the problem if it's being blacklisted. The recent problems appear to be around newer SSDs that support queue trim and newer kernels that issue queued trim. There have been some patches related to trim to the kernel, but the existence of blacklisting and claims of bugs in firmware make it difficult to test and isolate. http://techreport.com/news/28473/some-samsung-ssds-may-suffer-from- a- buggy-trim-implementation This is an Intel SSD in a Lenovo Thinkpad X1 Carbon. Trim worked until a few weeks ago and still works for my small ext4 boot partition (just ran it to check). I will keep looking for a solution. Thanks! I'm seeing the same issue here, but with a Crucial brand SSD. Somewhat interestingly, I don't see any issues like this with BTRFS on top of LVM's thin-provisioning volumes, or with any other filesystems, so I think it has something to do with how BTRFS is reporting unused space or how it is submitting the discard requests. FWIW, there's a current btrfs patch in progress that relates to problems with btrfs trim. But while I do have SSDs, I purposefully overprovisioned them by nearly 100% (IOW I partitioned only about 55%, the rest is entirely unused), so trim isn't as critical here as it is for many. I don't use the discard mount option, and have a systemd timer job setup to automate my fstrims and don't worry about the output too much, so I haven't been following the patch progress /that/ closely. But I do know that recent kernel btrfs trims (either fstrim or discard mount option triggered) haven't been working as originally intended due to some bug, and this patch is supposed to fix it. I'd thus conclude that you're very likely hitting this known issue, and that either for 4.1 or 4.2 (again, I'm not following progress that closely, and don't remember for sure if it's in 4.1, altho I've been running the rcs since rc6 or so), the problem should be fixed as that patch gets into mainline. Anyone wishing to investigate further can of course check the list (and/ or possibly the kernel's git log) for discard/trim related patches and follow the progress once found. ... Actually, just checked myself. Looks like the patches were first posted on March 30 @ 15:12:17 -0400 or so (that's the time for one of them). There's one for the discard mount option, and another for FITRIM (which may or may not be a typo for FSTRIM, I'm not actually sure). Jeff Mahoney je...@suse.com author. That should be enough to find the threads. And I don't see the patches in the late 4.1-rc I'm running so either my git log search foo is bad or it'll be (at least) 4.2. It's not a typo. FITRIM is the name of the ioctl that fstrim calls. The final version of that patch set is ready to go. Mostly. It probably needs to be re-integrated now. The reason it was delayed for inclusion is that it makes other bugs more obvious and irrecoverable since the data is completely gone. I'm not sure what Chris's timeline for inclusion is. - -Jeff - -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG/MacGPG2 v2.0.19 (Darwin) iQIcBAEBAgAGBQJVzgU/AAoJEB57S2MheeWyjSwQALOLfzNLiuAdxBXDnP076Pq7 8m2F2DtTRxuDCwBmnlgZtX3QuWK/J1HVRpAO/aC6WQkOo3uRNFrG4xK45EOTA5hH VBNtAFooMreicFQq5ZE6i+4yEdV8D4YRSoVn7+GrjL40IjiP8u7HXtDGw0x4ugGI iVNf3yipaTZtlRcjGt91dfW3w3D8RpjUK3z7RwSOEy3C8GP90omRWVkYV/jcFIo4 hqFMZ77hisRLf1aCFxXlO14ERyMpLPtC3HOBMLHRrdpjPp/f4XnXyFmFA0kbOX8S dwS9qPRmlnS5Lif2XMXK0a6aA0HK7sN/ghMigAh9t4zHwDkuDpAd6OWVEuCMMpCY uN2KyuNsjam2DxJHQVulNu1xlS/sGedfh8p66lC29fkB8ZpyGp4fnK1N4MVRdk8R 4o/emRb+vg7CTZ3fvss7Af6w+m22GISO43Q1MWr6Hr1Ll2y0DWL1IaB/zky8sr/5 u6E5RI7DOvbFyC31dGqvh5WQDIPrTxRoDMJL+pSOkF4CM5SM1uHak4IgUqfZ85hr MXVhRHFmH9UXRTFrkxzAV2wmSNpl2ki2pX5ItB6+c4fMMStb5dynThv27R69xxHf mn8qZBuwc5iXXsPJ9dUAxTRoquOw9Rd/1fz4S/oLH6xOrtlNlLa2HFour4Ofp16h 3e6CvcV+h4/sz0PYpYSQ =ZiEh -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can we make balance delete missing devices?
On Fri, Aug 14, 2015 at 6:12 AM, Russell Coker russ...@coker.com.au wrote: [ 2918.502237] BTRFS info (device loop1): disk space caching is enabled [ 2918.503213] BTRFS: failed to read chunk tree on loop1 [ 2918.540082] BTRFS: open_ctree failed I just had a test RAID-1 filesystem with a missing device. I mounted it with the degraded option and added a new device. I balanced it (to make it do RAID-1 again) and thought everything was good. Then when I tried to mount it again it gave errors such as the above (not sure why). Then I tried wiping /dev/loop1 and it refused to mount entirely due to having 2 missing devices. Obviously it was my mistake to not remove the missing device, and wiping /dev/loop1 was a bad idea. Failing to remove a missing device seems likely to be a common mistake. Could we make the balance operation automatically delete the missing device? I can't imagine a situation in which a balance would be desired but deleting the missing device wouldn't be desired. I think this is specious because balance doesn't at all convey a missing device will be silently dropped. If a device is missing and balancing is a bad idea, then balance should probably fail rather than automatically delete missing. The proper way to avoid this is to use btrfs replace start. Maybe it's just an old habit that needs purging, device add + device delete + balance, this is the exact use case replace was meant to address. -- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 2/3] xfstests: btrfs: test device replace, with EIO on the src dev
From: Anand Jain anand.j...@oracle.com This test case will test to confirm the replace works with the failed (EIO) replacing source device. EIO condition is achieved using the DM device. Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v5-v6: accepts Eryu and Filipe's comments v4-v5: rebase on latest xfstests code and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks tests/btrfs/098 | 81 + tests/btrfs/098.out | 11 tests/btrfs/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out diff --git a/tests/btrfs/098 b/tests/btrfs/098 new file mode 100755 index 000..a41ea86 --- /dev/null +++ b/tests/btrfs/098 @@ -0,0 +1,81 @@ +#! /bin/bash +# FS QA Test No. btrfs/098 +# +# Test device replace works when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog replace start -B $error_devid $dev2 $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device replace completed + +status=0; exit diff --git a/tests/btrfs/098.out b/tests/btrfs/098.out new file mode 100644 index 000..eb2f87f --- /dev/null +++ b/tests/btrfs/098.out @@ -0,0 +1,11 @@ +QA output created by 098 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device replace completed diff --git a/tests/btrfs/group b/tests/btrfs/group index e13865a..c8a53b5 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -100,3 +100,4 @@ 095 auto quick metadata 096 auto quick clone 097 auto quick send clone +098 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 1/3] xfstests: btrfs: add functions to create dm-error device
From: Anand Jain anand.j...@oracle.com Controlled EIO from the device is achieved using the dm device. Helper functions are at common/dmerror. Broadly steps will include calling _init_dmerror(). _init_dmerror() will use SCRATCH_DEV to create dm linear device and assign DMERROR_DEV to /dev/mapper/error-test. When test script is ready to get EIO, the test cases can call _load_dmerror_table() which then it will load the dm error. so that reading DMERROR_DEV will cause EIO. After the test case is complete, cleanup must be done by calling _cleanup_dmerror(). Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v5-v6: accepts Eryu's comments v4-v5: No Change. keep up with the patch set v3-v4: rebase on latest xfstests code v2.1-v3: accepts Filipe Manana's review comments, thanks v2-v2.1: fixed missed typo error fixup in the commit. v1-v2: accepts Dave Chinner's review comments, thanks common/dmerror | 69 ++ common/rc | 9 2 files changed, 78 insertions(+) create mode 100644 common/dmerror diff --git a/common/dmerror b/common/dmerror new file mode 100644 index 000..928e998 --- /dev/null +++ b/common/dmerror @@ -0,0 +1,69 @@ +##/bin/bash +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# +# common functions for setting up and tearing down a dmerror device + +_init_dmerror() +{ + $DMSETUP_PROG remove error-test /dev/null 21 + + local BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV` + + DMERROR_DEV='/dev/mapper/error-test' + + DMLINEAR_TABLE=0 $BLK_DEV_SIZE linear $SCRATCH_DEV 0 + + $DMSETUP_PROG create error-test --table $DMLINEAR_TABLE || \ + _fatal failed to create dm linear device + + DMERROR_TABLE=0 $BLK_DEV_SIZE error $SCRATCH_DEV 0 +} + +_scratch_mkfs_dmerror() +{ + $MKFS_BTRFS_PROG $MKFS_OPTIONS $* $DMERROR_DEV $seqres.full 21 || \ + _fatal failed to create mkfs.btrfs $* $DMERROR_DEV +} + +_mount_dmerror() +{ + $MOUNT_PROG -t $FSTYP $MOUNT_OPTIONS $DMERROR_DEV $SCRATCH_MNT +} + +_unmount_dmerror() +{ + $UMOUNT_PROG $SCRATCH_MNT +} + +_cleanup_dmerror() +{ + $UMOUNT_PROG $SCRATCH_MNT /dev/null 21 + $DMSETUP_PROG remove error-test /dev/null 21 +} + +_load_dmerror_table() +{ + $DMSETUP_PROG suspend error-test + [ $? -ne 0 ] _fatal failed to suspend error-test + + $DMSETUP_PROG load error-test --table $DMERROR_TABLE + [ $? -ne 0 ] _fatal failed to load error table error-test + + $DMSETUP_PROG resume error-test + [ $? -ne 0 ] _fatal failed to resume error-test +} diff --git a/common/rc b/common/rc index 70d2fa8..8d4da0e 100644 --- a/common/rc +++ b/common/rc @@ -1337,6 +1337,15 @@ _require_sane_bdev_flush() fi } +# this test requires the device mapper error target +# +_require_dmerror() +{ + _require_command $DMSETUP_PROG dmsetup + $DMSETUP_PROG targets | grep error /dev/null 21 + [ $? -ne 0 ] _notrun This test requires dm error support +} + # this test requires the device mapper flakey target # _require_dm_flakey() -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
The performance is not as expected when used several disks on raid0.
Hi all, This is my first email to this list, so please excuse any gaffe. I am in the evaluation early stages of a new storage, an SGI MIS, currently with two HBAs LSI and 32 disks. The hba controllers are LSI 9207-8i and the disks are Seagate 6TB, model ST6000NM0004-1FT17Z. To evaluate the performance I am using IOzone over a raid0 using all the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0. With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. When used btrfs with only half of the disc the result is about 3GB/s. More information: # uname -a Linux spstrg13 4.2.0-999-generic #201508132200 SMP Fri Aug 14 02:01:52 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version btrfs-progs v4.0 # btrfs fi show Label: none uuid: be2a5671-87d1-4b89-ac4a-04efabb5912f Total devices 32 FS bytes used 3.66MiB devid1 size 5.46TiB used 1.07GiB path /dev/sdc devid2 size 5.46TiB used 1.06GiB path /dev/sdd devid3 size 5.46TiB used 1.06GiB path /dev/sde devid4 size 5.46TiB used 1.06GiB path /dev/sdf devid5 size 5.46TiB used 1.06GiB path /dev/sdg devid6 size 5.46TiB used 1.06GiB path /dev/sdh devid7 size 5.46TiB used 1.06GiB path /dev/sdi devid8 size 5.46TiB used 1.06GiB path /dev/sdj devid9 size 5.46TiB used 1.06GiB path /dev/sdk devid 10 size 5.46TiB used 1.06GiB path /dev/sdl devid 11 size 5.46TiB used 1.06GiB path /dev/sdm devid 12 size 5.46TiB used 1.06GiB path /dev/sdn devid 13 size 5.46TiB used 1.06GiB path /dev/sdo devid 14 size 5.46TiB used 1.06GiB path /dev/sdp devid 15 size 5.46TiB used 1.06GiB path /dev/sdq devid 16 size 5.46TiB used 1.06GiB path /dev/sdr devid 17 size 5.46TiB used 1.06GiB path /dev/sds devid 18 size 5.46TiB used 1.06GiB path /dev/sdt devid 19 size 5.46TiB used 1.06GiB path /dev/sdu devid 20 size 5.46TiB used 1.06GiB path /dev/sdv devid 21 size 5.46TiB used 1.06GiB path /dev/sdw devid 22 size 5.46TiB used 1.06GiB path /dev/sdx devid 23 size 5.46TiB used 1.06GiB path /dev/sdy devid 24 size 5.46TiB used 1.06GiB path /dev/sdz devid 25 size 5.46TiB used 1.06GiB path /dev/sdaa devid 26 size 5.46TiB used 1.06GiB path /dev/sdab devid 27 size 5.46TiB used 1.06GiB path /dev/sdac devid 28 size 5.46TiB used 1.06GiB path /dev/sdad devid 29 size 5.46TiB used 1.06GiB path /dev/sdae devid 30 size 5.46TiB used 1.06GiB path /dev/sdaf devid 31 size 5.46TiB used 1.06GiB path /dev/sdag devid 32 size 5.46TiB used 1.06GiB path /dev/sdah btrfs-progs v4.0 # btrfs fi df /root/backup/root/storageTestes/mbtr Data, RAID0: total=30.00GiB, used=3.50MiB System, RAID0: total=32.00MiB, used=16.00KiB Metadata, RAID0: total=4.00GiB, used=128.00KiB Metadata, single: total=8.00MiB, used=16.00KiB GlobalReserve, single: total=16.00MiB, used=0.00B The dmesg is attached. The result are about the same using kernel 3.16 and btrfs tools 3.12. I am far from be able to isolate the problem, so please ask me any information you think is relevant. Thanks in advance. Eduardo. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 0/3] dm error based test cases
This is v6 of this patch set. Mainly accepts Filipe latest review comments and Eryu's review comments with thanks. Anand Jain (3): xfstests: btrfs: add functions to create dm-error device xfstests: btrfs: test device replace, with EIO on the src dev xfstests: btrfs: test device delete with EIO on src dev common/dmerror | 69 common/rc | 16 +++ tests/btrfs/098 | 81 tests/btrfs/098.out | 11 +++ tests/btrfs/099 | 82 + tests/btrfs/099.out | 11 +++ tests/btrfs/group | 2 ++ 7 files changed, 272 insertions(+) create mode 100644 common/dmerror create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out create mode 100755 tests/btrfs/099 create mode 100644 tests/btrfs/099.out -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 3/3] xfstests: btrfs: test device delete with EIO on src dev
From: Anand Jain anand.j...@oracle.com This test case tests if the device delete works with the failed (EIO) source device. EIO errors are achieved usign the DM device. This test would need following btrfs-progs and btrfs kernel patch btrfs-progs: device delete to accept devid Btrfs: device delete by devid However when btrfs-progs patch is not found this test will not run, and when kernel patch is not found btrfs-progs will fail gracefully and thus the test script. Signed-off-by: Anand Jain anand.j...@oracle.com --- v5-v6: accepts Eryu and Filipe's comments, thanks v4-v5: rebase on latest xfstests code, and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks common/rc | 7 + tests/btrfs/099 | 82 + tests/btrfs/099.out | 11 +++ tests/btrfs/group | 1 + 4 files changed, 101 insertions(+) create mode 100755 tests/btrfs/099 create mode 100644 tests/btrfs/099.out diff --git a/common/rc b/common/rc index 8d4da0e..31a0328 100644 --- a/common/rc +++ b/common/rc @@ -2737,6 +2737,13 @@ _require_meta_uuid() umount $SCRATCH_MNT } +_require_btrfs_dev_del_by_devid() +{ + $BTRFS_UTIL_PROG device delete --help | egrep devid /dev/null 21 + [ $? -eq 0 ] || _notrun $BTRFS_UTIL_PROG too old \ + (must support 'btrfs device delete devid /mnt') +} + _get_total_inode() { if [ -z $1 ]; then diff --git a/tests/btrfs/099 b/tests/btrfs/099 new file mode 100755 index 000..a0761c7 --- /dev/null +++ b/tests/btrfs/099 @@ -0,0 +1,82 @@ +#! /bin/bash +# FS QA Test No. btrfs/099 +# +# test device delete when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_btrfs_dev_del_by_devid +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 $dev2 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog device delete $error_devid $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device delete completed + +status=0; exit diff --git a/tests/btrfs/099.out b/tests/btrfs/099.out new file mode 100644 index 000..ec74e45 --- /dev/null +++ b/tests/btrfs/099.out @@ -0,0 +1,11 @@ +QA output created by 099 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device delete completed diff --git a/tests/btrfs/group b/tests/btrfs/group index c8a53b5..968ee63 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -101,3 +101,4 @@ 096 auto quick clone 097 auto quick send clone 098 auto quick replace +099 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs
Re: [PATCH v2] fstests: generic/018: expand write backwards sync but contiguous to test regression in btrfs
On 8/13/15 3:47 AM, Liu Bo wrote: Btrfs has a problem when defraging a file which has a large fragment'ed range, it'd leave the tail extent as a seperate extent instead of merging it with previous extents. This makes generic/018 recognize the above regression. Sorry for the late review, but here it is ;) In 2 years (heck, even now) we'll have no idea why this change was made. What regression is that? Can you describe it? Is there already an upstream fix/commit you can refer to? I see 3 changes here: 1) You change xfs_io's for loop from seq 9 -1 0 to seq 64 -1 0 - presumably this matters to btrfs. Why does this matter? Meanwhile, I find that in the case of 'write backwards sync but contiguous, ext4 doesn't produce fragments like btrfs and xfs, so I modify 018.out a little bit to let ext4 pass. 2) You stop expecting 10 extents initially in the backwards-write test for the above reason, I guess. I'm a little unsure about this. For me, this passes as-is. If it isn't working for you, we should understand why, instead of making the test ignore it. (And bundling this ext4 change into a btrfs-specific commit isn't great, anyway) Moreover, I follow Filipe's suggestion to filter xfs_io's output in order to check these writes actually succeed. 3) You stop redirecting xfs_io to /dev/null, and save it to the golden output file instead. Honestly, I find hundreds of extra xfs_io output lines to be rather unhelpful, because the old output file used to be quite easy to read, to see what's going on. Today it only redirects stdout: $XFS_IO_PROG -f -c pwrite -b $((4 * bsize)) 0 $((4 * bsize)) $fragfile \ /dev/null so if a write fails, I *think* stderr will get output, and the test *should* fail as a result.[1] You could add a || _fail xfs_io failed for good measure... -Eric [1] oh, maybe not, I guess xfs_io is kind of notorious for not returning errors... Signed-off-by: Liu Bo bo.li@oracle.com --- v2: fix typo in title, s/expend/expand/g tests/generic/018 | 16 ++-- tests/generic/018.out | 198 +- 2 files changed, 203 insertions(+), 11 deletions(-) diff --git a/tests/generic/018 b/tests/generic/018 index d97bb88..3693874 100755 --- a/tests/generic/018 +++ b/tests/generic/018 @@ -68,28 +68,24 @@ $XFS_IO_PROG -f -c truncate 1m $fragfile _defrag --before 0 --after 0 $fragfile echo Contiguous file: | tee -a $seqres.full -$XFS_IO_PROG -f -c pwrite -b $((4 * bsize)) 0 $((4 * bsize)) $fragfile \ - /dev/null +$XFS_IO_PROG -f -c pwrite -b $((4 * bsize)) 0 $((4 * bsize)) $fragfile | _filter_xfs_io _defrag --before 1 --after 1 $fragfile echo Write backwards sync, but contiguous - should defrag to 1 extent | tee -a $seqres.full -for i in `seq 9 -1 0`; do - $XFS_IO_PROG -fs -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile \ - /dev/null +for i in `seq 64 -1 0`; do + $XFS_IO_PROG -fd -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile | _filter_xfs_io done -_defrag --before 10 --after 1 $fragfile +_defrag --after 1 $fragfile echo Write backwards sync leaving holes - defrag should do nothing | tee -a $seqres.full for i in `seq 31 -2 0`; do - $XFS_IO_PROG -fs -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile \ - /dev/null + $XFS_IO_PROG -fs -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile | _filter_xfs_io done _defrag --before 16 --after 16 $fragfile echo Write forwards sync leaving holes - defrag should do nothing | tee -a $seqres.full for i in `seq 0 2 31`; do - $XFS_IO_PROG -fs -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile \ - /dev/null + $XFS_IO_PROG -fs -c pwrite -b $bsize $((i * bsize)) $bsize $fragfile | _filter_xfs_io done _defrag --before 16 --after 16 $fragfile diff --git a/tests/generic/018.out b/tests/generic/018.out index 5f265d1..0886a9a 100644 --- a/tests/generic/018.out +++ b/tests/generic/018.out @@ -6,14 +6,210 @@ Sparse file (no blocks): Before: 0 After: 0 Contiguous file: +wrote 16384/16384 bytes at offset 0 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) Before: 1 After: 1 Write backwards sync, but contiguous - should defrag to 1 extent -Before: 10 +wrote 4096/4096 bytes at offset 262144 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 4096/4096 bytes at offset 258048 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 4096/4096 bytes at offset 253952 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 4096/4096 bytes at offset 249856 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 4096/4096 bytes at offset 245760 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 4096/4096 bytes at offset
Re: The performance is not as expected when used several disks on raid0.
On Fri, Aug 14, 2015 at 1:50 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 2015-08-14 14:31, Chris Murphy wrote: On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach hellb...@gmail.com wrote: With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. mdadm with what chunk (strip) size? The default for mdadm is 512KiB. On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS on md RAID might improve its performance relative to Btrfs, at least it's a more apples to apples comparison. I have a feeling that XFS will still win this. It is one of the slower filesystems for Linux, but it still beats BTRFS senseless when it comes to performance as of right now. Yeah I was suggesting with a 64KiB chunk the XFS case might get even faster. -- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The performance is not as expected when used several disks on raid0.
On 2015-08-14 15:54, Chris Murphy wrote: On Fri, Aug 14, 2015 at 1:50 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 2015-08-14 14:31, Chris Murphy wrote: On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach hellb...@gmail.com wrote: With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. mdadm with what chunk (strip) size? The default for mdadm is 512KiB. On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS on md RAID might improve its performance relative to Btrfs, at least it's a more apples to apples comparison. I have a feeling that XFS will still win this. It is one of the slower filesystems for Linux, but it still beats BTRFS senseless when it comes to performance as of right now. Yeah I was suggesting with a 64KiB chunk the XFS case might get even faster. Ah, misunderstood what you meant. Yeah, that will almost certainly make things faster for XFS. FWIW, running BTRFS on top of MDRAID actually works very well, especially for BTRFS raid1 on top of MD-RAID0 (I get an almost 50% performance increase for this usage over BTRFS raid10, although most of this is probably due to how btrfs dispatches I/O's to disks in multi-disk stetups). smime.p7s Description: S/MIME Cryptographic Signature
Re: Can't mount degraded. How to remove/add drives OFFLINE?
I'm not sure my situation is quite like the one you linked, so here's my bug report: https://bugzilla.kernel.org/show_bug.cgi?id=102881 On Fri, Aug 14, 2015 at 2:44 PM, Chris Murphy li...@colorremedies.com wrote: On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller theo...@gmail.com wrote: Sorry about that empty email. I hit a wrong key, and gmail decided to send. Anyhow, my replacement drive is going to arrive this evening, and I need to know how to add it to my btrfs array. Here's the situation: - I had a drive fail, so I removed it and mounted degraded. - I hooked up a replacement drive, did an add on that one, and did a delete missing. - During the rebalance, the replacement drive failed, there were OOPSes, etc. - Now, although all of my data is there, I can't mount degraded, because btrfs is complaining that too many devices are missing (3 are there, but it sees 2 missing). It might be related to this (long) bug: https://bugzilla.kernel.org/show_bug.cgi?id=92641 While Btrfs RAID 1 can tolerate only a single device failure, what you have is an in-progress rebuild of a missing device. If it becomes missing, the volume should be no worse off than it was before. But Btrfs doesn't see it this way, instead is sees this as two separate missing devices and now too many devices missing and it refuses to proceed. And there's no mechanism to remove missing devices unless you can mount rw. So it's stuck. So I could use some help with cleaning up this mess. All the data is there, so I need to know how to either force it to mount degraded, or add and remove devices offline. Where do I begin? You can try to ask on IRC. I have no ideas for this scenario, I've tried and failed. My case was throw away, what should still be possible is using btrfs restore. Also, doesn't it seem a bit arbitrary that there are too many missing, when all of the data is there? If I understand correctly, all four drives in my RAID1 should all have copies of the metadata, No that's not correct. RAID 1 means 2 copies of metadata. In a 4 device RAID 1 that's still only 2 copies. It is not n-way RAID 1. But that doesn't matter here, the problem is that Btrfs has a narrow idea of the volume, it assumes without context that once the number of devices is below the minimum, the volume can't be mounted. In reality, an exception exists if the failure is for an in-progress rebuild of a missing drive. That drive failing should mean the volume is no worse off than before but Btrfs doesn't know that. Pretty sure about that anyway. and of the remaining three good drives, there should be one or two copies of every data block. So it's all there, but btrfs has decided, based on the NUMBER of missing devices, that it won't mount. Shouldn't it refuse to mount if it knows there is data missing? For that matter, why should it even refuse in that case? So some data might missing, so it should throw some errors if you try to access that missing data. Right? I think no data is missing, no metadata is missing, and Btrfs is confused and stuck in this case. -- Chris Murphy -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
On Fri, Aug 14, 2015 at 1:03 PM, Timothy Normand Miller theo...@gmail.com wrote: I'm not sure my situation is quite like the one you linked, so here's my bug report: https://bugzilla.kernel.org/show_bug.cgi?id=102881 I can easily reproduce with just 2 device RAID. I updated the bug. It's best these are separate bugs, but I think the underlying problems are related. The work around is to mount -o ro,degraded, and then move data to a new Btrfs volume with btrfs send/receive or conventional copy for data that's not already in a read-only snapshot. -- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
On Fri, Aug 14, 2015 at 12:12 PM, Timothy Normand Miller theo...@gmail.com wrote: Sorry about that empty email. I hit a wrong key, and gmail decided to send. Anyhow, my replacement drive is going to arrive this evening, and I need to know how to add it to my btrfs array. Here's the situation: - I had a drive fail, so I removed it and mounted degraded. - I hooked up a replacement drive, did an add on that one, and did a delete missing. - During the rebalance, the replacement drive failed, there were OOPSes, etc. - Now, although all of my data is there, I can't mount degraded, because btrfs is complaining that too many devices are missing (3 are there, but it sees 2 missing). It might be related to this (long) bug: https://bugzilla.kernel.org/show_bug.cgi?id=92641 While Btrfs RAID 1 can tolerate only a single device failure, what you have is an in-progress rebuild of a missing device. If it becomes missing, the volume should be no worse off than it was before. But Btrfs doesn't see it this way, instead is sees this as two separate missing devices and now too many devices missing and it refuses to proceed. And there's no mechanism to remove missing devices unless you can mount rw. So it's stuck. So I could use some help with cleaning up this mess. All the data is there, so I need to know how to either force it to mount degraded, or add and remove devices offline. Where do I begin? You can try to ask on IRC. I have no ideas for this scenario, I've tried and failed. My case was throw away, what should still be possible is using btrfs restore. Also, doesn't it seem a bit arbitrary that there are too many missing, when all of the data is there? If I understand correctly, all four drives in my RAID1 should all have copies of the metadata, No that's not correct. RAID 1 means 2 copies of metadata. In a 4 device RAID 1 that's still only 2 copies. It is not n-way RAID 1. But that doesn't matter here, the problem is that Btrfs has a narrow idea of the volume, it assumes without context that once the number of devices is below the minimum, the volume can't be mounted. In reality, an exception exists if the failure is for an in-progress rebuild of a missing drive. That drive failing should mean the volume is no worse off than before but Btrfs doesn't know that. Pretty sure about that anyway. and of the remaining three good drives, there should be one or two copies of every data block. So it's all there, but btrfs has decided, based on the NUMBER of missing devices, that it won't mount. Shouldn't it refuse to mount if it knows there is data missing? For that matter, why should it even refuse in that case? So some data might missing, so it should throw some errors if you try to access that missing data. Right? I think no data is missing, no metadata is missing, and Btrfs is confused and stuck in this case. -- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The performance is not as expected when used several disks on raid0.
On 2015-08-14 14:31, Chris Murphy wrote: On Fri, Aug 14, 2015 at 9:16 AM, Eduardo Bach hellb...@gmail.com wrote: With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. mdadm with what chunk (strip) size? The default for mdadm is 512KiB. On Btrfs it's fixed at 64KiB. While testing with 64KiB chunk with XFS on md RAID might improve its performance relative to Btrfs, at least it's a more apples to apples comparison. I have a feeling that XFS will still win this. It is one of the slower filesystems for Linux, but it still beats BTRFS senseless when it comes to performance as of right now. smime.p7s Description: S/MIME Cryptographic Signature
Can't mount degraded. How to remove/add drives OFFLINE?
My -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The performance is not as expected when used several disks on raid0.
On Fri, 2015-08-14 at 12:16 -0300, Eduardo Bach wrote: Hi all, This is my first email to this list, so please excuse any gaffe. I am in the evaluation early stages of a new storage, an SGI MIS, currently with two HBAs LSI and 32 disks. The hba controllers are LSI 9207-8i and the disks are Seagate 6TB, model ST6000NM0004-1FT17Z. To evaluate the performance I am using IOzone over a raid0 using all the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0. With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. When used btrfs with only half of the disc the result is about 3GB/s. There's two things in particular to pay attention with on btrfs with this sort of setup right now: 1. btrfs's raid0 is not an n-way stripe; it's a 2-way stripe only. (n -way stripe is a long requested feature, but there is no timeline on its completion) A single-threaded disk write will only ever be writing to two disks at the same time. The total throughput you get for multithreaded writes is up to which blocks the allocator happens to pick; it will probably often happen that multiple threads will both be using the same chunk, sharing IO from only 2 disks. 2. Btrfs development is currently primarily focused on functionality over performance. There's several places where placeholder or untuned algorithms are used (e.g. the multi-mirror io read scheduling just does pid % number_of_mirrors to pick a mirror). This kind of a performance difference on large performance-oriented RAID systems between btrfs's built-in raid and mdadm is interesting to see, but for the moment I'd say it's mostly expected. One of the developers here might have some more precise information on exactly why you're seeing such a performance difference. As an aside, you have 192TB in RAID0? That's certainly pretty impressive, but as soon as one disk dies, you're going to lose a *lot* of data. -- Calvin Walton calvin.wal...@kepstin.ca -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The performance is not as expected when used several disks on raid0.
On Fri, 2015-08-14 at 12:30 -0400, Calvin Walton wrote: On Fri, 2015-08-14 at 12:16 -0300, Eduardo Bach wrote: Hi all, This is my first email to this list, so please excuse any gaffe. I am in the evaluation early stages of a new storage, an SGI MIS, currently with two HBAs LSI and 32 disks. The hba controllers are LSI 9207-8i and the disks are Seagate 6TB, model ST6000NM0004-1FT17Z. To evaluate the performance I am using IOzone over a raid0 using all the 32 disks, with the parameters: iozone -i0 -i1 -t5 -s 20G -P0. With btrfs the result approaches 3.5GB/s. When using mdadm+xfs the result reaches 6gb/s, which is the expected value when compared with parallel dd made on discs. When used btrfs with only half of the disc the result is about 3GB/s. There's two things in particular to pay attention with on btrfs with this sort of setup right now: Umm, Ok, I made a mistake. You can ignore paragraph #1 - I got some details about the btrfs raid1 and raid0 modes mixed up! Btrfs RAID0 is n-way striping across all available drives which have room for allocations. 1. btrfs's raid0 is not an n-way stripe; it's a 2-way stripe only. (n -way stripe is a long requested feature, but there is no timeline on its completion) A single-threaded disk write will only ever be writing to two disks at the same time. The total throughput you get for multithreaded writes is up to which blocks the allocator happens to pick; it will probably often happen that multiple threads will both be using the same chunk, sharing IO from only 2 disks. 2. Btrfs development is currently primarily focused on functionality over performance. There's several places where placeholder or untuned algorithms are used (e.g. the multi-mirror io read scheduling just does pid % number_of_mirrors to pick a mirror). This kind of a performance difference on large performance-oriented RAID systems between btrfs's built-in raid and mdadm is interesting to see, but for the moment I'd say it's mostly expected. One of the developers here might have some more precise information on exactly why you're seeing such a performance difference. As an aside, you have 192TB in RAID0? That's certainly pretty impressive, but as soon as one disk dies, you're going to lose a *lot* of data. -- Calvin Walton calvin.wal...@kepstin.ca -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Can't mount degraded. How to remove/add drives OFFLINE?
Sorry about that empty email. I hit a wrong key, and gmail decided to send. Anyhow, my replacement drive is going to arrive this evening, and I need to know how to add it to my btrfs array. Here's the situation: - I had a drive fail, so I removed it and mounted degraded. - I hooked up a replacement drive, did an add on that one, and did a delete missing. - During the rebalance, the replacement drive failed, there were OOPSes, etc. - Now, although all of my data is there, I can't mount degraded, because btrfs is complaining that too many devices are missing (3 are there, but it sees 2 missing). So I could use some help with cleaning up this mess. All the data is there, so I need to know how to either force it to mount degraded, or add and remove devices offline. Where do I begin? Also, doesn't it seem a bit arbitrary that there are too many missing, when all of the data is there? If I understand correctly, all four drives in my RAID1 should all have copies of the metadata, and of the remaining three good drives, there should be one or two copies of every data block. So it's all there, but btrfs has decided, based on the NUMBER of missing devices, that it won't mount. Shouldn't it refuse to mount if it knows there is data missing? For that matter, why should it even refuse in that case? So some data might missing, so it should throw some errors if you try to access that missing data. Right? Thanks! -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
On 08/15/2015 02:12 AM, Timothy Normand Miller wrote: Sorry about that empty email. I hit a wrong key, and gmail decided to send. Anyhow, my replacement drive is going to arrive this evening, and I need to know how to add it to my btrfs array. Here's the situation: - I had a drive fail, so I removed it and mounted degraded. that bit dangerous to do without the below patch. patch has more details why. - I hooked up a replacement drive, did an add on that one, and did a delete missing. - During the rebalance, the replacement drive failed, there were OOPSes, etc. - Now, although all of my data is there, I can't mount degraded, because btrfs is complaining that too many devices are missing (3 are there, but it sees 2 missing). This is addressed in the patch [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile Thanks, Anand So I could use some help with cleaning up this mess. All the data is there, so I need to know how to either force it to mount degraded, or add and remove devices offline. Where do I begin? Also, doesn't it seem a bit arbitrary that there are too many missing, when all of the data is there? If I understand correctly, all four drives in my RAID1 should all have copies of the metadata, and of the remaining three good drives, there should be one or two copies of every data block. So it's all there, but btrfs has decided, based on the NUMBER of missing devices, that it won't mount. Shouldn't it refuse to mount if it knows there is data missing? For that matter, why should it even refuse in that case? So some data might missing, so it should throw some errors if you try to access that missing data. Right? Thanks! -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
Just to be clear, I removed the drive (the original failed drive) when the power was off, then powered up, and then mounted degraded. That's not dangerous that I know of. patch has details. pls refer. Where is this patch, and what kernel versions can this be applied to? https://patchwork.kernel.org/patch/7014141/ its on 4.3. but should apply nice on below. thanks Anand -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Deleted files cause btrfs-send to fail
Am Thu, 13 Aug 2015 10:54:58 +0200 schrieb Marc Joliet mar...@gmx.de: Am Thu, 13 Aug 2015 08:29:19 + (UTC) schrieb Duncan 1i5t5.dun...@cox.net: Marc Joliet posted on Thu, 13 Aug 2015 09:05:41 +0200 as excerpted: Here's the actual output now, obtained via btrfs-progs 4.0.1 from an initramfs emergency shell: checking extents checking free space cache checking fs roots root 5 inode 8338813 errors 2000, link count wrong unresolved ref dir 26699 index 50500 namelen 4 name root filetype 0 errors 3, no dir item, no dir index root 5 inode 8338814 errors 2000, link count wrong unresolved ref dir 26699 index 50502 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index root 5 inode 8338815 errors 2000, link count wrong unresolved ref dir 26699 index 50504 namelen 6 name systab filetype 0 errors 3, no dir item, no dir index root 5 inode 8710030 errors 2000, link count wrong unresolved ref dir 26699 index 59588 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index root 5 inode 8710031 errors 2000, link count wrong unresolved ref dir 26699 index 59590 namelen 4 name root filetype 0 errors 3, no dir item, no dir index Checking filesystem on /dev/sda1 UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64 found 63467610172 bytes used err is 1 total csum bytes: 59475016 total tree bytes: 1903411200 total fs tree bytes: 1691504640 total extent tree bytes: 130322432 btree space waste bytes: 442495212 file data blocks allocated: 555097092096 referenced 72887840768 btrfs-progs v4.0.1 Again: is this fixable? FWIW, root 5 (which you asked about upthread) is the main filesystem root. So all these appear to be on the main filesystem, not on snapshots/ subvolumes. [...] But if it's critical, you may wish to wait and have someone else confirm that before acting on it, just in case I have it wrong. I can wait until tonight, at least. The FS still mounts, and it's just the root subvolume that's affected; running btrfs-send on the /home subvolume still works. Well, I got impatient, and just went ahead and did it (I have backups, after all). It looks like it worked: the affected files were moved to /lost+found/, where I deleted them again, and btrfs-send works again. The output of btrfs check after --repair: checking extents checking free space cache checking fs roots checking csums There are no extents for csum range 0-69632 Csum exists for 0-69632 but there is no extent record Checking filesystem on /dev/sda1 UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64 block group 274307481600 has wrong amount of free spacefailed to load free space cache for block group 274307481600 found 60980420666 bytes used err is 1 total csum bytes: 57521732 total tree bytes: 199680 total fs tree bytes: 1791721472 total extent tree bytes: 127942656 btree space waste bytes: 460072661 file data blocks allocated: 478650343424 referenced 73326161920 btrfs-progs v4.1.2 If I notice anything amiss, I'll report back. (One other thing I found interesting was that btrfs scrub didn't care about the link count errors.) Greetings. -- Marc Joliet -- People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup pgpvHFAJ5LvQi.pgp Description: Digitale Signatur von OpenPGP
Re: Can't mount degraded. How to remove/add drives OFFLINE?
On Fri, Aug 14, 2015 at 7:49 PM, Anand Jain anand.j...@oracle.com wrote: - I had a drive fail, so I removed it and mounted degraded. that bit dangerous to do without the below patch. patch has more details why. Just to be clear, I removed the drive (the original failed drive) when the power was off, then powered up, and then mounted degraded. That's not dangerous that I know of. - I hooked up a replacement drive, did an add on that one, and did a delete missing. - During the rebalance, the replacement drive failed, there were OOPSes, etc. - Now, although all of my data is there, I can't mount degraded, because btrfs is complaining that too many devices are missing (3 are there, but it sees 2 missing). This is addressed in the patch [PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile Where is this patch, and what kernel versions can this be applied to? -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID0 wrong (raw) device?
First of all there is a known issue in handling multiple paths / instances of the same device image in btrfs. Fixing this caused regression earlier. And my survey [survey] BTRFS_IOC_DEVICES_READY return status almost told me not to fix the bug. But these are just a reporting issue which would confuse users, should be fixed. There is now a new behaviour: after the btrfs mount, I can see shortly the wrong raw device /dev/sde and a few seconds later there is the correct /dev/drbd3 : yep possible. but it does not mean that btrfs kernel is using the new path its just a reporting (bug). (pls use -m option) root@toy02:/etc# umount /data root@toy02:/etc# mount /data root@toy02:/etc# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/sde Btrfs v3.12 root@toy02:/etc# btrfs filesystem show Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/drbd3 Btrfs v3.12 root@toy02:/etc# btrfs filesystem show -m Label: data uuid: 411af13f-6cae-4f03-99dc-5941acb3135b Total devices 2 FS bytes used 109.56GiB devid3 size 1.82TiB used 63.03GiB path /dev/drbd2 devid4 size 1.82TiB used 63.03GiB path /dev/drbd3 Btrfs v3.12 Still, the kernel sees 3 instead of (really) 2 HGST drives: root@toy02:/etc# hdparm -I /dev/sdb | grep Number: Model Number: HGST HUS724020ALA640 Serial Number: PN2134P5G2P2AX root@toy02:/etc# hdparm -I /dev/sde | grep Number: Model Number: HGST HUS724020ALA640 Serial Number: PN2134P5G2P2AX This is important to know but not a btrfs issue. Do you have multiple host paths reaching this this device with serial # PN2134P5G2P2AX ? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Major qgroup regression in 4.2?
On Thu, Aug 13, 2015 at 04:13:08PM -0700, Mark Fasheh wrote: If there *is* a plan to make this all work again, can I please hear it? The comment mentions something about adding those nodes to a dirty_extent_root. Why wasn't that done? Ok so I had more time to look through the changes today and came up with this naive patch, it simply inserts dirty extent records where we were doing our qgroup refs before. This passes my micro-test but I'm unclear on whether there's some pitfall I'm unaware of (I'm guessing there must be?). Please advise. Thanks, --Mark -- Mark Fasheh From: Mark Fasheh mfas...@suse.de btrfs: qgroup: account shared subtree during snapshot delete (again) Commit 0ed4792 ('btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.') removed our qgroup accounting during btrfs_drop_snapshot(). Predictably, this results in qgroup numbers going bad shortly after a snapshot is removed. Fix this by adding a dirty extent record when we encounter extents during our shared subtree walk. This effectively restores the functionality we had with the original shared subtree walkign code in commit 1152651 (btrfs: qgroup: account shared subtrees during snapshot delete) This patch also moves the open coded allocation handling for qgroup_extent_record into their own functions. Signed-off-by: Mark Fasheh mfas...@suse.de qdiff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index ac3e81d..8156f50 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -486,7 +486,7 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, qexisting = btrfs_qgroup_insert_dirty_extent(delayed_refs, qrecord); if (qexisting) - kfree(qrecord); + btrfs_qgroup_free_extent_record(qrecord); } spin_lock_init(head_ref-lock); @@ -654,7 +654,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info, goto free_ref; if (fs_info-quota_enabled is_fstree(ref_root)) { - record = kmalloc(sizeof(*record), GFP_NOFS); + record = btrfs_qgroup_alloc_extent_record(); if (!record) goto free_head_ref; } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 07204bf..ab81135 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7756,18 +7756,31 @@ reada: wc-reada_slot = slot; } -/* - * TODO: Modify related function to add related node/leaf to dirty_extent_root, - * for later qgroup accounting. - * - * Current, this function does nothing. - */ +static int record_one_item(struct btrfs_trans_handle *trans, u64 bytenr, + u64 num_bytes) +{ + struct btrfs_qgroup_extent_record *qrecord = btrfs_qgroup_alloc_extent_record(); + struct btrfs_delayed_ref_root *delayed_refs = trans-transaction-delayed_refs; + + if (!qrecord) + return -ENOMEM; + + qrecord-bytenr = bytenr; + qrecord-num_bytes = num_bytes; + qrecord-old_roots = NULL; + + if (btrfs_qgroup_insert_dirty_extent(delayed_refs, qrecord)) + btrfs_qgroup_free_extent_record(qrecord); + + return 0; +} + static int account_leaf_items(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *eb) { int nr = btrfs_header_nritems(eb); - int i, extent_type; + int i, extent_type, ret; struct btrfs_key key; struct btrfs_file_extent_item *fi; u64 bytenr, num_bytes; @@ -7790,6 +7803,10 @@ static int account_leaf_items(struct btrfs_trans_handle *trans, continue; num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); + + ret = record_one_item(trans, bytenr, num_bytes); + if (ret) + return ret; } return 0; } @@ -7858,8 +7875,6 @@ static int adjust_slots_upwards(struct btrfs_root *root, /* * root_eb is the subtree root and is locked before this function is called. - * TODO: Modify this function to mark all (including complete shared node) - * to dirty_extent_root to allow it get accounted in qgroup. */ static int account_shared_subtree(struct btrfs_trans_handle *trans, struct btrfs_root *root, @@ -7937,6 +7952,11 @@ walk_down: btrfs_tree_read_lock(eb); btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); path-locks[level] = BTRFS_READ_LOCK_BLOCKING; + + ret = record_one_item(trans, child_bytenr, + root-nodesize); + if (ret) + goto out; } if (level == 0) { diff --git a/fs/btrfs/qgroup.c
[survey] sysfs layout for btrfs
Hello, as of now btrfs sysfs does not include the attributes for the volume manager part in its sysfs layout, so its being developed and there are two types of layout here below, so I have a quick survey to know which will be preferred. contenders are: 1. FS and VM (volume manager) attributes[1] merged sysfs layout 2. FS and VM attributes separated sysfs layout. These two choices differ whether the VM attributes are amalgamate with existing FS attributes or if VM attributes are put under a kobject named pools/volumes under /sys/fs/btrfs. More in the below example. which would highlight the trade off between these two. Eg for #1 (above): The existing sysfs for btrfs, has the top kobject fsid /sys/fs/btrfs/fsid -- holds FS attr, VM attr will be added here. /sys/fs/btrfs/fsid/devices/uuid [2] -- btrfs_devices attr here /sys/fs/btrfs/fsid/devices/uuid/state /sys/fs/btrfs/fsid/devices/uuid/offline we won't be able to change the sysfs entries which is already there. However we could change the context in which they are created and destroyed that is, from mount and unmount, to device scan and module unload respectively. And so this will enable us to implement the # 1 (above). Eg for #2 (above): For the 2nd choice, a new 'pools or volumes' kobject will be created under existing /sys/fs/btrfs/ which will hold the VM attributes. (however note that: there will be duplicate kobjects like fsid both under FS and VM in this choice #2). /sys/fs/btrfs/fsid --- as is, will continue to hold fs attributes. /sys/fs/btrfs/pools/fsid/ -- will hold VM attributes /sys/fs/btrfs/pools/fsid/devices/sdx -- btrfs_devices attr here /sys/fs/btrfs/pools/fsid/devices/sdx/state /sys/fs/btrfs/pools/fsid/devices/sdx/offline There is certainly a small trade-off between these two. Your comments / feedback are kindly appreciated. Thanks, Anand [1] attributes will be of the btrfs_fs_devices structure. And few newly introduced attributes like 'state', to state the volume current state. [2] note that we can not use sdx here since a link to the block device already exists with that name. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
I applied that patch to my 4.1.4, it mounted degraded, and now it's balancing to the new drive. Thanks for all the help! On Fri, Aug 14, 2015 at 8:28 PM, Anand Jain anand.j...@oracle.com wrote: Just to be clear, I removed the drive (the original failed drive) when the power was off, then powered up, and then mounted degraded. That's not dangerous that I know of. patch has details. pls refer. Where is this patch, and what kernel versions can this be applied to? https://patchwork.kernel.org/patch/7014141/ its on 4.3. but should apply nice on below. thanks Anand -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Deleted files cause btrfs-send to fail
Marc Joliet posted on Fri, 14 Aug 2015 23:37:37 +0200 as excerpted: (One other thing I found interesting was that btrfs scrub didn't care about the link count errors.) A lot of people are confused about exactly what btrfs scrub does, and expect it to detect and possibly fix stuff it has nothing to do with. It's *not* an fsck. Scrub does one very useful, but limited, thing. It systematically verifies that the computed checksums for all data and metadata covered by checksums match the corresponding recorded checksums. For dup/raid1/ raid10 modes, if there's a match failure, it will look up the other copy and see if it matches, replacing the invalid block with a new copy of the other one, assuming it's valid. For raid56 modes, it attempts to compute the valid copy from parity and, again assuming a match after doing so, does the replace. If a valid copy cannot be found or computed, either because it's damaged too or because there's no second copy or parity to fall back on (single and raid0 modes), then scrub will detect but cannot correct the error. In routine usage, btrfs automatically does the same thing if it happens to come across checksum errors in its normal IO stream, but it has to come across them first. Scrub's benefit is that it systematically verifies (and corrects errors where it can) checksums on the entire filesystem, not just the parts that happen to appear in the normal IO stream. Such checksum errors can be for a few reasons... I have one ssd that's gradually failing and returns checksum errors fairly regularly. Were I using a normal filesystem I'd have had to replace it some time ago. But with btrfs in raid1 mode and regular scrubs (and backups, should they be needed; sometimes I let them get a bit stale, but I do have them and am prepared to live with the stale restored data if I have to), I've been able to keep using the failing device. When the scrubs hit errors and btrfs does the rewrite from the good copy, a block relocation on the failing device is triggered as well, with the bad block taken out of service and a new one from the set of spares all modern devices have takes its place. Currently, smartctl -A reports 904 reallocated sectors raw value, with a standardized value of 92. Before the first reallocated sector, the standardized value was 253, perfect. With the first reallocated sector, it immediately dropped to 100, apparently the rounded percentage of spare sectors left. It has gradually dropped since then to its current 92, with a threshold value of 36. So while it's gradually failing, there's still plenty of spare sectors left. Normally I would have replaced the device even so, but I've never actually had the opportunity to actually watch a slow failure continue to get worse over time, and now that I do I'm a bit curious how things will go, so I'm just letting it happen, tho I do have a replacement device already purchased and ready, when the time comes. So real media failure, bitrot, is one reason for bad checksums. The data read back from the device simply isn't the same data that was stored to it, and the checksum fails as a result. Of course bad connector cables or storage chipset firmware or hardware is another hardware cause. Sudden reboot or power loss, with data being actively written and one copy either already updated or not yet touched, while the other is actually being written at the time of the crash so the write isn't completed, is yet another reason for checksum failure. This one is actually why a scrub can appear to do so much more than it does, because where there's a second copy (or parity) of the data available, scrub can use it to recover the partially written copy (which being partially written fails its checksum verification) to either the completed write state, if the other copy was already written, or the pre-write state, if the other copy hadn't been written at all, yet. In this way the result is often the same one an fsck would normally produce, detecting and fixing the error, but the mechanism is entirely different -- it only detected and fixed the error because the checksum was bad and it had a good copy it could replace it with, not because it had any smarts about how the filesystem actually worked, and could actually tell what the error was and correct it by actually correcting it. Meanwhile, in your case the problem was an actual btrfs logic bug -- it didn't track the inode ref-counts correctly, and didn't remove the inode when the last reference to it was deleted, because it still thought there were more references. So the metadata actually written to storage was incorrect due to the logic flaw, but the checksum covering it was indeed the correct checksum for that metadata, as wrong as the metadata actually happened to be. So scrub couldn't detect the error, because it was an error not in checksum, which was computed correctly over the metadata, but in
lockup
I have a Xen server with 14 DomUs that are being used for BTRFS and ZFS training. About 5 people are corrupting virtual disks and scrubbing them, lots of IO. All the virtual machine disk images are snapshots of a master image with copy on write. I just had the following error which ended with a NMI. I copied what I could. It's running the latest Debian/Jessie kernel 3.16.7. [15780.056002] Code: 44 24 10 e9 1c ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 55 48 89 fd 53 4c 8b 67 50 66 66 66 66 90 f0 ff 4d 4c 74 35 5b 5d 41 5c c3 48 8b 1d a9 07 07 00 48 85 db 74 1c 48 8b [15808.056003] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-i38:22730] [15808.056003] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables xen_netback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc ext4 crc16 mbcache jbd2 ppdev psmouse serio_raw pcspkr k8temp joydev evdev ipmi_si ns558 gameport parport_pc parport ipmi_msghandler snd_mpu401_uart snd_rawmidi snd_seq_device snd processor button soundcore edac_mce_amd edac_core i2c_nforce2 i2c_core shpchp thermal_sys loop autofs4 crc32c_generic btrfs xor raid6_pq raid1 md_mod sd_mod crc_t10dif crct10dif_generic crct10dif_common hid_generic usbhid hid sg sr_mod cdrom ata_generic ohci_pci mptsas scsi_transport_sas mptscsih mptbase e1000 pata_amd ehci_pci ohci_hcd ehci_hcd libata forcedeth scsi_mod usbcore usb_common [15808.056003] CPU: 1 PID: 22730 Comm: qemu-system-i38 Not tainted 3.16.0-4- amd64 #1 Debian 3.16.7-ckt11-1+deb8u3 [15808.056003] Hardware name: Sun Microsystems Sun Fire X4100 M2/Sun Fire X4100 M2, BIOS 0ABJX102 11/03/2008 [15808.056003] task: 8812e010 ti: 880001e9c000 task.ti: 880001e9c000 [15808.056003] RIP: e030:[a024edb9] [a024edb9] btrfs_put_ordered_extent+0x19/0xc0 [btrfs] [15808.056003] RSP: e02b:880001e9fe08 EFLAGS: 0202 [15808.056003] RAX: 0583 RBX: 88000a4f0580 RCX: 06a4 [15808.056003] RDX: 88000a4f0580 RSI: 88000a4f0508 RDI: 88000a4f0508 [15808.056003] RBP: 88000a4f0508 R08: 88000a4f0560 R09: 8800502f29b0 [15808.056003] R10: 7ff0 R11: 0005 R12: 880053821950 [15808.056003] R13: 88000a4f0508 R14: 880004f7cf00 R15: 880001e9fe50 [15808.056003] FS: 7fdc312f5700() GS:88007744() knlGS: [15808.056003] CS: e033 DS: ES: CR0: 8005003b [15808.056003] CR2: 7f4af0c74000 CR3: 2e534000 CR4: 0660 [15808.056003] Stack: [15808.056003] 88000a4f0580 880052d76800 880002503800 a02342f4 [15808.056003] 880004f7cfa8 880002503000 a02881e2 [15808.056003] 8800 880052d76800 88000b7f7b18 [15808.056003] Call Trace: [15808.056003] [a02342f4] ? btrfs_wait_pending_ordered+0xc4/0x100 [btrfs] [15808.056003] [a02881e2] ? __btrfs_run_delayed_items+0xf2/0x1d0 [btrfs] [15808.056003] [a0236356] ? btrfs_commit_transaction+0x2d6/0xa10 [btrfs] [15808.056003] [810a7a40] ? prepare_to_wait_event+0xf0/0xf0 [15808.056003] [a0246529] ? btrfs_sync_file+0x1c9/0x2f0 [btrfs] [15808.056003] [811d53cb] ? do_fsync+0x4b/0x70 [15808.056003] [811d564f] ? SyS_fdatasync+0xf/0x20 [15808.056003] [8151158d] ? system_call_fast_compare_end+0x10/0x15 [15808.056003] Code: 44 24 10 e9 1c ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 55 48 89 fd 53 4c 8b 67 50 66 66 66 66 90 f0 ff 4d 4c 74 35 5b 5d 41 5c c3 48 8b 1d a9 07 07 00 48 85 db 74 1c 48 8b [15818.440002] INFO: rcu_sched self-detected stall on CPU { 1} (t=68266 jiffies g=236497 c=236496 q=6784) [15818.440002] sending NMI to all CPUs: [15818.440002] NMI backtrace for cpu 1 [15818.440002] CPU: 1 PID: 22730 Comm: qemu-system-i38 Not tainted 3.16.0-4- amd64 #1 Debian 3.16.7-ckt11-1+deb8u3 [15818.440002] Hardware name: Sun Microsystems Sun Fire X4100 M2/Sun Fire X4100 M2, BIOS 0ABJX102 11/03/2008 [15818.440002] task: 8812e010 ti: 880001e9c000 task.ti: 880001e9c000 [15818.440002] RIP: e030:[8100130a] [8100130a] xen_hypercall_vcpu_op+0xa/0x20 [15818.440002] RSP: e02b:880077443cc8 EFLAGS: 0046 [15818.440002] RAX: RBX: 0001 RCX: 8100130a [15818.440002] RDX: RSI: 0001 RDI: 000b [15818.440002] RBP: 818e2900 R08: 818e23e0 R09: 880bcc40 [15818.440002] R10: 0855 R11: 0246 R12: 818e23e0 [15818.440002] R13: 0005 R14: 1a80 R15: 81853680 [15818.440002] FS: 7fdc312f5700() GS:88007744() knlGS: [15818.440002] CS: e033 DS: ES: CR0: 8005003b [15818.440002]
Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction
BTW, when this is all over with, how do I make sure there are really two copies of everything? Will a scrub verify this? Should I run a balance operation? On Fri, Aug 14, 2015 at 11:29 PM, Timothy Normand Miller theo...@gmail.com wrote: After applying Anand's patch, I was able to mount my 4-drive RAID1 and bring a new fourth drive online. However, something weird happened where the first delete missing only deleted one missing drive and only did a partial duplication. I've posted a bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=102901 -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Can't mount degraded. How to remove/add drives OFFLINE?
I thought for a second that maybe the problem is due to the phantom single chunk(s) created at mkfs time. I redid the test, and did a balance to get rid of the single chunk. I did this right after populating volume with some data. But the problem still happens. --- Chris Murphy -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction
BTW, when this is all over with, how do I make sure there are really two copies of everything? Will a scrub verify this? Should I run a balance operation? pls use 'btrfs bal profile and convert' to migrate single chunk (if any created when there were lesser number of RW-able devices) back to your desired raid1. Do this when all the devices are back online. Kindly note there is a bug in the btrfs VM that you won't be able to bring a device online with out unmount - mount (I am working to fix). btrfs-progs will be wrong in this case don't depend too much on that. So to understand inside of btrfs kernel volume I generally use: https://patchwork.kernel.org/patch/5816011/ In there if bdev is null it indicates device is scanned but not part of VM yet. Then unmount - mount will bring device back to be part of VM. After applying Anand's patch, I was able to mount my 4-drive RAID1 and bring a new fourth drive online. However, something weird happened where the first delete missing only deleted one missing drive and only did a partial duplication. I've posted a bug report here: that seems to be normal to me. unless I am missing something else / clarity. Thanks, Anand -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
delete missing with two missing devices doesn't delete both missing, only does a partial reconstruction
After applying Anand's patch, I was able to mount my 4-drive RAID1 and bring a new fourth drive online. However, something weird happened where the first delete missing only deleted one missing drive and only did a partial duplication. I've posted a bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=102901 -- Timothy Normand Miller, PhD Assistant Professor of Computer Science, Binghamton University http://www.cs.binghamton.edu/~millerti/ Open Graphics Project -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/23] Btrfs: rename super_kobj to fsid_kobj
Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/sysfs.c | 36 ++-- fs/btrfs/volumes.c | 2 +- fs/btrfs/volumes.h | 2 +- 3 files changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 52319d1..e0ac859 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -437,24 +437,24 @@ static const struct attribute *btrfs_attrs[] = { NULL, }; -static void btrfs_release_super_kobj(struct kobject *kobj) +static void btrfs_release_fsid_kobj(struct kobject *kobj) { struct btrfs_fs_devices *fs_devs = to_fs_devs(kobj); - memset(fs_devs-super_kobj, 0, sizeof(struct kobject)); + memset(fs_devs-fsid_kobj, 0, sizeof(struct kobject)); complete(fs_devs-kobj_unregister); } static struct kobj_type btrfs_ktype = { .sysfs_ops = kobj_sysfs_ops, - .release= btrfs_release_super_kobj, + .release= btrfs_release_fsid_kobj, }; static inline struct btrfs_fs_devices *to_fs_devs(struct kobject *kobj) { if (kobj-ktype != btrfs_ktype) return NULL; - return container_of(kobj, struct btrfs_fs_devices, super_kobj); + return container_of(kobj, struct btrfs_fs_devices, fsid_kobj); } static inline struct btrfs_fs_info *to_fs_info(struct kobject *kobj) @@ -502,12 +502,12 @@ static int addrm_unknown_feature_attrs(struct btrfs_fs_info *fs_info, bool add) attrs[0] = fa-kobj_attr.attr; if (add) { int ret; - ret = sysfs_merge_group(fs_info-fs_devices-super_kobj, + ret = sysfs_merge_group(fs_info-fs_devices-fsid_kobj, agroup); if (ret) return ret; } else - sysfs_unmerge_group(fs_info-fs_devices-super_kobj, + sysfs_unmerge_group(fs_info-fs_devices-fsid_kobj, agroup); } @@ -523,9 +523,9 @@ static void __btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) fs_devs-device_dir_kobj = NULL; } - if (fs_devs-super_kobj.state_initialized) { - kobject_del(fs_devs-super_kobj); - kobject_put(fs_devs-super_kobj); + if (fs_devs-fsid_kobj.state_initialized) { + kobject_del(fs_devs-fsid_kobj); + kobject_put(fs_devs-fsid_kobj); wait_for_completion(fs_devs-kobj_unregister); } } @@ -555,8 +555,8 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) kobject_put(fs_info-space_info_kobj); } addrm_unknown_feature_attrs(fs_info, false); - sysfs_remove_group(fs_info-fs_devices-super_kobj, btrfs_feature_attr_group); - sysfs_remove_files(fs_info-fs_devices-super_kobj, btrfs_attrs); + sysfs_remove_group(fs_info-fs_devices-fsid_kobj, btrfs_feature_attr_group); + sysfs_remove_files(fs_info-fs_devices-fsid_kobj, btrfs_attrs); btrfs_sysfs_rm_device_link(fs_info-fs_devices, NULL); } @@ -675,7 +675,7 @@ int btrfs_sysfs_add_device(struct btrfs_fs_devices *fs_devs) { if (!fs_devs-device_dir_kobj) fs_devs-device_dir_kobj = kobject_create_and_add(devices, - fs_devs-super_kobj); + fs_devs-fsid_kobj); if (!fs_devs-device_dir_kobj) return -ENOMEM; @@ -730,8 +730,8 @@ int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs, int error; init_completion(fs_devs-kobj_unregister); - fs_devs-super_kobj.kset = btrfs_kset; - error = kobject_init_and_add(fs_devs-super_kobj, + fs_devs-fsid_kobj.kset = btrfs_kset; + error = kobject_init_and_add(fs_devs-fsid_kobj, btrfs_ktype, parent, %pU, fs_devs-fsid); return error; } @@ -740,7 +740,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) { int error; struct btrfs_fs_devices *fs_devs = fs_info-fs_devices; - struct kobject *super_kobj = fs_devs-super_kobj; + struct kobject *fsid_kobj = fs_devs-fsid_kobj; btrfs_set_fs_info_ptr(fs_info); @@ -748,13 +748,13 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) if (error) return error; - error = sysfs_create_files(super_kobj, btrfs_attrs); + error = sysfs_create_files(fsid_kobj, btrfs_attrs); if (error) { btrfs_sysfs_rm_device_link(fs_devs, NULL); return error; } - error = sysfs_create_group(super_kobj, + error = sysfs_create_group(fsid_kobj,
[PATCH 02/23] Btrfs: rename btrfs_sysfs_remove_one to btrfs_sysfs_remove_mounted
--- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/sysfs.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index afce306..4484063 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -4005,7 +4005,7 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, int btrfs_init_sysfs(void); void btrfs_exit_sysfs(void); int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info); -void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info); +void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info); /* xattr.c */ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 376a6ef..8571025 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3108,7 +3108,7 @@ fail_cleaner: filemap_write_and_wait(fs_info-btree_inode-i_mapping); fail_sysfs: - btrfs_sysfs_remove_one(fs_info); + btrfs_sysfs_remove_mounted(fs_info); fail_fsdev_sysfs: btrfs_sysfs_remove_fsid(fs_info-fs_devices); @@ -3791,7 +3791,7 @@ void close_ctree(struct btrfs_root *root) percpu_counter_sum(fs_info-delalloc_bytes)); } - btrfs_sysfs_remove_one(fs_info); + btrfs_sysfs_remove_mounted(fs_info); btrfs_sysfs_remove_fsid(fs_info-fs_devices); btrfs_free_fs_roots(fs_info); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index cabf840..095a302 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -545,7 +545,7 @@ void btrfs_sysfs_remove_fsid(struct btrfs_fs_devices *fs_devs) } } -void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info) +void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) { btrfs_reset_fs_info_ptr(fs_info); @@ -776,7 +776,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) return 0; failure: - btrfs_sysfs_remove_one(fs_info); + btrfs_sysfs_remove_mounted(fs_info); return error; } -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/23] Btrfs: kernel operation should come after user input has been verified
By general rule of thumb there shouldn't be any way that user land could trigger a kernel operation just by sending wrong arguments. Here do commit cleanups after user input has been verified. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/dev-replace.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 673a2c3..937e53b 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -325,19 +325,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root, args-start.tgtdev_name[0] == '\0') return -EINVAL; - /* -* Here we commit the transaction to make sure commit_total_bytes -* of all the devices are updated. -*/ - trans = btrfs_attach_transaction(root); - if (!IS_ERR(trans)) { - ret = btrfs_commit_transaction(trans, root); - if (ret) - return ret; - } else if (PTR_ERR(trans) != -ENOENT) { - return PTR_ERR(trans); - } - /* the disk copy procedure reuses the scrub code */ mutex_lock(fs_info-volume_mutex); ret = btrfs_find_device_by_user_input(root, args-start.srcdevid, @@ -354,6 +341,19 @@ int btrfs_dev_replace_start(struct btrfs_root *root, if (ret) return ret; + /* +* Here we commit the transaction to make sure commit_total_bytes +* of all the devices are updated. +*/ + trans = btrfs_attach_transaction(root); + if (!IS_ERR(trans)) { + ret = btrfs_commit_transaction(trans, root); + if (ret) + return ret; + } else if (PTR_ERR(trans) != -ENOENT) { + return PTR_ERR(trans); + } + btrfs_dev_replace_lock(dev_replace); switch (dev_replace-replace_state) { case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED: -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 21/23] Btrfs: fix fs logging for multi device
In case of multi device btrfs fs, using one of device for the logging purpose it quite confusing, instead use the fsid. FSID is bit long, but the device path can be long as well in some cases. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 56c0174..a8a0109 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -190,12 +190,12 @@ static const char * const logtypes[] = { void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) { - struct super_block *sb = fs_info-sb; char lvl[4]; struct va_format vaf; va_list args; const char *type = logtypes[4]; int kern_level; + struct btrfs_fs_devices *fs_devs = fs_info-fs_devices; va_start(args, fmt); @@ -212,7 +212,7 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) vaf.fmt = fmt; vaf.va = args; - printk(%sBTRFS %s (device %s): %pV\n, lvl, type, sb-s_id, vaf); + printk(%sBTRFS: %pU %s: %pV\n, lvl, fs_devs-fsid, type, vaf); va_end(args); } -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/23] Btrfs: use btrfs_find_device_by_user_input()
btrfs_rm_device() has a section of the code which can be replaced btrfs_find_device_by_user_input() Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 84 -- 1 file changed, 19 insertions(+), 65 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f1b36b9..1d35332 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1689,7 +1689,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) struct btrfs_super_block *disk_super = NULL; struct btrfs_fs_devices *cur_devices; u64 num_devices; - u8 *dev_uuid; int ret = 0; bool clear_super = false; char *dev_name = NULL; @@ -1700,62 +1699,19 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) if (ret) goto out; - if (devid) { - device = btrfs_find_device(root-fs_info, devid, - NULL, NULL); - if (!device) { - ret = -ENOENT; - goto out; - } - device_path = rcu_str_deref(device-name); - } else if (strcmp(device_path, missing) == 0) { - struct list_head *devices; - struct btrfs_device *tmp; - - device = NULL; - devices = root-fs_info-fs_devices-devices; - /* -* It is safe to read the devices since the volume_mutex -* is held. -*/ - list_for_each_entry(tmp, devices, dev_list) { - if (tmp-in_fs_metadata - !tmp-is_tgtdev_for_dev_replace - !tmp-bdev) { - device = tmp; - break; - } - } - if (!device) { - ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND; - goto out; - } - } else { - ret = btrfs_get_bdev_and_sb(device_path, - FMODE_WRITE | FMODE_EXCL, - root-fs_info-bdev_holder, 0, - bdev, bh); - if (ret) - goto out; - disk_super = (struct btrfs_super_block *)bh-b_data; - devid = btrfs_stack_device_id(disk_super-dev_item); - dev_uuid = disk_super-dev_item.uuid; - device = btrfs_find_device(root-fs_info, devid, dev_uuid, - disk_super-fsid); - if (!device) { - ret = -ENOENT; - goto error_brelse; - } - } + ret = btrfs_find_device_by_user_input(root, devid, device_path, + device); + if (ret) + goto out; if (device-is_tgtdev_for_dev_replace) { ret = BTRFS_ERROR_DEV_TGT_REPLACE; - goto error_brelse; + goto out; } if (device-writeable root-fs_info-fs_devices-rw_devices == 1) { ret = BTRFS_ERROR_DEV_ONLY_WRITABLE; - goto error_brelse; + goto out; } if (device-writeable) { @@ -1865,7 +1821,7 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) * to fail. So return success */ ret = 0; - goto done; + goto out; } disk_super = (struct btrfs_super_block *)bh-b_data; @@ -1876,6 +1832,7 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) memset(disk_super-magic, 0, sizeof(disk_super-magic)); set_buffer_dirty(bh); sync_dirty_buffer(bh); + brelse(bh); /* clear the mirror copies of super block on the disk * being removed, 0th copy is been taken care above and @@ -1887,7 +1844,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) i_size_read(bdev-bd_inode)) break; - brelse(bh); bh = __bread(bdev, bytenr / 4096, BTRFS_SUPER_INFO_SIZE); if (!bh) @@ -1897,35 +1853,33 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) if (btrfs_super_bytenr(disk_super) != bytenr || btrfs_super_magic(disk_super) != BTRFS_MAGIC) { + brelse(bh);
[PATCH 10/23] Btrfs: rename btrfs_dev_replace_find_srcdev()
The patch renames btrfs_dev_replace_find_srcdev() to btrfs_find_device_by_user_input() so that it can be used by btrfs_rm_device() as well in the next patches. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/dev-replace.c | 24 +--- fs/btrfs/volumes.c | 19 +++ fs/btrfs/volumes.h | 3 +++ 3 files changed, 23 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 6eb9324..673a2c3 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -44,9 +44,6 @@ static void btrfs_dev_replace_update_device_in_mapping_tree( struct btrfs_fs_info *fs_info, struct btrfs_device *srcdev, struct btrfs_device *tgtdev); -static int btrfs_dev_replace_find_srcdev(struct btrfs_root *root, u64 srcdevid, -char *srcdev_name, -struct btrfs_device **device); static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info); static int btrfs_dev_replace_kthread(void *data); static int btrfs_dev_replace_continue_on_mount(struct btrfs_fs_info *fs_info); @@ -343,7 +340,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root, /* the disk copy procedure reuses the scrub code */ mutex_lock(fs_info-volume_mutex); - ret = btrfs_dev_replace_find_srcdev(root, args-start.srcdevid, + ret = btrfs_find_device_by_user_input(root, args-start.srcdevid, args-start.srcdev_name, src_device); if (ret) { @@ -626,25 +623,6 @@ static void btrfs_dev_replace_update_device_in_mapping_tree( write_unlock(em_tree-lock); } -static int btrfs_dev_replace_find_srcdev(struct btrfs_root *root, u64 srcdevid, -char *srcdev_name, -struct btrfs_device **device) -{ - int ret; - - if (srcdevid) { - ret = 0; - *device = btrfs_find_device(root-fs_info, srcdevid, NULL, - NULL); - if (!*device) - ret = -ENOENT; - } else { - ret = btrfs_find_device_missing_or_by_path(root, srcdev_name, - device); - } - return ret; -} - void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info, struct btrfs_ioctl_dev_replace_args *args) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1573997..101a473 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2089,6 +2089,25 @@ int btrfs_find_device_missing_or_by_path(struct btrfs_root *root, } } +int btrfs_find_device_by_user_input(struct btrfs_root *root, u64 srcdevid, +char *srcdev_name, +struct btrfs_device **device) +{ + int ret; + + if (srcdevid) { + ret = 0; + *device = btrfs_find_device(root-fs_info, srcdevid, NULL, + NULL); + if (!*device) + ret = -ENOENT; + } else { + ret = btrfs_find_device_missing_or_by_path(root, srcdev_name, + device); + } + return ret; +} + /* * does all the dirty work required for changing file system's UUID. */ diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index f4b0ed8..a093b36 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -429,6 +429,9 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step); int btrfs_find_device_missing_or_by_path(struct btrfs_root *root, char *device_path, struct btrfs_device **device); +int btrfs_find_device_by_user_input(struct btrfs_root *root, u64 srcdevid, +char *srcdev_name, +struct btrfs_device **device); struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info, const u64 *devid, const u8 *uuid); -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/23] btrfs device related patch set
This patch set includes patches which has been sent before independently, however here they are consolidated on the current integration 4.3. Most of them are cleanup and preparatory work for the RFE which are published before, viz.. addition of sys volume attributes and introduce a method to offline device. And except for the patch Btrfs: device delete by devid provides a way to delete device using devid (assume that devid has failed) thus fixes the issue reported by the user in the community. and the patch Btrfs: allow -o rw,degraded for single group profile fixes an important btrfs volume availability issue Anand Jain (22): Btrfs: rename btrfs_sysfs_add_one to btrfs_sysfs_add_mounted Btrfs: rename btrfs_sysfs_remove_one to btrfs_sysfs_remove_mounted Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_link Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_link Btrfs: rename super_kobj to fsid_kobj Btrfs: SB read failure should return EIO for __bread failure Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined Btrfs: device delete by devid Btrfs: move check for min number of devices to a function Btrfs: rename btrfs_dev_replace_find_srcdev() Btrfs: use BTRFS_ERROR_DEV_MISSING_NOT_FOUND when missing device is not found Btrfs: use btrfs_find_device_by_user_input() Btrfs: add btrfs_read_dev_one_super() to read one specific SB Btrfs: fix btrfs_scratch_superblock() with fixes from device delete Btrfs: use btrfs_scratch_superblock() in btrfs_rm_device() Btrfs: device path change must be logged Btrfs: kernel operation should come after user input has been verified Btrfs: check device_path in btrfs_find_device_by_user_input() Btrfs: avoid user cli usage error logging into the sys log Btrfs: move device close to btrfs_close_one_device Btrfs: fix fs logging for multi device Btrfs: allow -o rw,degraded for single group profile Liu Bo (1): Btrfs: move kobj stuff out of dev_replace lock range fs/btrfs/ctree.h | 4 +- fs/btrfs/dev-replace.c | 64 +++-- fs/btrfs/disk-io.c | 65 ++--- fs/btrfs/disk-io.h | 2 + fs/btrfs/ioctl.c | 50 ++- fs/btrfs/super.c | 34 ++--- fs/btrfs/sysfs.c | 52 +++ fs/btrfs/sysfs.h | 4 +- fs/btrfs/volumes.c | 345 + fs/btrfs/volumes.h | 10 +- include/uapi/linux/btrfs.h | 8 ++ 11 files changed, 331 insertions(+), 307 deletions(-) -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/23] Btrfs: rename btrfs_sysfs_add_one to btrfs_sysfs_add_mounted
Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 2 +- fs/btrfs/sysfs.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 938efe3..afce306 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -4004,7 +4004,7 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, /* sysfs.c */ int btrfs_init_sysfs(void); void btrfs_exit_sysfs(void); -int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info); +int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info); void btrfs_sysfs_remove_one(struct btrfs_fs_info *fs_info); /* xattr.c */ diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cc15514b..376a6ef 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2928,7 +2928,7 @@ retry_root_backup: goto fail_fsdev_sysfs; } - ret = btrfs_sysfs_add_one(fs_info); + ret = btrfs_sysfs_add_mounted(fs_info); if (ret) { pr_err(BTRFS: failed to init sysfs interface: %d\n, ret); goto fail_fsdev_sysfs; diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 603b0cc..cabf840 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -736,7 +736,7 @@ int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs, return error; } -int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info) +int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) { int error; struct btrfs_fs_devices *fs_devs = fs_info-fs_devices; -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/23] Btrfs: device delete by devid
This introduces BTRFS_IOC_RM_DEV_V2, which can accept devid as an argument to delete the device. Current only choice to is to pass device path for the device delete cli, but if btrfs is unable to read device SB, then cli fails. And user won't be able to delete the device. With this patch now the user can specify devid as the device to delete. The patch won't delete the old interface so that kernel will remain compatible with the older user-interface programs like btrfs-progs. Test case/script: echo 0 $(blockdev --getsz /dev/sdf) linear /dev/sdf 0 | dmsetup create bad_disk mkfs.btrfs -f -d raid1 -m raid1 /dev/sdd /dev/sde /dev/mapper/bad_disk mount /dev/sdd /btrfs dmsetup suspend bad_disk echo 0 $(blockdev --getsz /dev/sdf) error /dev/sdf 0 | dmsetup load bad_disk dmsetup resume bad_disk echo bad disk failed. now deleting/replacing btrfs dev del 3 /btrfs echo $? btrfs fi show /btrfs umount /btrfs btrfs-show-super /dev/sdd | egrep num_device dmsetup remove bad_disk wipefs -a /dev/sdf v3: commit update, included test case v2: don't use device-name after free commit update with the test script which I have been using Signed-off-by: Anand Jain anand.j...@oracle.com Reported-by: Martin m_bt...@ml1.co.uk --- fs/btrfs/ioctl.c | 50 - fs/btrfs/volumes.c | 51 -- fs/btrfs/volumes.h | 2 +- include/uapi/linux/btrfs.h | 8 4 files changed, 98 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 0adf542..6c9e58c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2653,6 +2653,52 @@ out: return ret; } +static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg) +{ + struct btrfs_root *root = BTRFS_I(file_inode(file))-root; + struct btrfs_ioctl_vol_args_v3 *vol_args; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + ret = mnt_want_write_file(file); + if (ret) + return ret; + + vol_args = memdup_user(arg, sizeof(*vol_args)); + if (IS_ERR(vol_args)) { + ret = PTR_ERR(vol_args); + goto err_drop; + } + + vol_args-name[BTRFS_PATH_NAME_MAX] = '\0'; + + if (atomic_xchg(root-fs_info-mutually_exclusive_operation_running, + 1)) { + ret = BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS; + goto out; + } + + mutex_lock(root-fs_info-volume_mutex); + ret = btrfs_rm_device(root, vol_args-name, vol_args-devid); + mutex_unlock(root-fs_info-volume_mutex); + atomic_set(root-fs_info-mutually_exclusive_operation_running, 0); + + if (!ret) { + if (vol_args-devid) + btrfs_info(root-fs_info, disk devid %llu deleted, + vol_args-devid); + else + btrfs_info(root-fs_info, disk deleted - %s, vol_args-name); + } +out: + kfree(vol_args); +err_drop: + mnt_drop_write_file(file); + return ret; +} + static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg) { struct btrfs_root *root = BTRFS_I(file_inode(file))-root; @@ -2681,7 +2727,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg) } mutex_lock(root-fs_info-volume_mutex); - ret = btrfs_rm_device(root, vol_args-name); + ret = btrfs_rm_device(root, vol_args-name, 0); mutex_unlock(root-fs_info-volume_mutex); atomic_set(root-fs_info-mutually_exclusive_operation_running, 0); @@ -5419,6 +5465,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_add_dev(root, argp); case BTRFS_IOC_RM_DEV: return btrfs_ioctl_rm_dev(file, argp); + case BTRFS_IOC_RM_DEV_V2: + return btrfs_ioctl_rm_dev_v2(file, argp); case BTRFS_IOC_FS_INFO: return btrfs_ioctl_fs_info(root, argp); case BTRFS_IOC_DEV_INFO: diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a3fde18..2f8b974 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1637,21 +1637,21 @@ out: return ret; } -int btrfs_rm_device(struct btrfs_root *root, char *device_path) +int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) { struct btrfs_device *device; struct btrfs_device *next_device; - struct block_device *bdev; + struct block_device *bdev = NULL; struct buffer_head *bh = NULL; - struct btrfs_super_block *disk_super; + struct btrfs_super_block *disk_super = NULL; struct btrfs_fs_devices *cur_devices; u64 all_avail; - u64 devid; u64 num_devices; u8 *dev_uuid; unsigned seq; int ret = 0; bool clear_super = false; + char *dev_name = NULL;
[PATCH 22/23] Btrfs: move kobj stuff out of dev_replace lock range
From: Liu Bo bo.li@oracle.com To avoid deadlock described in commit 084b6e7c7607 (btrfs: Fix a lockdep warning when running xfstest.), we should move kobj stuff out of dev_replace lock range. Signed-off-by: Liu Bo bo.li@oracle.com Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/dev-replace.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 0df3d9b..c326d51 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -369,10 +369,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root, WARN_ON(!tgt_device); dev_replace-tgtdev = tgt_device; - ret = btrfs_sysfs_add_device_link(tgt_device-fs_devices, tgt_device); - if (ret) - btrfs_err(root-fs_info, kobj add dev failed %d\n, ret); - printk_in_rcu(KERN_INFO BTRFS: dev_replace from %s (devid %llu) to %s started\n, src_device-missing ? missing disk : @@ -395,6 +391,10 @@ int btrfs_dev_replace_start(struct btrfs_root *root, args-result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR; btrfs_dev_replace_unlock(dev_replace); + ret = btrfs_sysfs_add_device_link(tgt_device-fs_devices, tgt_device); + if (ret) + btrfs_err(root-fs_info, kobj add dev failed %d\n, ret); + btrfs_wait_ordered_roots(root-fs_info, -1); /* force writing the updated state information to disk */ -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/23] Btrfs: move check for min number of devices to a function
btrfs_rm_device() has a section of the code to check for min number of the devices required by various group profile. This patch move that part of the code in the function __check_raid_min_devices() Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 78 ++ 1 file changed, 43 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2f8b974..1573997 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1637,61 +1637,69 @@ out: return ret; } -int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) +static int __check_raid_min_devices(struct btrfs_fs_info *fs_info) { - struct btrfs_device *device; - struct btrfs_device *next_device; - struct block_device *bdev = NULL; - struct buffer_head *bh = NULL; - struct btrfs_super_block *disk_super = NULL; - struct btrfs_fs_devices *cur_devices; u64 all_avail; u64 num_devices; - u8 *dev_uuid; unsigned seq; - int ret = 0; - bool clear_super = false; - char *dev_name = NULL; - - mutex_lock(uuid_mutex); - do { - seq = read_seqbegin(root-fs_info-profiles_lock); - - all_avail = root-fs_info-avail_data_alloc_bits | - root-fs_info-avail_system_alloc_bits | - root-fs_info-avail_metadata_alloc_bits; - } while (read_seqretry(root-fs_info-profiles_lock, seq)); - - num_devices = root-fs_info-fs_devices-num_devices; - btrfs_dev_replace_lock(root-fs_info-dev_replace); - if (btrfs_dev_replace_is_ongoing(root-fs_info-dev_replace)) { + num_devices = fs_info-fs_devices-num_devices; + btrfs_dev_replace_lock(fs_info-dev_replace); + if (btrfs_dev_replace_is_ongoing(fs_info-dev_replace)) { WARN_ON(num_devices 1); num_devices--; } - btrfs_dev_replace_unlock(root-fs_info-dev_replace); + btrfs_dev_replace_unlock(fs_info-dev_replace); + + do { + seq = read_seqbegin(fs_info-profiles_lock); + + all_avail = fs_info-avail_data_alloc_bits | + fs_info-avail_system_alloc_bits | + fs_info-avail_metadata_alloc_bits; + } while (read_seqretry(fs_info-profiles_lock, seq)); if ((all_avail BTRFS_BLOCK_GROUP_RAID10) num_devices = 4) { - ret = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET; - goto out; + return BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET; } if ((all_avail BTRFS_BLOCK_GROUP_RAID1) num_devices = 2) { - ret = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET; - goto out; + return BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET; } if ((all_avail BTRFS_BLOCK_GROUP_RAID5) - root-fs_info-fs_devices-rw_devices = 2) { - ret = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET; - goto out; + fs_info-fs_devices-rw_devices = 2) { + return BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET; } + if ((all_avail BTRFS_BLOCK_GROUP_RAID6) - root-fs_info-fs_devices-rw_devices = 3) { - ret = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET; - goto out; + fs_info-fs_devices-rw_devices = 3) { + return BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET; } + return 0; +} + +int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) +{ + struct btrfs_device *device; + struct btrfs_device *next_device; + struct block_device *bdev = NULL; + struct buffer_head *bh = NULL; + struct btrfs_super_block *disk_super = NULL; + struct btrfs_fs_devices *cur_devices; + u64 num_devices; + u8 *dev_uuid; + int ret = 0; + bool clear_super = false; + char *dev_name = NULL; + + mutex_lock(uuid_mutex); + + ret = __check_raid_min_devices(root-fs_info); + if (ret) + goto out; + if (devid) { device = btrfs_find_device(root-fs_info, devid, NULL, NULL); -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 23/23] Btrfs: allow -o rw,degraded for single group profile
As of now only the exception to allow mount when number of missing device is more than group profile tolerance count is RDONLY this patch adds another lateral exception DEGRADED This will enable user to recover from the following and similar volume unavailability issue raid1 volume: mkfs.btrfs -draid1 -mraid1 /dev/sdc /dev/sdd unscan the device scan: modprobe -r btrfs modprobe btrfs = dev scanned is cleared since kernel does not know about /dev/sdd use degraded option to mount: mount -o degraded /dev/sdc /btrfs = sdd is not used umount /btrfs problem: following umount the mount fails even with degraded option: mount -o degraded /dev/sdc /btrfs == fails. because: unmount triggered commit used single profile which needs all the disks Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/disk-io.c | 3 ++- fs/btrfs/super.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2f2379d..3377f1a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2949,7 +2949,8 @@ retry_root_backup: btrfs_calc_num_tolerated_disk_barrier_failures(fs_info); if (fs_info-fs_devices-missing_devices fs_info-num_tolerated_disk_barrier_failures - !(sb-s_flags MS_RDONLY)) { + !(sb-s_flags MS_RDONLY || + btrfs_test_opt(fs_info-dev_root, DEGRADED))) { pr_warn(BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n, fs_info-fs_devices-missing_devices, fs_info-num_tolerated_disk_barrier_failures); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index a8a0109..315035a2 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1666,7 +1666,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) if (fs_info-fs_devices-missing_devices fs_info-num_tolerated_disk_barrier_failures - !(*flags MS_RDONLY)) { + !(*flags MS_RDONLY || + btrfs_test_opt(root, DEGRADED))) { btrfs_warn(fs_info, too many missing devices, writeable remount is not allowed); ret = -EACCES; -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/23] Btrfs: use btrfs_scratch_superblock() in btrfs_rm_device()
With the previous patches now the btrfs_scratch_superblock() is ready to be used in btrfs_rm_device() so use it. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 69 -- 1 file changed, 5 insertions(+), 64 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index b2a19ea..ebf37a9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1684,9 +1684,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) { struct btrfs_device *device; struct btrfs_device *next_device; - struct block_device *bdev = NULL; - struct buffer_head *bh = NULL; - struct btrfs_super_block *disk_super = NULL; struct btrfs_fs_devices *cur_devices; u64 num_devices; int ret = 0; @@ -1807,68 +1804,12 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid) * remove it from the devices list and zero out the old super */ if (clear_super) { - u64 bytenr; - int i; - - if (!disk_super) { - ret = btrfs_get_bdev_and_sb(dev_name, - FMODE_WRITE | FMODE_EXCL, - root-fs_info-bdev_holder, 0, - bdev, bh); - if (ret) { - /* -* It could be a failed device ok for clear_super -* to fail. So return success -*/ - ret = 0; - goto out; - } - - disk_super = (struct btrfs_super_block *)bh-b_data; - } - /* make sure this device isn't detected as part of -* the FS anymore -*/ - memset(disk_super-magic, 0, sizeof(disk_super-magic)); - set_buffer_dirty(bh); - sync_dirty_buffer(bh); - brelse(bh); - - /* clear the mirror copies of super block on the disk -* being removed, 0th copy is been taken care above and -* the below would take of the rest -*/ - for (i = 1; i BTRFS_SUPER_MIRROR_MAX; i++) { - bytenr = btrfs_sb_offset(i); - if (bytenr + BTRFS_SUPER_INFO_SIZE = - i_size_read(bdev-bd_inode)) - break; - - bh = __bread(bdev, bytenr / 4096, - BTRFS_SUPER_INFO_SIZE); - if (!bh) - continue; - - disk_super = (struct btrfs_super_block *)bh-b_data; - - if (btrfs_super_bytenr(disk_super) != bytenr || - btrfs_super_magic(disk_super) != BTRFS_MAGIC) { - brelse(bh); - continue; - } - memset(disk_super-magic, 0, - sizeof(disk_super-magic)); - set_buffer_dirty(bh); - sync_dirty_buffer(bh); - brelse(bh); - } - - if (bdev) { - /* Notify udev that device has changed */ - btrfs_kobject_uevent(bdev, KOBJ_CHANGE); + struct block_device *bdev; - /* Update ctime/mtime for device path for libblkid */ - update_dev_time(dev_name); + bdev = blkdev_get_by_path(dev_name, FMODE_READ | FMODE_EXCL, + root-fs_info-bdev_holder); + if (!IS_ERR(bdev)) { + btrfs_scratch_superblock(bdev, dev_name); blkdev_put(bdev, FMODE_READ | FMODE_EXCL); } } -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] Btrfs-progs: device delete to accept devid
This is btrfs-progs side of the patch. Kernel patch is Btrfs: device delete by devid Anand Jain (2): btrfs-progs: move is_numerical to utils-lib.h and make it non static btrfs-progs: device delete to accept devid Documentation/btrfs-device.asciidoc | 2 +- cmds-device.c | 45 - cmds-replace.c | 11 - ioctl.h | 8 +++ utils-lib.c | 11 + utils.h | 1 + 6 files changed, 56 insertions(+), 22 deletions(-) -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs-progs: move is_numerical to utils-lib.h and make it non static
Signed-off-by: Anand Jain anand.j...@oracle.com --- cmds-replace.c | 11 --- utils-lib.c| 11 +++ utils.h| 1 + 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/cmds-replace.c b/cmds-replace.c index 85365e3..e6a27e3 100644 --- a/cmds-replace.c +++ b/cmds-replace.c @@ -65,17 +65,6 @@ static const char * const replace_cmd_group_usage[] = { NULL }; -static int is_numerical(const char *str) -{ - if (!(*str = '0' *str = '9')) - return 0; - while (*str = '0' *str = '9') - str++; - if (*str != '\0') - return 0; - return 1; -} - static int dev_replace_cancel_fd = -1; static void dev_replace_sigint_handler(int signal) { diff --git a/utils-lib.c b/utils-lib.c index 79ef35e..9ac0b7b 100644 --- a/utils-lib.c +++ b/utils-lib.c @@ -38,3 +38,14 @@ u64 arg_strtou64(const char *str) } return value; } + +int is_numerical(const char *str) +{ + if (!(*str = '0' *str = '9')) + return 0; + while (*str = '0' *str = '9') + str++; + if (*str != '\0') + return 0; + return 1; +} diff --git a/utils.h b/utils.h index 94606ed..0975301 100644 --- a/utils.h +++ b/utils.h @@ -243,5 +243,6 @@ int btrfs_tree_search2_ioctl_supported(int fd); int btrfs_check_nodesize(u32 nodesize, u32 sectorsize); const char *get_argv0_buf(void); +int is_numerical(const char *str); #endif -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs-progs: device delete to accept devid
This patch introduces new option devid for the command btrfs device delete device_path|devid[..] mnt In a user reported issue on a 3-disk-RAID1, one disk failed with its SB unreadable. Now with this patch user will have a choice to delete the device using devid. The other method we could do, is to match the input device_path to the available device_paths with in the kernel. But that won't work in all the cases, like what if user provided mapper path when the path within the kernel is a non-mapper path. This patch depends on the below kernel patch for the new feature to work, however it will fail-back to the old interface for the kernel without the patch Btrfs: device delete by devid Signed-off-by: Anand Jain anand.j...@oracle.com --- Documentation/btrfs-device.asciidoc | 2 +- cmds-device.c | 45 - ioctl.h | 8 +++ 3 files changed, 44 insertions(+), 11 deletions(-) diff --git a/Documentation/btrfs-device.asciidoc b/Documentation/btrfs-device.asciidoc index 2827598..61ede6e 100644 --- a/Documentation/btrfs-device.asciidoc +++ b/Documentation/btrfs-device.asciidoc @@ -74,7 +74,7 @@ do not perform discard by default -f|--force force overwrite of existing filesystem on the given disk(s) -*remove* dev [dev...] path:: +*remove* dev|devid [dev|devid...] path:: Remove device(s) from a filesystem identified by path. *delete* dev [dev...] path:: diff --git a/cmds-device.c b/cmds-device.c index 0e60500..eb4358d 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -164,16 +164,34 @@ static int _cmd_rm_dev(int argc, char **argv, const char * const *usagestr) struct btrfs_ioctl_vol_args arg; int res; - if (!is_block_device(argv[i])) { + struct btrfs_ioctl_vol_args_v3 argv3 = {0}; + int its_num = false; + + if (is_numerical(argv[i])) { + argv3.devid = arg_strtou64(argv[i]); + its_num = true; + } else if (is_block_device(argv[i])) { + strncpy_null(argv3.name, argv[i]); + } else { fprintf(stderr, - ERROR: %s is not a block device\n, argv[i]); + ERROR: %s is not a block device or devid\n, argv[i]); ret++; continue; } - memset(arg, 0, sizeof(arg)); - strncpy_null(arg.name, argv[i]); - res = ioctl(fdmnt, BTRFS_IOC_RM_DEV, arg); + res = ioctl(fdmnt, BTRFS_IOC_RM_DEV_V2, argv3); e = errno; + if (res e == ENOTTY) { + if (its_num) { + fprintf(stderr, + Error: Kernel does not support delete by devid\n); + ret = 1; + continue; + } + memset(arg, 0, sizeof(arg)); + strncpy_null(arg.name, argv[i]); + res = ioctl(fdmnt, BTRFS_IOC_RM_DEV, arg); + e = errno; + } if (res) { const char *msg; @@ -181,9 +199,16 @@ static int _cmd_rm_dev(int argc, char **argv, const char * const *usagestr) msg = btrfs_err_str(res); else msg = strerror(e); - fprintf(stderr, - ERROR: error removing the device '%s' - %s\n, - argv[i], msg); + + if (its_num) + fprintf(stderr, + ERROR: error removing the devid '%llu' - %s\n, + argv3.devid, msg); + else + fprintf(stderr, + ERROR: error removing the device '%s' - %s\n, + argv[i], msg); + ret++; } } @@ -193,7 +218,7 @@ static int _cmd_rm_dev(int argc, char **argv, const char * const *usagestr) } static const char * const cmd_rm_dev_usage[] = { - btrfs device remove device [device...] path, + btrfs device remove device|devid [device|devid...] path, Remove a device from a filesystem, NULL }; @@ -204,7 +229,7 @@ static int cmd_rm_dev(int argc, char **argv) } static const char * const cmd_del_dev_usage[] = { - btrfs device delete device [device...] path, + btrfs device delete device|devid [device|devid...] path, Remove a device from a filesystem, NULL }; diff --git a/ioctl.h b/ioctl.h index dff015a..6870931 100644 --- a/ioctl.h +++ b/ioctl.h @@
[PATCH v5 3/3] xfstests: btrfs: test device delete with EIO on src dev
From: Anand Jain anand.j...@oracle.com This test case tests if the device delete works with the failed (EIO) source device. EIO errors are achieved usign the DM device. This test would need following btrfs-progs and btrfs kernel patch btrfs-progs: device delete to accept devid Btrfs: device delete by devid However when btrfs-progs patch is not found this test will not run, and when kernel patch is not found btrfs-progs will fail gracefully and thus the test script. Signed-off-by: Anand Jain anand.j...@oracle.com --- v4-v5: rebase on latest xfstests code, and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks common/rc | 7 + tests/btrfs/099 | 82 + tests/btrfs/099.out | 11 +++ tests/btrfs/group | 1 + 4 files changed, 101 insertions(+) create mode 100755 tests/btrfs/099 create mode 100644 tests/btrfs/099.out diff --git a/common/rc b/common/rc index 8d4da0e..31a0328 100644 --- a/common/rc +++ b/common/rc @@ -2737,6 +2737,13 @@ _require_meta_uuid() umount $SCRATCH_MNT } +_require_btrfs_dev_del_by_devid() +{ + $BTRFS_UTIL_PROG device delete --help | egrep devid /dev/null 21 + [ $? -eq 0 ] || _notrun $BTRFS_UTIL_PROG too old \ + (must support 'btrfs device delete devid /mnt') +} + _get_total_inode() { if [ -z $1 ]; then diff --git a/tests/btrfs/099 b/tests/btrfs/099 new file mode 100755 index 000..4464e24 --- /dev/null +++ b/tests/btrfs/099 @@ -0,0 +1,82 @@ +#! /bin/bash +# FS QA Test No. btrfs/099 +# +# test device delete when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_btrfs_dev_del_by_devid +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 $dev2 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 /dev/null + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog device delete $error_devid $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device delete completed + +status=0; exit diff --git a/tests/btrfs/099.out b/tests/btrfs/099.out new file mode 100644 index 000..ec74e45 --- /dev/null +++ b/tests/btrfs/099.out @@ -0,0 +1,11 @@ +QA output created by 099 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device delete completed diff --git a/tests/btrfs/group b/tests/btrfs/group index c8a53b5..968ee63 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -101,3 +101,4 @@ 096 auto quick clone 097 auto quick send clone 098 auto quick replace +099 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to
[PATCH v5 1/3] xfstests: btrfs: add functions to create dm-error device
From: Anand Jain anand.j...@oracle.com Controlled EIO from the device is achieved using the dm device. Helper functions are at common/dmerror. Broadly steps will include calling _init_dmerror(). _init_dmerror() will use SCRATCH_DEV to create dm linear device and assign DMERROR_DEV to /dev/mapper/error-test. When test script is ready to get EIO, the test cases can call _load_dmerror_table() which then it will load the dm error. so that reading DMERROR_DEV will cause EIO. After the test case is complete, cleanup must be done by calling _cleanup_dmerror(). Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v4-v5: No Change. keep up with the patch set v3-v4: rebase on latest xfstests code v2.1-v3: accepts Filipe Manana's review comments, thanks v2-v2.1: fixed missed typo error fixup in the commit. v1-v2: accepts Dave Chinner's review comments, thanks common/dmerror | 69 ++ common/rc | 9 2 files changed, 78 insertions(+) create mode 100644 common/dmerror diff --git a/common/dmerror b/common/dmerror new file mode 100644 index 000..f895d90 --- /dev/null +++ b/common/dmerror @@ -0,0 +1,69 @@ +##/bin/bash +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# +# common functions for setting up and tearing down a dmerror device + +_init_dmerror() +{ + $DMSETUP_PROG remove error-test /dev/null 21 + + local BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV` + + DMERROR_DEV='/dev/mapper/error-test' + + DMLINEAR_TABLE=0 $BLK_DEV_SIZE linear $SCRATCH_DEV 0 + + $DMSETUP_PROG create error-test --table $DMLINEAR_TABLE || \ + _fatal failed to create dm linear device + + DMERROR_TABLE=0 $BLK_DEV_SIZE error $SCRATCH_DEV 0 +} + +_scratch_mkfs_dmerror() +{ + $MKFS_BTRFS_PROG $* $DMERROR_DEV $seqres.full 21 || \ + _fatal failed to create mkfs.btrfs $* $DMERROR_DEV +} + +_mount_dmerror() +{ + mount -t $FSTYP $MOUNT_OPTIONS $DMERROR_DEV $SCRATCH_MNT +} + +_unmount_dmerror() +{ + $UMOUNT_PROGS $SCRATCH_MNT +} + +_cleanup_dmerror() +{ + $UMOUNT_PROG $SCRATCH_MNT /dev/null 21 + $DMSETUP_PROG remove error-test /dev/null 21 +} + +_load_dmerror_table() +{ + $DMSETUP_PROG suspend error-test + [ $? -ne 0 ] _fatal failed to suspend error-test + + $DMSETUP_PROG load error-test --table $DMERROR_TABLE + [ $? -ne 0 ] _fatal failed to load error table error-test + + $DMSETUP_PROG resume error-test + [ $? -ne 0 ] _fatal failed to resume error-test +} diff --git a/common/rc b/common/rc index 70d2fa8..8d4da0e 100644 --- a/common/rc +++ b/common/rc @@ -1337,6 +1337,15 @@ _require_sane_bdev_flush() fi } +# this test requires the device mapper error target +# +_require_dmerror() +{ + _require_command $DMSETUP_PROG dmsetup + $DMSETUP_PROG targets | grep error /dev/null 21 + [ $? -ne 0 ] _notrun This test requires dm error support +} + # this test requires the device mapper flakey target # _require_dm_flakey() -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 0/3] dm error based test cases
This is v5 of this patch set. Mainly accepts Filipe latest review comments. Anand Jain (3): xfstests: btrfs: add functions to create dm-error device xfstests: btrfs: test device replace, with EIO on the src dev xfstests: btrfs: test device delete with EIO on src dev common/dmerror | 69 common/rc | 16 +++ tests/btrfs/098 | 81 tests/btrfs/098.out | 11 +++ tests/btrfs/099 | 82 + tests/btrfs/099.out | 11 +++ tests/btrfs/group | 2 ++ 7 files changed, 272 insertions(+) create mode 100644 common/dmerror create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out create mode 100755 tests/btrfs/099 create mode 100644 tests/btrfs/099.out -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/3] xfstests: btrfs: test device replace, with EIO on the src dev
From: Anand Jain anand.j...@oracle.com This test case will test to confirm the replace works with the failed (EIO) replacing source device. EIO condition is achieved using the DM device. Signed-off-by: Anand Jain anand.j...@oracle.com Reviewed-by: Filipe Manana fdman...@suse.com --- v4-v5: rebase on latest xfstests code and accepts Filipe comment v3-v4: rebase on latest xfstests code v2-v3: accepts Filipe Manana's review comments, thanks v1-v2: accepts Dave Chinner's review comments, thanks tests/btrfs/098 | 81 + tests/btrfs/098.out | 11 tests/btrfs/group | 1 + 3 files changed, 93 insertions(+) create mode 100755 tests/btrfs/098 create mode 100644 tests/btrfs/098.out diff --git a/tests/btrfs/098 b/tests/btrfs/098 new file mode 100755 index 000..afb41d1 --- /dev/null +++ b/tests/btrfs/098 @@ -0,0 +1,81 @@ +#! /bin/bash +# FS QA Test No. btrfs/098 +# +#test device replace works when the source device has EIO +# +# Copyright (c) 2015 Oracle. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + + +_cleanup() +{ + _cleanup_dmerror + rm -f $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/filter.btrfs +. ./common/dmerror + +_supported_fs btrfs +_supported_os Linux +_need_to_be_root +_require_scratch_dev_pool 3 +_require_dmerror + +rm -f $seqres.full + +dev1=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $2}'` +dev2=`echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $3}'` + +_init_dmerror +_scratch_mkfs_dmerror -f -d raid1 -m raid1 $dev1 +_mount_dmerror + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +error_devid=`$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT |\ + egrep $DMERROR_DEV | $AWK_PROG '{print $2}'` + +snapshot_cmd=$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT +snapshot_cmd=$snapshot_cmd $SCRATCH_MNT/snap_\`date +'%H_%M_%S_%N'\` +run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n 200 -p 8 $FSSTRESS_AVOID -x \ + $snapshot_cmd -X 50 /dev/null + +# now load the error into the DMERROR_DEV +_load_dmerror_table + +_run_btrfs_util_prog replace start -B $error_devid $dev2 $SCRATCH_MNT + +_run_btrfs_util_prog filesystem show -m $SCRATCH_MNT +$BTRFS_UTIL_PROG filesystem show -m $SCRATCH_MNT | _filter_btrfs_filesystem_show + +echo === device replace completed + +status=0; exit diff --git a/tests/btrfs/098.out b/tests/btrfs/098.out new file mode 100644 index 000..eb2f87f --- /dev/null +++ b/tests/btrfs/098.out @@ -0,0 +1,11 @@ +QA output created by 098 +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + devid DEVID size SIZE used SIZE path /dev/mapper/error-test + +Label: none uuid: UUID + Total devices NUM FS bytes used SIZE + devid DEVID size SIZE used SIZE path SCRATCH_DEV + +=== device replace completed diff --git a/tests/btrfs/group b/tests/btrfs/group index e13865a..c8a53b5 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -100,3 +100,4 @@ 095 auto quick metadata 096 auto quick clone 097 auto quick send clone +098 auto quick replace -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
can we make balance delete missing devices?
[ 2918.502237] BTRFS info (device loop1): disk space caching is enabled [ 2918.503213] BTRFS: failed to read chunk tree on loop1 [ 2918.540082] BTRFS: open_ctree failed I just had a test RAID-1 filesystem with a missing device. I mounted it with the degraded option and added a new device. I balanced it (to make it do RAID-1 again) and thought everything was good. Then when I tried to mount it again it gave errors such as the above (not sure why). Then I tried wiping /dev/loop1 and it refused to mount entirely due to having 2 missing devices. Obviously it was my mistake to not remove the missing device, and wiping /dev/loop1 was a bad idea. Failing to remove a missing device seems likely to be a common mistake. Could we make the balance operation automatically delete the missing device? I can't imagine a situation in which a balance would be desired but deleting the missing device wouldn't be desired. -- My Main Blog http://etbe.coker.com.au/ My Documents Bloghttp://doc.coker.com.au/ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/23] Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined
error handling logic behaves differently with or without CONFIG_PRINTK defined, since there are two copies of the same function which a bit of different logic One, when CONFIG_PRINTK is defined, code is __btrfs_std_error(..) { :: save_error_info(fs_info); if (sb-s_flags MS_BORN) btrfs_handle_error(fs_info); } and two when CONFIG_PRINTK is not defined, the code is __btrfs_std_error(..) { :: if (sb-s_flags MS_BORN) { save_error_info(fs_info); btrfs_handle_error(fs_info); } } I doubt if this was intentional ? and appear to have caused since we maintain two copies of the same function and they got diverged with commits. Now to decide which logic is correct reviewed changes as below, 533574c6bc30cf526cc1c41bde050c854a945efb Commit added two copies of this function cf79ffb5b79e8a2b587fbf218809e691bb396c98 Commit made change to only one copy of the function and to the copy when CONFIG_PRINTK is defined. To fix this, instead of maintaining two copies of same function approach, maintain single function, and just put the extra portion of the code under CONFIG_PRINTK define. This patch just does that. And keeps code of with CONFIG_PRINTK defined. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/super.c | 27 +-- 1 file changed, 5 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index c389c13..56c0174 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -130,7 +130,6 @@ static void btrfs_handle_error(struct btrfs_fs_info *fs_info) } } -#ifdef CONFIG_PRINTK /* * __btrfs_std_error decodes expected errors from the caller and * invokes the approciate error response. @@ -140,7 +139,9 @@ void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, unsigned int line, int errno, const char *fmt, ...) { struct super_block *sb = fs_info-sb; +#ifdef CONFIG_PRINTK const char *errstr; +#endif /* * Special case: if the error is EROFS, and we're already @@ -149,6 +150,7 @@ void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, if (errno == -EROFS (sb-s_flags MS_RDONLY)) return; +#ifdef CONFIG_PRINTK errstr = btrfs_decode_error(errno); if (fmt) { struct va_format vaf; @@ -166,6 +168,7 @@ void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, printk(KERN_CRIT BTRFS: error (device %s) in %s:%d: errno=%d %s\n, sb-s_id, function, line, errno, errstr); } +#endif /* Don't go through full error handling during mount */ save_error_info(fs_info); @@ -173,6 +176,7 @@ void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, btrfs_handle_error(fs_info); } +#ifdef CONFIG_PRINTK static const char * const logtypes[] = { emergency, alert, @@ -212,27 +216,6 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) va_end(args); } - -#else - -void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function, - unsigned int line, int errno, const char *fmt, ...) -{ - struct super_block *sb = fs_info-sb; - - /* -* Special case: if the error is EROFS, and we're already -* under MS_RDONLY, then it is safe here. -*/ - if (errno == -EROFS (sb-s_flags MS_RDONLY)) - return; - - /* Don't go through full error handling during mount */ - if (sb-s_flags MS_BORN) { - save_error_info(fs_info); - btrfs_handle_error(fs_info); - } -} #endif /* -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/23] Btrfs: add btrfs_read_dev_one_super() to read one specific SB
This uses a chunk of code from btrfs_read_dev_super() and creates a function called btrfs_read_dev_one_super() so that next patch can use it for scratch superblock. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/disk-io.c | 54 ++ fs/btrfs/disk-io.h | 2 ++ 2 files changed, 36 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index faf5b8d..2f2379d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3183,6 +3183,37 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate) put_bh(bh); } +int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, + struct buffer_head **bh) +{ + struct buffer_head *bufhead; + struct btrfs_super_block *super; + u64 bytenr; + + bytenr = btrfs_sb_offset(copy_num); + if (bytenr + BTRFS_SUPER_INFO_SIZE = i_size_read(bdev-bd_inode)) + return -EINVAL; + + bufhead = __bread(bdev, bytenr / 4096, BTRFS_SUPER_INFO_SIZE); + /* +* If we fail to read from the underlaying drivers, as of now +* the best option we have is to mark it EIO. +*/ + if (!bufhead) + return -EIO; + + super = (struct btrfs_super_block *)bufhead-b_data; + if (btrfs_super_bytenr(super) != bytenr || + btrfs_super_magic(super) != BTRFS_MAGIC) { + brelse(bufhead); + return -EINVAL; + } + + *bh = bufhead; + return 0; +} + + struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) { struct buffer_head *bh; @@ -3190,7 +3221,6 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) struct btrfs_super_block *super; int i; u64 transid = 0; - u64 bytenr; int ret = -EINVAL; /* we would like to check all the supers, but that would make @@ -3199,28 +3229,12 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ for (i = 0; i 1; i++) { - bytenr = btrfs_sb_offset(i); - if (bytenr + BTRFS_SUPER_INFO_SIZE = - i_size_read(bdev-bd_inode)) - break; - bh = __bread(bdev, bytenr / 4096, - BTRFS_SUPER_INFO_SIZE); - /* -* If we fail to read from the underlaying drivers, as of now -* the best option we have is to mark it EIO. -*/ - if (!bh) { - ret = -EIO; + + ret = btrfs_read_dev_one_super(bdev, i, bh); + if (ret) continue; - } super = (struct btrfs_super_block *)bh-b_data; - if (btrfs_super_bytenr(super) != bytenr || - btrfs_super_magic(super) != BTRFS_MAGIC) { - brelse(bh); - ret = -EINVAL; - continue; - } if (!latest || btrfs_super_generation(super) transid) { brelse(latest); diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index d4cbfee..8dc9ff1 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -60,6 +60,8 @@ void close_ctree(struct btrfs_root *root); int write_ctree_super(struct btrfs_trans_handle *trans, struct btrfs_root *root, int max_mirrors); struct buffer_head *btrfs_read_dev_super(struct block_device *bdev); +int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, + struct buffer_head **bh); int btrfs_commit_super(struct btrfs_root *root); struct extent_buffer *btrfs_find_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr); -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/23] Btrfs: SB read failure should return EIO for __bread failure
This will return EIO when __bread() fails to read SB, instead of EINVAL. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/disk-io.c | 18 +++--- fs/btrfs/volumes.c | 8 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8571025..faf5b8d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2648,8 +2648,8 @@ int open_ctree(struct super_block *sb, * Read super block and check the signature bytes only */ bh = btrfs_read_dev_super(fs_devices-latest_bdev); - if (!bh) { - err = -EINVAL; + if (IS_ERR(bh)) { + err = PTR_ERR(bh); goto fail_alloc; } @@ -3191,6 +3191,7 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) int i; u64 transid = 0; u64 bytenr; + int ret = -EINVAL; /* we would like to check all the supers, but that would make * a btrfs mount succeed after a mkfs from a different FS. @@ -3204,13 +3205,20 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) break; bh = __bread(bdev, bytenr / 4096, BTRFS_SUPER_INFO_SIZE); - if (!bh) + /* +* If we fail to read from the underlaying drivers, as of now +* the best option we have is to mark it EIO. +*/ + if (!bh) { + ret = -EIO; continue; + } super = (struct btrfs_super_block *)bh-b_data; if (btrfs_super_bytenr(super) != bytenr || btrfs_super_magic(super) != BTRFS_MAGIC) { brelse(bh); + ret = -EINVAL; continue; } @@ -3222,6 +3230,10 @@ struct buffer_head *btrfs_read_dev_super(struct block_device *bdev) brelse(bh); } } + + if (!latest) + return ERR_PTR(ret); + return latest; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8368393..a3fde18 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -211,8 +211,8 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder, } invalidate_bdev(*bdev); *bh = btrfs_read_dev_super(*bdev); - if (!*bh) { - ret = -EINVAL; + if (IS_ERR(*bh)) { + ret = PTR_ERR(*bh); blkdev_put(*bdev, flags); goto error; } @@ -6842,8 +6842,8 @@ int btrfs_scratch_superblock(struct btrfs_device *device) struct btrfs_super_block *disk_super; bh = btrfs_read_dev_super(device-bdev); - if (!bh) - return -EINVAL; + if (IS_ERR(bh)) + return PTR_ERR(bh); disk_super = (struct btrfs_super_block *)bh-b_data; memset(disk_super-magic, 0, sizeof(disk_super-magic)); -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/23] Btrfs: rename btrfs_kobj_add_device to btrfs_sysfs_add_device_link
--- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/sysfs.c | 4 ++-- fs/btrfs/sysfs.h | 2 +- fs/btrfs/volumes.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 564a7de..c1bf0d6 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -376,7 +376,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root, WARN_ON(!tgt_device); dev_replace-tgtdev = tgt_device; - ret = btrfs_kobj_add_device(tgt_device-fs_devices, tgt_device); + ret = btrfs_sysfs_add_device_link(tgt_device-fs_devices, tgt_device); if (ret) btrfs_err(root-fs_info, kobj add dev failed %d\n, ret); diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 095a302..df67f6b 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -683,7 +683,7 @@ int btrfs_sysfs_add_device(struct btrfs_fs_devices *fs_devs) return 0; } -int btrfs_kobj_add_device(struct btrfs_fs_devices *fs_devices, +int btrfs_sysfs_add_device_link(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device) { int error = 0; @@ -744,7 +744,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) btrfs_set_fs_info_ptr(fs_info); - error = btrfs_kobj_add_device(fs_devs, NULL); + error = btrfs_sysfs_add_device_link(fs_devs, NULL); if (error) return error; diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h index 6392527..6529680 100644 --- a/fs/btrfs/sysfs.h +++ b/fs/btrfs/sysfs.h @@ -82,7 +82,7 @@ char *btrfs_printable_features(enum btrfs_feature_set set, u64 flags); extern const char * const btrfs_feature_set_names[3]; extern struct kobj_type space_info_ktype; extern struct kobj_type btrfs_raid_ktype; -int btrfs_kobj_add_device(struct btrfs_fs_devices *fs_devices, +int btrfs_sysfs_add_device_link(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device); int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7c84a81..18ea1eb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2309,7 +2309,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) tmp + 1); /* add sysfs device entry */ - btrfs_kobj_add_device(root-fs_info-fs_devices, device); + btrfs_sysfs_add_device_link(root-fs_info-fs_devices, device); /* * we've got more storage, clear any full flags on the space -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/23] Btrfs: use BTRFS_ERROR_DEV_MISSING_NOT_FOUND when missing device is not found
use btrfs specific error code BTRFS_ERROR_DEV_MISSING_NOT_FOUND instead of -ENOENT. Next this removes the logging when user specifies missing and we don't find it in the kernel device list. logging are for system events not for user input errors. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 101a473..f1b36b9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2078,10 +2078,8 @@ int btrfs_find_device_missing_or_by_path(struct btrfs_root *root, } } - if (!*device) { - btrfs_err(root-fs_info, no missing device found); - return -ENOENT; - } + if (!*device) + return BTRFS_ERROR_DEV_MISSING_NOT_FOUND; return 0; } else { -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 16/23] Btrfs: device path change must be logged
From the issue diagnosable point of view, log if the device path is changed. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ebf37a9..dcb10fa 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -595,6 +595,10 @@ static noinline int device_list_add(const char *path, return -EEXIST; } + printk_in_rcu(KERN_INFO \ + BTRFS: device fsid %pU devid %llu old path %s new path %s\n, + disk_super-fsid, devid, rcu_str_deref(device-name), path); + name = rcu_string_strdup(path, GFP_NOFS); if (!name) return -ENOMEM; -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 20/23] Btrfs: move device close to btrfs_close_one_device
this will help to add the proposed device offline RFE --- fs/btrfs/volumes.c | 66 +- fs/btrfs/volumes.h | 1 + 2 files changed, 37 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f3ca87d..00ca858 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -768,36 +768,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) mutex_lock(fs_devices-device_list_mutex); list_for_each_entry_safe(device, tmp, fs_devices-devices, dev_list) { - struct btrfs_device *new_device; - struct rcu_string *name; - - if (device-bdev) - fs_devices-open_devices--; - - if (device-writeable - device-devid != BTRFS_DEV_REPLACE_DEVID) { - list_del_init(device-dev_alloc_list); - fs_devices-rw_devices--; - } - - if (device-missing) - fs_devices-missing_devices--; - - new_device = btrfs_alloc_device(NULL, device-devid, - device-uuid); - BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ - - /* Safe because we are under uuid_mutex */ - if (device-name) { - name = rcu_string_strdup(device-name-str, GFP_NOFS); - BUG_ON(!name); /* -ENOMEM */ - rcu_assign_pointer(new_device-name, name); - } - - list_replace_rcu(device-dev_list, new_device-dev_list); - new_device-fs_devices = device-fs_devices; - - call_rcu(device-rcu, free_device); + btrfs_close_one_device(device); } mutex_unlock(fs_devices-device_list_mutex); @@ -6890,3 +6861,38 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info) fs_devices = fs_devices-seed; } } + +void btrfs_close_one_device(struct btrfs_device *device) +{ + struct btrfs_fs_devices *fs_devices = device-fs_devices; + struct btrfs_device *new_device; + struct rcu_string *name; + + if (device-bdev) + fs_devices-open_devices--; + + if (device-writeable + device-devid != BTRFS_DEV_REPLACE_DEVID) { + list_del_init(device-dev_alloc_list); + fs_devices-rw_devices--; + } + + if (device-missing) + fs_devices-missing_devices--; + + new_device = btrfs_alloc_device(NULL, device-devid, + device-uuid); + BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ + + /* Safe because we are under uuid_mutex */ + if (device-name) { + name = rcu_string_strdup(device-name-str, GFP_NOFS); + BUG_ON(!name); /* -ENOMEM */ + rcu_assign_pointer(new_device-name, name); + } + + list_replace_rcu(device-dev_list, new_device-dev_list); + new_device-fs_devices = device-fs_devices; + + call_rcu(device-rcu, free_device); +} diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 32a66c7..5f4911a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -550,5 +550,6 @@ static inline void unlock_chunks(struct btrfs_root *root) struct list_head *btrfs_get_fs_uuids(void); void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info); void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info); +void btrfs_close_one_device(struct btrfs_device *device); #endif -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/23] Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_link
--- fs/btrfs/dev-replace.c | 2 +- fs/btrfs/sysfs.c | 6 +++--- fs/btrfs/sysfs.h | 2 +- fs/btrfs/volumes.c | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index c1bf0d6..6eb9324 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -587,7 +587,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, mutex_unlock(uuid_mutex); /* replace the sysfs entry */ - btrfs_kobj_rm_device(fs_info-fs_devices, src_device); + btrfs_sysfs_rm_device_link(fs_info-fs_devices, src_device); btrfs_rm_dev_replace_free_srcdev(fs_info, src_device); /* write back the superblocks */ diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index df67f6b..52319d1 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -557,7 +557,7 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info) addrm_unknown_feature_attrs(fs_info, false); sysfs_remove_group(fs_info-fs_devices-super_kobj, btrfs_feature_attr_group); sysfs_remove_files(fs_info-fs_devices-super_kobj, btrfs_attrs); - btrfs_kobj_rm_device(fs_info-fs_devices, NULL); + btrfs_sysfs_rm_device_link(fs_info-fs_devices, NULL); } const char * const btrfs_feature_set_names[3] = { @@ -637,7 +637,7 @@ static void init_feature_attrs(void) /* when one_device is NULL, it removes all device links */ -int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices, +int btrfs_sysfs_rm_device_link(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device) { struct hd_struct *disk; @@ -750,7 +750,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info) error = sysfs_create_files(super_kobj, btrfs_attrs); if (error) { - btrfs_kobj_rm_device(fs_devs, NULL); + btrfs_sysfs_rm_device_link(fs_devs, NULL); return error; } diff --git a/fs/btrfs/sysfs.h b/fs/btrfs/sysfs.h index 6529680..9c09522 100644 --- a/fs/btrfs/sysfs.h +++ b/fs/btrfs/sysfs.h @@ -84,7 +84,7 @@ extern struct kobj_type space_info_ktype; extern struct kobj_type btrfs_raid_ktype; int btrfs_sysfs_add_device_link(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device); -int btrfs_kobj_rm_device(struct btrfs_fs_devices *fs_devices, +int btrfs_sysfs_rm_device_link(struct btrfs_fs_devices *fs_devices, struct btrfs_device *one_device); int btrfs_sysfs_add_fsid(struct btrfs_fs_devices *fs_devs, struct kobject *parent); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 18ea1eb..4c7c344 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1801,7 +1801,7 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) if (device-bdev) { device-fs_devices-open_devices--; /* remove sysfs entry */ - btrfs_kobj_rm_device(root-fs_info-fs_devices, device); + btrfs_sysfs_rm_device_link(root-fs_info-fs_devices, device); } call_rcu(device-rcu, free_device); @@ -1971,7 +1971,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, WARN_ON(!tgtdev); mutex_lock(fs_info-fs_devices-device_list_mutex); - btrfs_kobj_rm_device(fs_info-fs_devices, tgtdev); + btrfs_sysfs_rm_device_link(fs_info-fs_devices, tgtdev); if (tgtdev-bdev) { btrfs_scratch_superblock(tgtdev); @@ -2388,7 +2388,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) error_trans: btrfs_end_transaction(trans, root); rcu_string_free(device-name); - btrfs_kobj_rm_device(root-fs_info-fs_devices, device); + btrfs_sysfs_rm_device_link(root-fs_info-fs_devices, device); kfree(device); error: blkdev_put(bdev, FMODE_EXCL); -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 18/23] Btrfs: check device_path in btrfs_find_device_by_user_input()
so btrfs_dev_replace_start() can be sleak and btrfs_rm_device() will also need it. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/dev-replace.c | 4 fs/btrfs/volumes.c | 3 +++ 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 937e53b..0df3d9b 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -321,10 +321,6 @@ int btrfs_dev_replace_start(struct btrfs_root *root, return -EINVAL; } - if ((args-start.srcdevid == 0 args-start.srcdev_name[0] == '\0') || - args-start.tgtdev_name[0] == '\0') - return -EINVAL; - /* the disk copy procedure reuses the scrub code */ mutex_lock(fs_info-volume_mutex); ret = btrfs_find_device_by_user_input(root, args-start.srcdevid, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index dcb10fa..5803c45 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2001,6 +2001,9 @@ int btrfs_find_device_by_user_input(struct btrfs_root *root, u64 srcdevid, if (!*device) ret = -ENOENT; } else { + if (!srcdev_name || !srcdev_name[0]) + return -EINVAL; + ret = btrfs_find_device_missing_or_by_path(root, srcdev_name, device); } -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/23] Btrfs: fix btrfs_scratch_superblock() with fixes from device delete
This patch updates the btrfs_scratch_superblock(), (which is used by the replace device thread), with those fixes from the scratch superblock code section of btrfs_rm_device(). The fixes are: Scratch all copies of superblock Notify kobject that superblock has been changed Update time on the device so that btrfs_rm_device() can use the function btrfs_scratch_superblock() instead of its own scratch code. And further replace deivce code which similarly releases device back to the system, will have the fixes from the btrfs device delete. Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 40 fs/btrfs/volumes.h | 2 +- 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1d35332..b2a19ea 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1915,7 +1915,8 @@ void btrfs_rm_dev_replace_remove_srcdev(struct btrfs_fs_info *fs_info, if (srcdev-writeable) { fs_devices-rw_devices--; /* zero out the old super if it is writable */ - btrfs_scratch_superblock(srcdev); + btrfs_scratch_superblock(srcdev-bdev, + rcu_str_deref(srcdev-name)); } if (srcdev-bdev) @@ -1965,7 +1966,8 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, btrfs_sysfs_rm_device_link(fs_info-fs_devices, tgtdev); if (tgtdev-bdev) { - btrfs_scratch_superblock(tgtdev); + btrfs_scratch_superblock(tgtdev-bdev, + rcu_str_deref(tgtdev-name)); fs_info-fs_devices-open_devices--; } fs_info-fs_devices-num_devices--; @@ -6844,22 +6846,36 @@ int btrfs_get_dev_stats(struct btrfs_root *root, return 0; } -int btrfs_scratch_superblock(struct btrfs_device *device) +void btrfs_scratch_superblock(struct block_device *bdev, char *device_path) { struct buffer_head *bh; struct btrfs_super_block *disk_super; + int copy_num; - bh = btrfs_read_dev_super(device-bdev); - if (IS_ERR(bh)) - return PTR_ERR(bh); - disk_super = (struct btrfs_super_block *)bh-b_data; + if (!bdev) + return; - memset(disk_super-magic, 0, sizeof(disk_super-magic)); - set_buffer_dirty(bh); - sync_dirty_buffer(bh); - brelse(bh); + for (copy_num = 0; copy_num BTRFS_SUPER_MIRROR_MAX; + copy_num++) { - return 0; + if (btrfs_read_dev_one_super(bdev, copy_num, bh)) + continue; + + disk_super = (struct btrfs_super_block *)bh-b_data; + + memset(disk_super-magic, 0, sizeof(disk_super-magic)); + set_buffer_dirty(bh); + sync_dirty_buffer(bh); + brelse(bh); + } + + /* Notify udev that device has changed */ + btrfs_kobject_uevent(bdev, KOBJ_CHANGE); + + /* Update ctime/mtime for device path for libblkid */ + update_dev_time(device_path); + + return; } /* diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index a093b36..32a66c7 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -477,7 +477,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info, struct btrfs_device *tgtdev); void btrfs_init_dev_replace_tgtdev_for_resume(struct btrfs_fs_info *fs_info, struct btrfs_device *tgtdev); -int btrfs_scratch_superblock(struct btrfs_device *device); +void btrfs_scratch_superblock(struct block_device *bdev, char *device_path); int btrfs_is_parity_mirror(struct btrfs_mapping_tree *map_tree, u64 logical, u64 len, int mirror_num); unsigned long btrfs_full_stripe_len(struct btrfs_root *root, -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 19/23] Btrfs: avoid user cli usage error logging into the sys log
Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5803c45..f3ca87d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -198,7 +198,6 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder, if (IS_ERR(*bdev)) { ret = PTR_ERR(*bdev); - printk(KERN_INFO BTRFS: open %s failed\n, device_path); goto error; } -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html