Re: File system is oddly full after kernel upgrade, balance doesn't help
MegaBrutal posted on Fri, 27 Jan 2017 19:45:00 +0100 as excerpted:

> Hi,
>
> Not sure if it was caused by the upgrade, but I only encountered this
> problem after I upgraded to Ubuntu Yakkety, which comes with a 4.8
> kernel.
> Linux vmhost 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> This is the 2nd file system which showed these symptoms, so I thought
> it's more than happenstance. I don't remember what I did with the first
> one, but I somehow managed to fix it with balance, if I remember
> correctly, but it doesn't help with this one.
>
> FS state before any attempts to fix:
> Filesystem 1M-blocks Used Available Use% Mounted on
> [...]curlybrace 1024 1024 0 100% /tmp/mnt/curlybrace
>
> Resized LV, run „btrfs filesystem resize max /tmp/mnt/curlybrace”:
> [...]curlybrace 2048 1303 0 100% /tmp/mnt/curlybrace
>
> Notice how the usage magically jumped up to 1303 MB, and despite the FS
> size being 2048 MB, the usage is still displayed as 100%.
>
> Tried full balance (other options with -dusage had no result):
> root@vmhost:~# btrfs balance start -v /tmp/mnt/curlybrace
> Starting balance without any filters.
> ERROR: error during balancing '/tmp/mnt/curlybrace':
> No space left on device
> No space left on device? How?
>
> But it changed the situation:
> [...]curlybrace 2048 1302 190 88% /tmp/mnt/curlybrace
>
> This is still not acceptable. I need to recover at least 50% free space
> (since I doubled the size of the FS).
>
> A 2nd balance attempt resulted in this:
> [...]curlybrace 2048 1302 162 89% /tmp/mnt/curlybrace
>
> So... it became slightly worse.
>
> What's going on? How can I fix the file system to show real data?

Something seems off, yes, but...
https://btrfs.wiki.kernel.org/index.php/FAQ

Reading the whole thing will likely be useful, but especially 1.3/1.4 and 4.6-4.9 discussing the problem of space usage, reporting, and (primarily in some of the other space related FAQs beyond the specific ones above) how to try and fix it when space runs out, on btrfs.

If you read them before, read them again, because you didn't post the btrfs free-space reports covered in 4.7, instead posting what appears to be the standard (non-btrfs) df report, which for all the reasons explained in the FAQ, is at best only an estimate on btrfs. That estimate is obviously behaving unexpectedly in your case, but without the btrfs-specific reports, it's nigh impossible to even guess with any chance at accuracy what's going on, or how to fix it.

A WAG would be that part of the problem might be that you were into global reserve before the resize, so after the filesystem got more space to use, the first thing it did was unload that global reserve usage, thereby immediately upping apparent usage. That might explain that initial jump in usage after the resize.

But that's just a WAG. Without at least btrfs filesystem usage, or btrfs filesystem df plus btrfs filesystem show, from before the resize, after, and before and after the balances, a WAG is what it remains. And again, without those reports, there's no way to say whether balance can be expected to help, or not.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
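One low-effort way to gather the reports Duncan asks for is to snapshot all three btrfs views, with a label and timestamp, before and after every resize or balance step. This is a minimal sketch, not anything from the thread: the function name `snapshot_space` and the `BTRFS` override variable are made up for illustration, and only the standard `btrfs filesystem usage`, `df` and `show` subcommands are used.

```shell
#!/usr/bin/env bash
# Sketch: record the btrfs-specific space reports for one mount point,
# so the state before/after each resize or balance step can be compared.
# BTRFS can be overridden (e.g. with a stub for a dry run); it defaults
# to the real btrfs tool.
BTRFS=${BTRFS:-btrfs}

snapshot_space() {
  local label=$1 mnt=$2
  # Header line so several snapshots can be appended to one log file.
  echo "=== $label: $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  "$BTRFS" filesystem usage "$mnt"
  "$BTRFS" filesystem df "$mnt"
  "$BTRFS" filesystem show "$mnt"
}
```

For example, `snapshot_space before-resize /tmp/mnt/curlybrace >> space.log 2>&1` before the resize, then the same call with labels like `after-resize` and `after-balance-1` after each later step, gives a single log that can be posted to the list.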
Re: btrfs recovery
Austin S. Hemmelgarn posted on Fri, 27 Jan 2017 07:58:20 -0500 as excerpted:

> On 2017-01-27 06:01, Oliver Freyermuth wrote:
>>> I'm also running 'memtester 12G' right now, which at least tests 2/3
>>> of the memory. I'll leave that running for a day or so, but of course
>>> it will not provide a clear answer...
>>
>> A small update: while the online memtester is without any errors still,
>> I checked old syslogs from the machine and found something intriguing.
>> kernel: Corrupted low memory at 88009000 (9000 phys) = 00098d39
>> kernel: Corrupted low memory at 88009000 (9000 phys) = 00099795
>> kernel: Corrupted low memory at 88009000 (9000 phys) = 000dd64e

0x9000 = 36K...

>> This seems to be consistently happening from time to time (I have low
>> memory corruption checking compiled in).
>> The numbers always consistently increase, and after a reboot, start
>> fresh from a small number again.
>>
>> I suppose this is a BIOS bug and it's storing some counter in low
>> memory. I am unsure whether this could have triggered the BTRFS
>> corruption, nor do I know what to do about it (are there kernel quirks
>> for that?). The vendor does not provide any updates, as usual.
>>
>> If someone could confirm whether this might cause corruption for btrfs
>> (and maybe direct me to the correct place to ask for a kernel quirk for
>> this device - do I ask on MM, or somewhere else?), that would be much
>> appreciated.
> It is a firmware bug, Linux doesn't use stuff in that physical address
> range at all. I don't think it's likely that this specific bug caused
> the corruption, but given that the firmware doesn't have its
> allocations listed correctly in the e820 table (if they were listed
> correctly, you wouldn't be seeing this message), it would not surprise
> me if the firmware was involved somehow.
Correct me if I'm wrong (I'm no kernel expert, but I've been building my own kernel for well over a decade now, so I have a working familiarity with the kernel options, of which the following is my possibly incorrect read), but I believe that's only "fact check: mostly correct" (mostly as in yes it's the default, but there's a mainline kernel option to change it).

I was just going over the related kernel options again a couple of days ago, so they're fresh in my head, and AFAICT...

There are THREE semi-related kernel options (config UI option location is based on the mainline 4.10-rc5+ git kernel I'm presently running):

DEFAULT_MMAP_MIN_ADDR
Config location: Processor type and features: Low address space to protect from user allocation
This one is virtual memory according to config help, so likely not directly related, but similar idea.

X86_CHECK_BIOS_CORRUPTION
Location: Same section, a few lines below the first one: Check for low memory corruption
I guess this is the option you (OF) have enabled. Note that according to help, in addition to enabling this in options, a runtime kernel command-line option must be given as well, to actually enable the checks.

X86_RESERVE_LOW
Location: Same section, immediately below the check option: Amount of low memory, in kilobytes, to reserve for the BIOS
Help for this one suggests enabling the check bios corruption option above if there are any doubts, so the two are directly related.

All three options apparently default to 64K (as that's what I see here and I don't believe I've changed them), but can be changed. See the kernel options help and where it points for more.
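The address arithmetic can be checked directly from the logged line: "(9000 phys)" is 0x9000 = 36864 bytes, i.e. 36K. A minimal sketch, with made-up helper names, that pulls the physical address out of such a log line and compares it against a reserve size in KiB. It assumes the X86_RESERVE_LOW reserve covers physical addresses from 0 up to reserve_kib * 1024, with 4K as the smallest allowed value.

```shell
#!/usr/bin/env bash
# Sketch: is the physical address from a "Corrupted low memory" log line
# inside the low-memory area reserved for the BIOS?
# Assumption: a reserve of N KiB covers physical addresses [0, N*1024).

phys_from_log() {
  # Read log lines on stdin and print the hex value in "(XXXX phys)".
  sed -n 's/.*(\([0-9a-fA-F]*\) phys).*/\1/p'
}

in_reserved_low() {
  local phys_hex=$1 reserve_kib=$2
  local addr=$(( 16#$phys_hex ))   # hex string -> integer bytes
  if (( addr < reserve_kib * 1024 )); then
    echo "0x$phys_hex ($(( addr / 1024 ))K) is inside the ${reserve_kib}K reserve"
  else
    echo "0x$phys_hex ($(( addr / 1024 ))K) is outside the ${reserve_kib}K reserve"
  fi
}
```

With the log line from the report, `phys_from_log` yields `9000`; `in_reserved_low 9000 64` reports it inside the 64K default, while `in_reserved_low 9000 4` reports it outside a 4K (one page) reserve.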
My read of the above is that yes, by default the kernel won't use physical 0x9000 (36K), as it's well within the 64K default reserve area, but a blanket "Linux doesn't use stuff in that physical address range at all" is incorrect, as if the defaults have been changed it /could/ use that space (#3's minimum is 1 page, 4K, leaving that 36K address uncovered) -- there's a mainline-official option to do so, so it doesn't even require patching.

Meanwhile, since the defaults cover it, no quirk should be necessary (tho I might increase the reserve and test coverage area to the maximum 640K and run for a while to be sure it's not going above the 64K default), but were it outside the default 64K coverage area, I would probably file it as a bug (my usual method for confirmed bugs), and mark it initially as an arch-x86 bug, tho they may switch it to something else later. But the devs would probably suggest further debugging, possibly giving you debug patches to try, etc, to nail down the specific device, before setting up a quirk for it. Because the problem could be an expansion card or something, not the mobo/factory-default-machine, too, and it'd be a shame to set up a quirk for the wrong hardware.

>> Additionally, I found that "btrfs restore" works on this broken FS. I
>> will take an external backup of the content within the next 24 hours
>> using that, then I am ready to try anything you suggest.
> FWIW the fact that btrfs restore works is a good
[GIT PULL] Btrfs
Hi Linus,

My for-linus-4.10 branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.10

Has some fixes that we've collected from the list. We still have one more pending to nail down a regression in lzo compression, but I wanted to get this batch out the door.

Omar Sandoval (3) commits (+2/-6):
    Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations (+0/-2)
    Btrfs: remove old tree_root case in btrfs_read_locked_inode() (+1/-4)
    Btrfs: disable xattr operations on subvolume directories (+1/-0)

Liu Bo (1) commits (+12/-1):
    Btrfs: fix truncate down when no_holes feature is enabled

Chandan Rajendra (1) commits (+2/-2):
    Btrfs: Fix deadlock between direct IO and fast fsync

Wang Xiaoguang (1) commits (+1/-0):
    btrfs: fix false enospc error when truncating heavily reflinked file

Total: (6) commits (+17/-9)

 fs/btrfs/inode.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On Fri, Jan 27, 2017 at 03:03:18PM -0500, Austin S. Hemmelgarn wrote:
> On 2017-01-27 11:47, Hans Deragon wrote:
> > However, as a user, I am looking for an easy, no maintenance raid solution. I wish that if a drive fails, the btrfs filesystem still mounts rw and leaves the OS running, but warns the user of the failing disk and easily allows the addition of a new drive to reintroduce redundancy.
> Before I make any suggestions regarding this, I should point out that mounting read-write when a device is missing is what caused this issue in the first place. Doing so is extremely dangerous in any RAID setup, regardless of your software stack. The filesystem is expected to store things reliably when a write succeeds, and if you've got a broken RAID array, claiming that you can store things reliably is generally a lie. MD and LVM both have things in place to mitigate most of the risk, but even there it's still risky. Yes, it's not convenient to have to deal with a system that won't boot, but it's at least a whole lot easier from Linux than it is in most other operating systems.

Now, now. Other RAID implementations already have this feature that you're clamoring for! When it is degraded, they will continue without a hitch, and perform their duties not even bothering the user. Then a couple years later, the other disk will fail. Obviously, there are no backups -- "we have RAID". This is when I get a call.

> The second is proper monitoring. A well set up monitoring system will let you know when the disk is failing before it gets to the point of just disappearing from the system most of the time.

No problem, the second busted disk I mentioned above will include a full mbox with a mail from mdadm for every single day. They were either unread, or read by an admin who ignored them and perhaps even wrote a filter to send them to /dev/null. Because the system still works, what's the hurry?

Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-01-27 11:47, Hans Deragon wrote:
> On 2017-01-24 14:48, Adam Borowski wrote:
>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>> If I remove 'ro' from the option, I cannot get the filesystem mounted because of the following error:
>>> BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>>> So I am stuck. I can only mount the filesystem as read-only, which prevents me from adding a disk.
>>
>> A known problem: you get only one shot at fixing the filesystem, but that's not because of some damage but because the check of whether the fs is in good enough shape to mount is oversimplistic.
>>
>> Here's a patch, if you apply it and recompile, you'll be able to mount degraded rw.
>>
>> Note that it removes a safety harness: here, the harness got tangled up and keeps you from recovering when it shouldn't, but it _has_ valid uses that.
>>
>> Meow!
>
> Greetings,
>
> Ok, that solution will solve my problem in the short run, i.e. getting my raid1 up again.
>
> However, as a user, I am looking for an easy, no maintenance raid solution. I wish that if a drive fails, the btrfs filesystem still mounts rw and leaves the OS running, but warns the user of the failing disk and easily allows the addition of a new drive to reintroduce redundancy. Are there any plans within the btrfs community to implement such a feature? A year from now, when the other drive fails, will I hit this problem again, i.e. my OS failing to start, booting into a terminal, and being unable to reintroduce a new drive without recompiling the kernel?

Before I make any suggestions regarding this, I should point out that mounting read-write when a device is missing is what caused this issue in the first place. Doing so is extremely dangerous in any RAID setup, regardless of your software stack. The filesystem is expected to store things reliably when a write succeeds, and if you've got a broken RAID array, claiming that you can store things reliably is generally a lie.
MD and LVM both have things in place to mitigate most of the risk, but even there it's still risky. Yes, it's not convenient to have to deal with a system that won't boot, but it's at least a whole lot easier from Linux than it is in most other operating systems.

Now, the first step to reliable BTRFS usage is using up-to-date kernels. If you're actually serious about using BTRFS, you should be doing this anyway though. Assuming you're keeping up-to-date on the kernel, then you won't hit this same problem again (or at least you shouldn't, since multiple people now have checks for this in their regression testing suites for BTRFS).

The second is proper monitoring. A well set up monitoring system will let you know when the disk is failing before it gets to the point of just disappearing from the system most of the time. There is currently no specific monitoring tool for BTRFS, but it's really easy to set up automated monitoring for stuff like this. It's impractical for me to cover exact configuration here, since I don't know how much background you have dealing with stuff like this (and you're probably using systemd since it's Ubuntu, and I have near zero background dealing with recurring task scheduling with that). I can however cover a list of what you should be monitoring and roughly how often:

1. SMART status from the storage devices. You'll need smartmontools for this. In general, I'd suggest using smartctl through cron or a systemd timer unit to monitor this instead of smartd. A basic command line that will work on all modern SATA disks to perform the checks you want is:

    smartctl -H /dev/sda

You'll need one call for each disk; just replace /dev/sda with each device. Note that this should be the device itself, not the partitions. If that command spits out a warning (or returns with an exit code other than 0), something's wrong and you should at least investigate (and possibly look at replacing the disk).
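A minimal sketch of that per-disk check, suitable for calling from cron or a systemd timer. The function names and the `SMARTCTL` override variable are made up for illustration; the bit meanings are taken from the EXIT STATUS section of the smartctl(8) man page, where a non-zero status is a bitmask rather than a single error code.

```shell
#!/usr/bin/env bash
# Sketch: run "smartctl -H" on each whole disk and explain any non-zero
# exit status. SMARTCTL can be overridden (e.g. with a stub for testing).
SMARTCTL=${SMARTCTL:-smartctl}

decode_smart_status() {
  # Print one line per bit set in smartctl's exit status (bits 0-7).
  local status=$1 bit
  local -a names=(
    "command line did not parse"
    "device open failed"
    "SMART or ATA command failed"
    "SMART status: DISK FAILING"
    "prefail attributes at or below threshold"
    "attributes were at or below threshold in the past"
    "device error log has errors"
    "self-test log has errors"
  )
  for bit in 0 1 2 3 4 5 6 7; do
    if (( status & (1 << bit) )); then
      echo "${names[bit]}"
    fi
  done
}

check_disks() {
  # Returns 1 if any disk reported a problem, 0 otherwise.
  local dev rc failed=0
  for dev in "$@"; do
    "$SMARTCTL" -H "$dev" > /dev/null 2>&1
    rc=$?
    if (( rc != 0 )); then
      failed=1
      echo "WARNING: $dev: smartctl exit status $rc"
      decode_smart_status "$rc" | sed 's/^/  - /'
    fi
  done
  return "$failed"
}
```

A cron job could then run something like `check_disks /dev/sda /dev/sdb` and mail its output whenever it returns non-zero; a status of 8, for instance, decodes to the "DISK FAILING" case.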
I would suggest checking SMART status at least daily, and potentially much more frequently. When the self-checks in the disk firmware start failing (which is what this is checking), it generally means that failure is imminent, usually within a couple of days at most.

2. BTRFS scrub. If you're serious about data safety, you should be running a scrub on the filesystem regularly. As a general rule, once a week is reasonable unless you have marginal hardware or are seriously paranoid. Make sure to check the results later with the 'btrfs scrub status' command. It will tell you if it found any errors, and how many it was able to fix. Isolated single errors are generally not a sign of imminent failure; it's when they start happening regularly or you see a whole lot at once that you're in trouble. Scrub will also fix most synchronization issues between devices in a RAID set.

3. BTRFS device stats. BTRFS stores per-device error counters in the filesystem. These track cumulative errors since the last time they were
File system is oddly full after kernel upgrade, balance doesn't help
Hi,

Not sure if it was caused by the upgrade, but I only encountered this problem after I upgraded to Ubuntu Yakkety, which comes with a 4.8 kernel.

Linux vmhost 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is the 2nd file system which showed these symptoms, so I thought it's more than happenstance. I don't remember what I did with the first one, but I somehow managed to fix it with balance, if I remember correctly, but it doesn't help with this one.

FS state before any attempts to fix:

Filesystem                             1M-blocks  Used Available Use% Mounted on
/dev/mapper/vmdata--vg-lxc--curlybrace      1024  1024         0 100% /tmp/mnt/curlybrace

Resized LV, ran „btrfs filesystem resize max /tmp/mnt/curlybrace”:

/dev/mapper/vmdata--vg-lxc--curlybrace      2048  1303         0 100% /tmp/mnt/curlybrace

Notice how the usage magically jumped up to 1303 MB, and despite the FS size being 2048 MB, the usage is still displayed as 100%.

Tried full balance (other options with -dusage had no result):

root@vmhost:~# btrfs balance start -v /tmp/mnt/curlybrace
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
WARNING:
  Full balance without filters requested. This operation is very intense and takes potentially very long. It is recommended to use the balance filters to narrow down the balanced data. Use 'btrfs balance start --full-balance' option to skip this warning. The operation will start in 10 seconds. Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
ERROR: error during balancing '/tmp/mnt/curlybrace': No space left on device
There may be more info in syslog - try dmesg | tail

No space left on device? How?

But it changed the situation:

/dev/mapper/vmdata--vg-lxc--curlybrace      2048  1302       190  88% /tmp/mnt/curlybrace

This is still not acceptable. I need to recover at least 50% free space (since I doubled the size of the FS).
A 2nd balance attempt resulted in this:

/dev/mapper/vmdata--vg-lxc--curlybrace      2048  1302       162  89% /tmp/mnt/curlybrace

So... it became slightly worse.

What's going on? How can I fix the file system to show real data?

Regards,
MegaBrutal
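The Use% column in these reports is not used/size. GNU coreutils df computes it from the Used and Available columns alone, as used / (used + available), rounded up. A small sketch (function name made up for this illustration) reproduces all three figures from the 2048 MiB reports. It also shows why 1303 used with 0 available reads 100% on a 2048 MiB filesystem: the roughly 745 MiB that is neither counted as used nor reported as available never enters the calculation, and that gap is exactly what the btrfs-specific reports are needed to explain.

```shell
#!/usr/bin/env bash
# Sketch: reproduce df's Use% column from its Used and Available columns.
# Assumption: GNU df's rule, ceil(100 * used / (used + available)).
df_use_pct() {
  local used=$1 avail=$2
  local total=$(( used + avail ))
  if (( total == 0 )); then
    echo "-"
    return
  fi
  # Integer ceiling of 100 * used / total.
  echo $(( (100 * used + total - 1) / total ))
}

# Used/Available pairs (MiB) from the 2048 MiB reports above.
for pair in "1303 0" "1302 190" "1302 162"; do
  read -r used avail <<< "$pair"
  printf 'used=%s avail=%s -> %s%%\n' "$used" "$avail" "$(df_use_pct "$used" "$avail")"
done
```

Running it prints 100%, 88% and 89%, matching the three reports.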
[PATCH] xfstests: btrfs/047: check btrfs-convert with extent and non-extent source
This test checks btrfs-convert on a source filesystem that contains a combination of ext3 files in non-extent format and ext4 extent files, and validates the file md5sums before and after conversion.

btrfs/012: reuse the BTRFS_CONVERT_PROG and E2FSCK_PROG definitions from common/config.

Signed-off-by: Lakshmipathi.G
---
 common/config       |   3 ++
 tests/btrfs/012     |   3 --
 tests/btrfs/047     | 122
 tests/btrfs/047.out |   2 +
 tests/btrfs/group   |   1 +
 5 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100755 tests/btrfs/047
 create mode 100644 tests/btrfs/047.out

diff --git a/common/config b/common/config
index 0706aca..fa89f42 100644
--- a/common/config
+++ b/common/config
@@ -240,11 +240,14 @@ case "$HOSTOS" in
 	export DUMP_F2FS_PROG="`set_prog_path dump.f2fs`"
 	export BTRFS_UTIL_PROG="`set_prog_path btrfs`"
 	export BTRFS_SHOW_SUPER_PROG="`set_prog_path btrfs-show-super`"
+	export BTRFS_CONVERT_PROG="`set_prog_path btrfs-convert`"
 	export XFS_FSR_PROG="`set_prog_path xfs_fsr`"
 	export MKFS_NFS_PROG="false"
 	export MKFS_CIFS_PROG="false"
 	export MKFS_OVERLAY_PROG="false"
 	export MKFS_REISER4_PROG="`set_prog_path mkfs.reiser4`"
+	export E2FSCK_PROG="`set_prog_path e2fsck`"
+	export TUNE2FS_PROG="`set_prog_path tune2fs`"
 	;;
 esac
 
diff --git a/tests/btrfs/012 b/tests/btrfs/012
index 6a3cb81..85c82f0 100755
--- a/tests/btrfs/012
+++ b/tests/btrfs/012
@@ -54,9 +54,6 @@ _supported_fs btrfs
 _supported_os Linux
 _require_scratch_nocheck
 
-BTRFS_CONVERT_PROG="`set_prog_path btrfs-convert`"
-E2FSCK_PROG="`set_prog_path e2fsck`"
-
 _require_command "$BTRFS_CONVERT_PROG" btrfs-convert
 _require_command "$MKFS_EXT4_PROG" mkfs.ext4
 _require_command "$E2FSCK_PROG" e2fsck
diff --git a/tests/btrfs/047 b/tests/btrfs/047
new file mode 100755
index 000..d349d12
--- /dev/null
+++ b/tests/btrfs/047
@@ -0,0 +1,122 @@
+#! /bin/bash
+# FS QA Test 047
+#
+# Test btrfs-convert
+#
+# 1) create ext3 filesystem & populate it.
+# 2) upgrade ext3 filesystem to ext4.
+# 3) populate data.
+# 4) source has combination of non-extent and extent files.
+# 5) convert it to btrfs, mount and verify contents.
+#---
+# Copyright (c) 2017 Lakshmipathi.G All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_nocheck
+
+_require_command "$BTRFS_CONVERT_PROG" btrfs-convert
+_require_command "$MKFS_EXT4_PROG" mkfs.ext4
+_require_command "$E2FSCK_PROG" e2fsck
+_require_command "$TUNE2FS_PROG" tune2fs
+
+rm -f $seqres.full
+
+BLOCK_SIZE=`_get_block_size $TEST_DIR`
+EXT_MD5SUM="$tmp.ext43"
+BTRFS_MD5SUM="$tmp.btrfs"
+
+_populate_data(){
+	data_path=$1
+	mkdir -p $data_path
+	args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d $data_path`
+	echo "Run fsstress $args" >>$seqres.full
+	$FSSTRESS_PROG $args >/dev/null 2>&1 &
+	fsstress_pid=$!
+	wait $fsstress_pid
+}
+
+# Create & populate an ext3 filesystem
+$MKFS_EXT4_PROG -F -t ext3 -b $BLOCK_SIZE $SCRATCH_DEV > $seqres.full 2>&1 || \
+	_notrun "Could not create ext3 filesystem"
+
+# mount and populate non-extent file
+mount -t ext3 $SCRATCH_DEV $SCRATCH_MNT
+_populate_data "$SCRATCH_MNT/ext3_ext4_data/ext3"
+_scratch_unmount
+
+# Upgrade it to ext4.
+$TUNE2FS_PROG -O extents,uninit_bg,dir_index $SCRATCH_DEV >> $seqres.full 2>&1
+# After conversion, it's highly recommended to run e2fsck.
+$E2FSCK_PROG -fyD $SCRATCH_DEV >> $seqres.full 2>&1
+
+# mount and populate extent file
+mount -t ext4 $SCRATCH_DEV $SCRATCH_MNT
+_populate_data "$SCRATCH_MNT/ext3_ext4_data/ext4"
+
+# Compute md5 of ext3,ext4 files.
+find "$SCRATCH_MNT/ext3_ext4_data"
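The patch text above is cut off just as it starts computing checksums. For readers wondering how a before/after md5 comparison of this kind can be done, here is a minimal sketch. It is NOT the patch's own code (whose remainder is not shown); the function names are made up, and it assumes GNU find, sort, xargs and md5sum. Saving one manifest per filesystem would fit the patch's EXT_MD5SUM and BTRFS_MD5SUM variables, but that is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: compare two directory trees by per-file md5 checksums.
# Assumes GNU coreutils/findutils (sort -z, xargs -0 -r, md5sum).

md5_manifest() {
  # Print "md5  ./relative/path" for every regular file under $1, sorted
  # so that two trees with identical content produce identical manifests.
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

compare_trees() {
  # Exit status 0 only if both trees have the same files and contents.
  diff <(md5_manifest "$1") <(md5_manifest "$2") > /dev/null
}
```

In a conversion test, the manifest would be taken on the mounted ext4 source before btrfs-convert and again on the converted btrfs mount, and the two compared.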
Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"
On Fri, Jan 27, 2017 at 10:37:35AM +0100, Michal Hocko wrote:
> If this ever turns out to be a problem and with the vmapped stacks we
> have good chances to get proper stack traces on a potential overflow
> we can add the scope API around the problematic code path with the
> explanation why it is needed.

Yeah, or maybe we can automate it? Can the reclaim code check how much stack space is left and do the right thing automatically?

The reason why I'm nervous is that nojournal mode is not a common configuration, and "wait until production systems start failing" is not a strategy that I or many SRE-types find comforting. So if we can assure ourselves that the right thing will happen automatically, or that lockdep will detect a required GFP_NOFS when running tests, the happier I'll be.

- Ted
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-01-24 14:48, Adam Borowski wrote:
> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>> If I remove 'ro' from the option, I cannot get the filesystem mounted because of the following error:
>> BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
>> So I am stuck. I can only mount the filesystem as read-only, which prevents me from adding a disk.
>
> A known problem: you get only one shot at fixing the filesystem, but that's not because of some damage but because the check of whether the fs is in good enough shape to mount is oversimplistic.
>
> Here's a patch, if you apply it and recompile, you'll be able to mount degraded rw.
>
> Note that it removes a safety harness: here, the harness got tangled up and keeps you from recovering when it shouldn't, but it _has_ valid uses that.
>
> Meow!

Greetings,

Ok, that solution will solve my problem in the short run, i.e. getting my raid1 up again.

However, as a user, I am looking for an easy, no maintenance raid solution. I wish that if a drive fails, the btrfs filesystem still mounts rw and leaves the OS running, but warns the user of the failing disk and easily allows the addition of a new drive to reintroduce redundancy. Are there any plans within the btrfs community to implement such a feature? A year from now, when the other drive fails, will I hit this problem again, i.e. my OS failing to start, booting into a terminal, and being unable to reintroduce a new drive without recompiling the kernel?

Best regards,
Hans Deragon
Re: [RFC][PATCH v2 4/4] vfs: wrap write f_ops with file_{start,end}_write()
[adding mfasheh & btrfs list to cc]

On Fri, Jan 27, 2017 at 06:20:12PM +0200, Amir Goldstein wrote:
> On Fri, Jan 27, 2017 at 1:50 PM, Amir Goldstein wrote:
> > On Fri, Jan 27, 2017 at 1:09 PM, Miklos Szeredi wrote:
> >> On Mon, Jan 23, 2017 at 8:43 PM, Amir Goldstein wrote:
> >>> Before calling write f_ops, call file_start_write() instead
> >>> of sb_start_write().
> >>>
> >>> This ensures freeze protection for both overlay and upper fs
> >>> when file is open from an overlayfs mount.
> >>>
> >>> Replace {sb,file}_start_write() for {copy,clone}_file_range() and
> >>> for fallocate().
> >>>
> >>> For dedup_file_range() there is no need for mnt_want_write_file().
> >>> File is already open for write, so we already have mnt_want_write()
> >>> and we only need file_start_write().
> >>
> >> Being opened for write is not verified if capable(CAP_SYS_ADMIN).
> >> Ugly special case, don't ask me why it's done...
> >>
> >
> > Christoph, Darrick, is that by design?
>
> Anyway, whether it makes sense or not, that's a legacy from
> BTRFS_IOC_FILE_EXTENT_SAME, we probably have to live with.
>
> Michael, I reckon man page needs updating.
>
> I'll remove this hunk from the patch.

I /think/ that behavior (CAP_SYS_ADMIN not requiring destfd to be open for writes in order to dedupe) was intentional; it seems to date back to the original ioctl in 2013. My guess at the justification is that we're not really writing to dest, so if the admin comes along with an O_RDONLY destfd it's ok?

Let's see if we get any bites from the btrfs developers. :)

--D
Btrfs progs release 4.9.1
Hi,

btrfs-progs version 4.9.1 has been released.

Changes:
  * check:
    * use correct inode number for lost+found files
    * lowmem mode: fix false alert on dropped leaf
  * size reports: negative numbers might appear in size reports during device deletes (previously in EiB units)
  * mkfs: print device being trimmed
  * defrag: v1 ioctl support dropped
  * quota: print message before starting to wait for rescan
  * qgroup show: new option to sync before printing the stats
  * other:
    * corrupt-block enhancements
    * backtrace and co. cleanups
    * doc fixes

Changes since rc1:
  * change name of one test directory, duplicate numbers

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

David Sterba (18):
      btrfs-progs: make negative number pretty printing optional
      btrfs-progs: enable negative numbers for unallocated device space
      btrfs-progs: mkfs: print device name while trimming
      btrfs-progs: defrag: force using v2 defrag ioctl and make default 32M threshold actually work
      btrfs-progs: defrag: remove v1 ioctl support
      btrfs-progs: qgroups show: clean up errno passing
      btrfs-progs: qgroup show: refine error messages
      btrfs-progs: tests: 005-qgroup-show
      btrfs-progs: kerncompat: add separate trace print for BUG_ON
      btrfs-progs: kerncompat: pass exact condition value from ASSERT
      btrfs-progs: kerncompat: disconnect assert and warning messages
      btrfs-progs: kerncompat: simplify warning_trace
      btrfs-progs: qgroup show: do not error if sync fails
      btrfs-progs: tests: add variable quotation to cli-tests
      btrfs-progs: tests: use built binaries for 004-send-parent-multi-subvol
      btrfs-progs: mkfs/convert: separate the convert part from make_btrfs
      btrfs-progs: update CHANGES for v4.9.1
      Btrfs progs v4.9.1

Esteve Fernandez (1):
      btrfs-progs: docs: fix typo in btrfs-subvolume

Goldwyn Rodrigues (2):
      btrfs-progs: check: get the highest inode for lost+found
      btrfs-progs: sanitize - Use correct source for memcpy

Jeff Mahoney (1):
      btrfs-progs: quota: fix printing during wait mode

Lakshmipathi.G (2):
      btrfs-progs: corrupt-block: Include inode nlink field
      btrfs-progs: corrupt-block: Include more inode fields

Nicholas D Steeves (1):
      btrfs-progs: Fix spelling/typos in user-facing strings

Qu Wenruo (2):
      btrfs-progs: check: fix false alert on dropped leaf in lowmem mode
      btrfs-progs: Fix disable backtrace assert error

Tsutomu Itoh (3):
      btrfs-progs: qgroup: add sync option to 'qgroup show'
      btrfs-progs: qgroup: change the value of sort option
      btrfs-progs: tests: add test for --sync option of qgroup show

Zygo Blaxell (1):
      btrfs-progs: utils: negative numbers are more plausible than sizes over 8 EiB
Re: btrfs recovery
On 2017-01-27 06:01, Oliver Freyermuth wrote: >> I'm also running 'memtester 12G' right now, which at least tests 2/3 of the memory. I'll leave that running for a day or so, but of course it will not provide a clear answer... > A small update: while the online memtester is without any errors still, I checked old syslogs from the machine and found something intriguing. > Jan 16 10:03:11 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 00098d39 > Jan 16 10:18:33 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 00099795 > Jan 16 17:35:48 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 000dd64e > This seems to be consistently happening from time to time (I have low memory corruption checking compiled in). The numbers always consistently increase, and after a reboot, start fresh from a small number again. I suppose this is a BIOS bug and it's storing some counter in low memory. I am unsure whether this could have triggered the BTRFS corruption, nor do I know what to do about it (are there kernel quirks for that?). The vendor does not provide any updates, as usual. If someone could confirm whether this might cause corruption for btrfs (and maybe direct me to the correct place to ask for a kernel quirk for this device - do I ask on MM, or somewhere else?), that would be much appreciated. It is a firmware bug; Linux doesn't use anything in that physical address range at all. I don't think it's likely that this specific bug caused the corruption, but given that the firmware doesn't have its allocations listed correctly in the e820 table (if they were listed correctly, you wouldn't be seeing this message), it would not surprise me if the firmware was involved somehow. >>> We can probably talk you through fixing this by hand with a decent hex editor. I've done it before... >> That would be nice! Is it fine via the mailing list? Potentially, the instructions could be helpful for future reference, and "real" IRC is not accessible from my current location.
>> Do you have suggestions for a decent hex editor for this job? Until now, I have been mainly using emacs, classic hexedit (http://rigaux.org/hexedit.html), or okteta (beware, it's graphical!), but of course these were made for a few MiB of files and are not so well suited for a block device. >> The first thing to do would then probably just be to jump to the offset where 0xd89500014da12000 is written (can I get that via inspect-internal, or do I have to search for it?), fix that to read 0x00a800014da12000 (if I understood correctly) and then probably adapt a checksum? > Additionally, I found that "btrfs restore" works on this broken FS. I will take an external backup of the content within the next 24 hours using that, then I am ready to try anything you suggest. FWIW, the fact that btrfs restore works is a good sign: it means that the filesystem is almost certainly repairable (even though the tools might not be able to repair it themselves).
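On the "do I have to search for it?" question: a raw search needs no hex editor at all. The following is a minimal Python sketch, not a btrfs-progs tool; it assumes the value is stored as a little-endian 64-bit integer (btrfs on-disk fields are little-endian), and the names `find_u64` and `iter_chunks` are hypothetical. It only reads, so it is safe to run against the device or an image of it. The offsets it prints are physical byte offsets on that device, not btrfs logical addresses, and there may be more than one hit (for example if metadata is duplicated).

```python
import struct
import sys


def iter_chunks(f, chunk_size=64 * 1024 * 1024):
    """Read a binary file object in fixed-size chunks until EOF."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield chunk


def find_u64(chunks, value):
    """Yield the byte offset of every place `value` occurs as a
    little-endian 64-bit integer in the concatenation of `chunks`.

    The last 7 bytes of each buffer are carried over to the next one,
    so a match that straddles a chunk boundary is still found."""
    needle = struct.pack("<Q", value)
    overlap = len(needle) - 1
    base = 0      # absolute offset of buf[0]
    tail = b""
    for chunk in chunks:
        buf = tail + chunk
        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)
        tail = buf[-overlap:] if len(buf) > overlap else buf
        base += len(buf) - len(tail)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 find_u64.py /path/to/image-or-device 0xd89500014da12000
    path, value = sys.argv[1], int(sys.argv[2], 0)
    with open(path, "rb") as f:  # opened read-only: this never writes
        for off in find_u64(iter_chunks(f), value):
            print(f"{off:#x}")
```

Carrying the 7-byte tail between chunks keeps memory use bounded on a multi-GiB block device while still catching matches at chunk boundaries.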
Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"
On Fri 27-01-17 01:13:18, Theodore Ts'o wrote: > On Thu, Jan 26, 2017 at 08:44:55AM +0100, Michal Hocko wrote: > > > > I'm convinced the current series is OK, only real life will tell us > > > > whether > > > > we missed something or not ;) > > > > > > I would like to extend the changelog of "jbd2: mark the transaction > > > context with the scope GFP_NOFS context". > > > > > > " > > > Please note that setups without journal do not suffer from potential > > > recursion problems and so they do not need the scope protection because > > > neither ->releasepage nor ->evict_inode (which are the only fs entry > > > points from the direct reclaim) can reenter a locked context which is > > > doing the allocation currently. > > > " > > > > Could you comment on this Ted, please? > > I guess so there still is one way this could screw us, and it's this > reason for GFP_NOFS: > > - to prevent from stack overflows during the reclaim because > the allocation is performed from a deep context already > > The writepages call stack can be pretty deep. (Especially if we're > using ext4 in no journal mode over, say, iSCSI.) > > How much stack space can get consumed by a reclaim? ./scripts/stackusage with allyesconfig says:
./mm/page_alloc.c:3745 __alloc_pages_nodemask 264 static
./mm/page_alloc.c:3531 __alloc_pages_slowpath 520 static
./mm/vmscan.c:2946 try_to_free_pages 216 static
./mm/vmscan.c:2753 do_try_to_free_pages 304 static
./mm/vmscan.c:2517 shrink_node 352 static
./mm/vmscan.c:2317 shrink_node_memcg 560 static
./mm/vmscan.c:1692 shrink_inactive_list 688 static
./mm/vmscan.c:908 shrink_page_list 608 static
So the standard LRU reclaim path uses 3512 bytes of stack, whether we have GFP_FS or not. shrink_page_list can recurse into releasepage, but there is no NOFS protection there, so it doesn't make much sense to check this path.
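The 3512 figure is simply the sum of the per-function frame sizes along the call chain. A small Python check of that arithmetic, assuming (as the stackusage listing does) fixed per-function frame sizes and ignoring anything the listing does not show, such as alignment padding or interrupt stacks; `path_stack_usage` is a hypothetical helper name:

```python
# Frame sizes in bytes, as reported by ./scripts/stackusage (allyesconfig)
# for the standard LRU reclaim path, outermost caller first.
LRU_RECLAIM_PATH = [
    ("__alloc_pages_nodemask", 264),
    ("__alloc_pages_slowpath", 520),
    ("try_to_free_pages", 216),
    ("do_try_to_free_pages", 304),
    ("shrink_node", 352),
    ("shrink_node_memcg", 560),
    ("shrink_inactive_list", 688),
    ("shrink_page_list", 608),
]


def path_stack_usage(path):
    """Stack consumed by a call chain: the sum of its static frame sizes."""
    return sum(size for _, size in path)


print(path_stack_usage(LRU_RECLAIM_PATH))  # prints 3512
```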
So we are left with the slab shrinkers path:
./mm/page_alloc.c:3745 __alloc_pages_nodemask 264 static
./mm/page_alloc.c:3531 __alloc_pages_slowpath 520 static
./mm/vmscan.c:2946 try_to_free_pages 216 static
./mm/vmscan.c:2753 do_try_to_free_pages 304 static
./mm/vmscan.c:2517 shrink_node 352 static
./mm/vmscan.c:427 shrink_slab 336 static
./fs/super.c:56 super_cache_scan 104 static << here we have the NOFS protection
./fs/dcache.c:1089 prune_dcache_sb 152 static
./fs/dcache.c:939 shrink_dentry_list 96 static
./fs/dcache.c:509 __dentry_kill 72 static
./fs/dcache.c:323 dentry_unlink_inode 64 static
./fs/inode.c:1527 iput 80 static
./fs/inode.c:532 evict 72 static
This is where the fs-specific callbacks play a role, and I am not sure which paths ext4 can go through in nojournal mode or how much of the stack they can eat. But so far we are at +536 bytes relative to the NOFS context. This is quite a lot, but still much less (2632 vs. 3512) than the regular reclaim, so there is still quite some stack space left to eat... I am wondering whether we really have to treat nojournal mode specially just because of the stack usage. If this ever turns out to be a problem (and with vmapped stacks we have a good chance of getting proper stack traces on a potential overflow), we can add the scope API around the problematic code path, with an explanation of why it is needed. Does that make sense to you? Thanks! -- Michal Hocko SUSE Labs
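The +536 and 2632 figures come from splitting the slab shrinker chain at super_cache_scan, where the NOFS check happens. A Python sketch of that split, under the same fixed-frame-size assumption as the stackusage output; `split_at` is a hypothetical helper name, and the frame sizes are copied from the listing:

```python
# Slab shrinker reclaim path, outermost caller first (bytes per frame).
SLAB_SHRINK_PATH = [
    ("__alloc_pages_nodemask", 264),
    ("__alloc_pages_slowpath", 520),
    ("try_to_free_pages", 216),
    ("do_try_to_free_pages", 304),
    ("shrink_node", 352),
    ("shrink_slab", 336),
    ("super_cache_scan", 104),   # NOFS protection is checked here
    ("prune_dcache_sb", 152),
    ("shrink_dentry_list", 96),
    ("__dentry_kill", 72),
    ("dentry_unlink_inode", 64),
    ("iput", 80),
    ("evict", 72),
]


def split_at(path, boundary):
    """Return (stack used up to and including `boundary`, stack used after it)."""
    names = [name for name, _ in path]
    i = names.index(boundary) + 1
    before = sum(size for _, size in path[:i])
    after = sum(size for _, size in path[i:])
    return before, after


before, after = split_at(SLAB_SHRINK_PATH, "super_cache_scan")
print(before, after, before + after)  # prints 2096 536 2632
```

Compared with the 3512 bytes of the standard LRU path, that leaves 880 bytes before this chain, including evict, reaches the same depth.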
Re: btrfs recovery
> I'm also running 'memtester 12G' right now, which at least tests 2/3 of the > memory. I'll leave that running for a day or so, but of course it will not > provide a clear answer... A small update: while the online memtester is without any errors still, I checked old syslogs from the machine and found something intriguing. Jan 16 10:03:11 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 00098d39 Jan 16 10:18:33 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 00099795 Jan 16 17:35:48 xxx kernel: Corrupted low memory at 88009000 (9000 phys) = 000dd64e This seems to be consistently happening from time to time (I have low memory corruption checking compiled in). The numbers always consistently increase, and after a reboot, start fresh from a small number again. I suppose this is a BIOS bug and it's storing some counter in low memory. I am unsure whether this could have triggered the BTRFS corruption, nor do I know what to do about it (are there kernel quirks for that?). The vendor does not provide any updates, as usual. If someone could confirm whether this might cause corruption for btrfs (and maybe direct me to the correct place to ask for a kernel quirk for this device - do I ask on MM, or somewhere else?), that would be much appreciated. >> We can probably talk you through fixing this by hand with a decent >> hex editor. I've done it before... >> > That would be nice! Is it fine via the mailing list? > Potentially, the instructions could be helpful for future reference, and > "real" IRC is not accessible from my current location. > > Do you have suggestions for a decent hex editor for this job? Until now, I > have been mainly using emacs, > classic hexedit (http://rigaux.org/hexedit.html), or okteta (beware, it's > graphical!), but of course these were made for a few MiB of files and are not > so well suited for a block device.
> > The first thing to do would then probably just be to jump to the offset where > 0xd89500014da12000 is written (can I get that via inspect-internal, or do I > have to search for it?), fix that to read > 0x00a800014da12000 > (if I understood correctly) and then probably adapt a checksum? > Additionally, I found that "btrfs restore" works on this broken FS. I will take an external backup of the content within the next 24 hours using that, then I am ready to try anything you suggest. Cheers and thanks! Oliver
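On the "adapt a checksum" question: here is a minimal Python sketch of recomputing a tree block checksum after patching one value. It rests on several assumptions that should be checked against the btrfs-progs sources or on-disk format documentation before anything is written to a device. It assumes the filesystem uses the default crc32c checksum, that the checksum covers everything in the block after a 32-byte csum field at its start, and that the 4-byte result is stored little-endian in the first bytes of that field. `crc32c`, `rechecksum_tree_block` and `patch_u64` are hypothetical names, not btrfs-progs functions.

```python
import struct

CSUM_SIZE = 32  # assumed: size of the csum field at the start of a tree block


def crc32c(data, crc=0):
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
    Slow but dependency-free, which is fine for a single tree block."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def rechecksum_tree_block(block):
    """Return a copy of `block` whose first 4 bytes hold the crc32c of
    everything after the csum field, stored little-endian (assumed layout).
    The remaining bytes of the csum field are left untouched."""
    out = bytearray(block)
    out[0:4] = struct.pack("<I", crc32c(bytes(out[CSUM_SIZE:])))
    return bytes(out)


def patch_u64(block, offset, old, new):
    """Overwrite the little-endian u64 at `offset` with `new`, but only if
    it currently equals `old`; then recompute the block checksum."""
    if offset < CSUM_SIZE or offset + 8 > len(block):
        raise ValueError("offset %#x is outside the checksummed area" % offset)
    out = bytearray(block)
    if bytes(out[offset:offset + 8]) != struct.pack("<Q", old):
        raise ValueError("unexpected bytes at offset %#x" % offset)
    out[offset:offset + 8] = struct.pack("<Q", new)
    return rechecksum_tree_block(bytes(out))
```

A hypothetical use: read one whole tree block (nodesize bytes, typically 16 KiB) into `blk`, work out the in-block offset `off` of the bad value, then call `patch_u64(blk, off, 0xd89500014da12000, 0x00a800014da12000)`. The replacement value is the tentative reading quoted above, not something this sketch verifies. Writing the result back (to each copy, if metadata is duplicated) is deliberately left out.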