Re: Filesystem Corruption
On Mon, Dec 3, 2018, at 4:31 AM, Stefan Malte Schumacher wrote:
> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But lets start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
>
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read

Excuse me for butting in when there are *many* more qualified people on this list. But assuming the rar crc errors are related to your unexplained buffer I/O errors (and not some weird coincidence of simply bad downloads), I would start, immediately, by testing the memory. RAM corruption can wreak havoc with btrfs (any filesystem, really, but I think BTRFS has special challenges in this regard), and this looks like a memory error to me.
Re: Understanding "btrfs filesystem usage"
On 2018-10-29 02:11 PM, Ulli Horlacher wrote:
> I want to know how much free space is left and have problems in
> interpreting the output of:
>
> btrfs filesystem usage
> btrfs filesystem df
> btrfs filesystem show

In my not so humble opinion, the filesystem usage command has the easiest to understand output. It lays out all the pertinent information. You can clearly see 825GiB is allocated, with 494GiB used; filesystem show is actually reporting the "Allocated" value as "Used". Allocated can be thought of as "Reserved For". As the output of the usage and df commands clearly shows, you have almost 400GiB of space available.

Note that the btrfs commands are clearly and explicitly displaying values in binary units (Mi and Gi prefixes, respectively). If you want the df command to match, use -h instead of -H (see man df).

An observation: the disparity between 494GiB used and 825GiB allocated is pretty high. This is probably the result of using an SSD with an older kernel. If your kernel is not very recent (sorry, I forget where this was fixed, somewhere around 4.14 or 4.15), then consider mounting with the nossd option. You can improve this by running a balance. Something like:

btrfs balance start -dusage=55 /mountpoint

You do *not* want to end up with all your space allocated to Data but not actually used by data. Bad things can happen if you run out of unallocated space for more metadata. (Not catastrophic, but awkward and unexpected downtime that can be a little tricky to sort out.)
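To illustrate what the -dusage filter does, here is a toy Python model (my own sketch with made-up chunk numbers, not btrfs internals): -dusage=55 rewrites only the data chunks that are at most 55% full, repacking their live data so the mostly-empty chunks go back to unallocated space.

```python
def balance_dusage(chunks, threshold):
    """Toy model of 'btrfs balance start -dusage=N' on 1GiB data
    chunks.  chunks is a list of per-chunk usage fractions; chunks
    at or below the threshold are rewritten, their live data is
    repacked into as few new chunks as possible, and the old
    chunks are returned to unallocated space."""
    keep = [u for u in chunks if u > threshold]
    repack = sum(u for u in chunks if u <= threshold)
    full, tail = divmod(repack, 1.0)
    keep.extend([1.0] * int(full))
    if tail > 1e-9:
        keep.append(tail)
    return keep

# Ten 1GiB chunks, half of them mostly empty:
before = [0.9, 0.2, 0.8, 0.1, 0.95, 0.3, 0.85, 0.15, 0.9, 0.25]
after = balance_dusage(before, 0.55)
assert abs(sum(after) - sum(before)) < 1e-9  # same data survives
assert len(after) < len(before)              # fewer chunks allocated
```

Used stays the same; only the allocated total shrinks, which is exactly what you want before unallocated space runs low.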
Re: Have 15GB missing in btrfs filesystem.
On 2018-10-27 04:19 PM, Marc MERLIN wrote:
> Thanks for confirming. Because I always have snapshots for btrfs
> send/receive, defrag will duplicate as you say, but once the older
> snapshots get freed up, the duplicate blocks should go away, correct?
>
> Back to usage, thanks for pointing out that command:
> saruman:/mnt/btrfs_pool1# btrfs fi usage .
> Overall:
>     Device size:           228.67GiB
>     Device allocated:      203.54GiB
>     Device unallocated:     25.13GiB
>     Device missing:            0.00B
>     Used:                  192.01GiB
>     Free (estimated):       32.44GiB  (min: 19.88GiB)
>     Data ratio:                 1.00
>     Metadata ratio:             2.00
>     Global reserve:        512.00MiB  (used: 0.00B)
>
> Data,single: Size:192.48GiB, Used:185.16GiB
>    /dev/mapper/pool1  192.48GiB
>
> Metadata,DUP: Size:5.50GiB, Used:3.42GiB
>    /dev/mapper/pool1   11.00GiB
>
> System,DUP: Size:32.00MiB, Used:48.00KiB
>    /dev/mapper/pool1   64.00MiB
>
> Unallocated:
>    /dev/mapper/pool1   25.13GiB
>
> I'm still seeing that I'm using 192GB, but 203GB allocated.
> Do I have 25GB usable:
> Device unallocated: 25.13GiB
>
> Or 35GB usable?
> Device size: 228.67GiB
> - Used: 192.01GiB
> = 36GB ?

The answer is somewhere between the two. (BTRFS's estimate of 32.44GiB free is probably as close as you'll get to predicting.) So you have 7.32GiB allocated but still free for data, and 25GiB of completely unallocated disk space. However, as you add more data, or create more snapshots and cause metadata duplication, some of that 25GiB will be allocated for metadata. Remember that metadata is duplicated, so the 3.42GiB of metadata you are using now is actually using 6.84GiB of disk space, out of the allocated 11GiB.

You want to be careful that unallocated space doesn't run out. If the system runs out of usable space for metadata, it can be tricky to get yourself out of the corner. That is why a large discrepancy between Data Size and Used would be a concern. If those 25GiB of space were allocated to data, you would get out-of-space errors even if the 25GiB was still unused.
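For what it's worth, the 32.44GiB estimate can be reproduced from the other lines of the output. A quick Python sketch (my own approximation of how the estimate is derived, not btrfs code):

```python
def free_estimate(unallocated, data_size, data_used):
    """Rough model of the 'Free (estimated)' line in `btrfs fi usage`:
    completely unallocated space plus the slack inside already
    allocated data chunks.  (btrfs also accounts for the profile of
    future allocations, so real output can differ slightly.)"""
    return unallocated + (data_size - data_used)

# Numbers from the quoted output above, in GiB:
est = free_estimate(unallocated=25.13, data_size=192.48, data_used=185.16)
# Lands between the 25GiB and 36GiB figures, matching the
# reported "Free (estimated): 32.44GiB" to within rounding:
assert 32.4 < est < 32.5
```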
On that note, you seem to have a rather high metadata to data ratio (at least, compared to my limited experience). Are you using noatime on your filesystems? Without it, snapshots will end up causing duplicated metadata when atime updates.
Re: Have 15GB missing in btrfs filesystem.
On 2018-10-27 01:42 PM, Marc MERLIN wrote:
> I've been using btrfs for a long time now but I've never had a
> filesystem where I had 15GB apparently unusable (7%) after a balance.

The space isn't unusable. It's just allocated. (It's "used" only in the sense that it's reserved for data chunks.) Start writing data to the drive, and the data will fill that space before more gets allocated. (Unless you are using an older kernel and the filesystem gets mounted with the ssd option, in which case you'll want to add the nossd option to prevent that behaviour.) You can use btrfs fi usage to display that more clearly.

> I can try a defrag next, but since I have COW for snapshots, it's not
> going to help much, correct?

The defrag will end up using more space, as the fragmented parts of files will get duplicated. That being said, if you have the luxury to defrag *before* taking new snapshots, that would be the time to do it.
Re: Two partitionless BTRFS drives no longer seen as containing BTRFS filesystem
On 2018-10-06 07:23 PM, evan d wrote:
> I have two hard drives that were never partitioned, but set up as two
> independent BTRFS filesystems. Both drives were used in the same
> machine running Arch Linux and the drives contain(ed) largely static
> data.
>
> I decommissioned the machine they were originally used in and on
> installing in a newer Arch build found that BTRFS reported no
> filesystem on either of the drives.
>
> uname -a:
> Linux z87i-pro 4.18.9-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 19 21:19:17
> UTC 2018 x86_64 GNU/Linux
>
> btrfs --version: btrfs-progs v4.17.1
> btrfs fi show: returns no data

Did you try a btrfs device scan? (Normally, that would be done on boot, but depending on how your Arch was configured, or whether the devices were available early enough in the boot process, it may not have happened.)
Re: btrfs problems
On 2018-09-20 05:35 PM, Adrian Bastholm wrote:
> Thanks a lot for the detailed explanation.
> About "stable hardware/no lying hardware". I'm not running any raid
> hardware, was planning on just software raid. Three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would you recommend running the sausage method
> instead, with "-d single" for safety? I'm guessing that if one of the
> drives dies the data is completely lost.
> Another variant I was considering is running a raid1 mirror on two of
> the drives and maybe a subvolume on the third, for less important
> stuff.

In case you were not aware, it's perfectly acceptable with BTRFS to use raid1 over 3 devices. Even more amazing, regardless of how many devices you start with (2, 3, 4, whatever), you can add a single drive to the array to increase capacity (at 50%, of course; ie, adding a 4TB drive will give you 2TB of usable space, assuming the other drives add up to at least 4TB to match it).
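If you want to play with the numbers, here is a rough Python sketch of raid1 capacity over mixed-size drives (my approximation; btrfs allocates per-chunk, but the totals come out the same):

```python
def raid1_usable(drives):
    """Approximate usable capacity of btrfs raid1 over mixed-size
    drives: every chunk needs copies on two different devices, so
    usable space is half the total, unless one drive is so large
    the others cannot mirror it (then the excess is stranded)."""
    total = sum(drives)
    return min(total / 2, total - max(drives))

assert raid1_usable([4, 4]) == 4      # classic two-disk mirror
assert raid1_usable([4, 2, 2]) == 4   # three devices, still 50%
# Adding a 4TB drive to a 2TB + 2TB mirror gains 2TB usable:
assert raid1_usable([4, 2, 2]) - raid1_usable([2, 2]) == 2
assert raid1_usable([6, 1, 1]) == 2   # oversized drive: space stranded
```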
Re: very poor performance / a lot of writes to disk with space_cache (but not with space_cache=v2)
On 2018-09-19 04:43 AM, Tomasz Chmielewski wrote:
> I have a mysql slave which writes to a RAID-1 btrfs filesystem (with
> 4.17.14 kernel) on 3 x ~1.9 TB SSD disks; filesystem is around 40% full.
>
> The slave receives around 0.5-1 MB/s of data from the master over the
> network, which is then saved to MySQL's relay log and executed. In ideal
> conditions (i.e. no filesystem overhead) we should expect some 1-3 MB/s
> of data written to disk.
>
> MySQL directory and files in it are chattr +C (since the directory was
> created, so all files are really +C); there are no snapshots.

Not related to the issue you are reporting, but I thought it's worth mentioning (since not many do) that using chattr +C on a BTRFS raid1 is a dangerous thing. Without COW, the 2 copies are never synchronized, even if a scrub is executed. So any kind of unclean shutdown that interrupts writes (not to mention the extreme of a temporarily disconnected drive) will result in files that are inconsistent. (Ie, depending on which disk happens to serve the read at the time, the data will be different on each read.)
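To make that failure mode concrete, here's a toy Python model (purely illustrative, not btrfs code) of why nodatacow mirrors can diverge silently: without checksums, a read just returns whichever copy the chosen device holds.

```python
import random

# A raid1 "file" as two mirror copies:
mirror = [bytearray(b"old data"), bytearray(b"old data")]

def write(data, failed_devices=()):
    """Overwrite in place (nodatacow) on every device still present."""
    for dev, copy in enumerate(mirror):
        if dev not in failed_devices:
            copy[:] = data

def read():
    """No csum to arbitrate, so either copy can satisfy the read."""
    return bytes(random.choice(mirror))

write(b"new data", failed_devices={1})  # device 1 dropped mid-write
assert mirror[0] != mirror[1]           # mirrors now silently disagree
assert read() in (b"old data", b"new data")  # answer depends on device
```

With datacow and checksums, the stale copy would fail verification and be repaired from the good one; here nothing ever notices.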
Re: Re-mounting removable btrfs on different device
On 2018-09-06 11:32 PM, Duncan wrote:
> Without the mentioned patches, the only way (other than reboot) is to
> remove and reinsert the btrfs kernel module (assuming it's a module, not
> built-in), thus forcing it to forget state.
>
> Of course if other critical mounted filesystems (such as root) are btrfs,
> or if btrfs is a kernel-built-in not a module and thus can't be removed,
> the above doesn't work and a reboot is necessary. Thus the need for
> those patches you mentioned.

Good to know, thanks.
Re-mounting removable btrfs on different device
I'm trying to use a BTRFS filesystem on a removable drive. When the drive was first added to the system, it was /dev/sdb. Files were added and the device unmounted without error. But when I re-attach the drive, it becomes /dev/sdg (the kernel is fussy about re-using /dev/sdb).

btrfs fi show output:

Label: 'Archive 01'  uuid: 221222e7-70e7-4d67-9aca-42eb134e2041
        Total devices 1 FS bytes used 515.40GiB
        devid    1 size 931.51GiB used 522.02GiB path /dev/sdg1

This causes BTRFS to fail mounting the device with the following errors:

sd 3:0:0:0: [sdg] Attached SCSI disk
blk_partition_remap: fail for partition 1
BTRFS error (device sdb1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
blk_partition_remap: fail for partition 1
BTRFS error (device sdb1): bdev /dev/sdg1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
blk_partition_remap: fail for partition 1
BTRFS error (device sdb1): bdev /dev/sdg1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
blk_partition_remap: fail for partition 1
BTRFS error (device sdb1): bdev /dev/sdg1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
ata4: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
ata4: irq_stat 0x00400040, connection status changed
ata4: SError: { HostInt PHYRdyChg 10B8B DevExch }

I've seen some patches on this list to add a btrfs device forget option, which I presume would help with a situation like this. Is there a way to do that manually?
Re: btrfs fi du unreliable?
On 2018-08-29 08:00 AM, Jorge Bastos wrote:
> Look for example at snapshots from July 21st and 22nd, total used
> space went from 199 to 277GiB, this is mostly from new added files, as
> I confirmed from browsing those snapshots, there were no changes on
> the 23rd, and a lot of files were deleted before the 24th, so shouldn't
> there be about 80GiB of exclusive content for the 22nd, or am I
> misunderstanding how this is reported? Those were new files only,
> never existed on previous snapshots. If I delete both snapshots from
> the 22nd and the 23rd I expect to get about 80GiB freed space.

Exclusive means... exclusive... to that one snapshot/subvolume. If the data also exists in the 23rd's snapshot, it's not exclusive. If you wanted to report how much data is exclusive to a group of snapshots (say, July 22nd *and* 23rd), you would have to make them members of a parent qgroup; then you could see the exclusive value of the whole group.
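A small Python model of that accounting (the extent ids are made up; this just mirrors the exclusive/shared semantics as I understand them):

```python
def exclusive(snapshots, members):
    """Extents referenced only by the given member snapshots.

    snapshots: dict mapping snapshot name -> set of extent ids.
    Mirrors how a parent qgroup reports 'exclusive' for a group:
    data shared with any snapshot outside the group doesn't count."""
    inside = set().union(*(snapshots[m] for m in members))
    outside = set().union(*(v for k, v in snapshots.items()
                            if k not in members))
    return inside - outside

snaps = {
    "jul21": {1, 2},
    "jul22": {1, 2, 3, 4},  # extents 3, 4 are the new files
    "jul23": {1, 2, 3, 4},  # unchanged day, still references them
    "jul24": {1, 2},        # new files deleted before this snapshot
}
# Neither daily snapshot owns the new data exclusively on its own...
assert exclusive(snaps, ["jul22"]) == set()
# ...but grouped together, the 22nd and 23rd do, and deleting both
# would free those extents:
assert exclusive(snaps, ["jul22", "jul23"]) == {3, 4}
```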
BTRFS and databases
On 2018-08-02 03:07 AM, Qu Wenruo wrote:
> For data, since we have cow (along with csum), it should be no problem
> to recover.
>
> And since datacow is used, transaction on each device should be atomic,
> thus we should be able to handle one-time device out-of-sync case.
> (For multiple out-of-sync events, we don't have any good way though).
>
> Or did I miss something from previous discussion?

As far as I know, that is indeed correct and works very well. The question was specifically about using nodatacow for databases, and that's the question I was responding to. In its current state, I do not believe btrfs nodatacow is in any way appropriate for database/VM hosting when combined with multi-device.
Re: BTRFS and databases
On 2018-07-31 11:45 PM, MegaBrutal wrote:
> I know that with nodatacow, I take away most of the benefits of BTRFS
> (those are actually hurting database performance – the exact CoW
> nature that is elsewhere a blessing, with databases it's a drawback).
> But are there any advantages of still sticking to BTRFS for a database
> albeit CoW is disabled, or should I just return to the old and
> reliable ext4 for those applications?

Be very careful about nodatacow and btrfs 'raid'. BTRFS has no data syncing mechanism for raid, so if your mirrors end up different somehow, your array is going to be inconsistent.
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
> Acceptable, but not really apply to software based RAID1.

Which completely disregards the minor detail that all the software RAIDs I know of can handle exactly this kind of situation without losing or corrupting a single byte of data (errors on the remaining hard drive notwithstanding). Exactly what methods they employ to do so I'm not an expert on, but it *does* work, contrary to your repeated assertions otherwise.

In any case, thank you for the patch you wrote. I will, however, propose a different solution. Given the reliance of BTRFS on csums, and the lack of any resynchronization (no matter how the drives got out of sync; it doesn't matter), I think nodatacow should just be ignored in the case of RAID, just like the data blocks get copied if there is a snapshot. In the current implementation of RAID on btrfs, RAID and nodatacow are effectively mutually exclusive.

Consider the kinds of use cases nodatacow is usually recommended for: VM images and databases. Even though those files should have their own mechanisms for dealing with incomplete writes and data verification, BTRFS RAID creates a unique situation where parts of the file can be inconsistent, with different data being read depending on which device is doing the reading.

Regardless of which method, short term and long term, the developers choose to address this, this next part I have to stress I consider very important: the status page really needs to be updated to reflect this gotcha. It *will* bite people in ways they do not expect, and disastrously.
[PATCH RFC] btrfs: Do extra device generation check at mount time
On 2018-06-28 10:36 AM, Adam Borowski wrote:
> Uhm, that'd be a nasty regression for the regular (no-nodatacow) case.
> The vast majority of data is fine, and extents that have been written to
> while a device is missing will be either placed elsewhere (if the filesystem
> knew it was degraded) or read one of the copies to notice a wrong checksum
> and automatically recover (if the device was still falsely believed to be
> good at write time).
>
> We currently don't have selective scrub yet so resyncing such single-copy

That might not be the case. Though I don't really know the numbers myself and repeating this is hearsay: crc32 is not infallible. One in so many billion errors will go undetected by it. In the case of a dropped device with write failures, when you *know* the data supposedly written to the disk is bad, re-syncing from a believed-good copy (so long as it passes checksum verification, of course) is the only way to be certain that the data is good. Otherwise, you can be left with a Schroedinger's bit somewhere. (It's not 0 or 1, but both, depending on which device the filesystem is reading from at the time.)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
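For the curious, the 32-bit limit is easy to demonstrate with a birthday search (plain Python using zlib's crc32; the random blocks here just stand in for corrupted data):

```python
import random
import zlib

# CRC32 is a 32-bit check, so distinct blocks can share a checksum.
# A birthday search over random 8-byte blocks finds a colliding pair
# after roughly 2**16 tries; corruption that turned one such block
# into the other would pass the checksum unnoticed.
random.seed(42)
seen = {}
pair = None
for _ in range(1_000_000):
    block = random.getrandbits(64).to_bytes(8, "little")
    c = zlib.crc32(block)
    if c in seen and seen[c] != block:
        pair = (seen[c], block)
        break
    seen[c] = block

assert pair is not None
a, b = pair
assert a != b and zlib.crc32(a) == zlib.crc32(b)
```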
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018-06-28 10:17 AM, Chris Murphy wrote:
> 2. The new data goes in a single chunk; even if the user does a manual
> balance (resync) their data isn't replicated. They must know to do a
> -dconvert balance to replicate the new data. Again this is a net worse
> behavior than mdadm out of the box, putting user data at risk.

I'm not sure this is the case. Even though writes failed to the disconnected device, btrfs seemed to keep on going as though the device were still present. When the array was re-mounted with both devices (never mounted as degraded) and a scrub was run, scrub took a *long* time fixing errors, at a whopping 3MB/s, and reported having fixed millions of them.
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On Wed, Jun 27, 2018, at 10:55 PM, Qu Wenruo wrote:
> Please get yourself clear of what other raid1 is doing.

A drive failure, where the drive is still there when the computer reboots, is a situation that *any* raid1 (or for that matter raid5, raid6, anything but raid0) will recover from perfectly without raising a sweat. Some will rebuild the array automatically; others will automatically kick out the misbehaving drive. *None* of them will take back the drive with old data and start commingling that data with the good copy.

This behaviour from BTRFS is completely abnormal, and defeats even the most basic expectations of RAID. I'm not the one who has to clear his expectations here.
Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018-06-27 09:58 PM, Qu Wenruo wrote:
>
> On 2018年06月28日 09:42, Remi Gauvin wrote:
>> There seems to be a major design flaw with BTRFS that needs to be better
>> documented, to avoid massive data loss.
>>
>> Tested with Raid 1 on Ubuntu Kernel 4.15
>>
>> The use case being tested was a Virtualbox VDI file created with
>> NODATACOW attribute, (as is often suggested, due to the painful
>> performance penalty of COW on these files.)
>
> NODATACOW implies NODATASUM.

Yes, yes, none of which changes the simple fact that if you use this option, which is often touted as outright necessary for some types of files, BTRFS raid is worse than useless: not only will it not protect your data at all from bitrot (as expected), it will actively go out of its way to corrupt it! This is not expected behaviour from 'RAID', and I despair that this seems to be something I have to explain!
Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
There seems to be a major design flaw with BTRFS that needs to be better documented, to avoid massive data loss.

Tested with raid1 on Ubuntu kernel 4.15.

The use case being tested was a Virtualbox VDI file created with the NODATACOW attribute (as is often suggested, due to the painful performance penalty of COW on these files). However, if a device is temporarily dropped (in this case, tested by disconnecting drives) and re-connects automatically on the next boot, BTRFS does not in any way synchronize the VDI file, or have any means to know that one of the copies is out of date and bad. The result of trying to use said VDI file is interestingly insane. Scrub did not do anything to rectify the situation.