[PATCH 1/3] btrfs-progs: test: umount if confirmation failed
When a check in check_inode() fails, the test should unmount the test
target file system. This commit adds a cleanup umount line to each
failure path.

Signed-off-by: Naohiro Aota
---
 tests/fsck-tests/012-leaf-corruption/test.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh
index 6e23145..bfdd0ea 100755
--- a/tests/fsck-tests/012-leaf-corruption/test.sh
+++ b/tests/fsck-tests/012-leaf-corruption/test.sh
@@ -57,6 +57,7 @@ check_inode()
 	# Check whether the inode exists
 	exists=$($SUDO_HELPER find $path -inum $ino)
 	if [ -z "$exists" ]; then
+		$SUDO_HELPER umount $TEST_MNT
 		_fail "inode $ino not recovered correctly"
 	fi
@@ -64,17 +65,20 @@ check_inode()
 	found_mode=$(printf "%o" 0x$($SUDO_HELPER stat $exists -c %f))
 	if [ $found_mode -ne $mode ]; then
 		echo "$found_mode"
+		$SUDO_HELPER umount $TEST_MNT
 		_fail "inode $ino modes not recovered"
 	fi
 
 	# Check inode size
 	found_size=$($SUDO_HELPER stat $exists -c %s)
 	if [ $mode -ne 41700 -a $found_size -ne $size ]; then
+		$SUDO_HELPER umount $TEST_MNT
 		_fail "inode $ino size not recovered correctly"
 	fi
 
 	# Check inode name
 	if [ "$(basename $exists)" != "$name" ]; then
+		$SUDO_HELPER umount $TEST_MNT
 		_fail "inode $ino name not recovered correctly"
 	else
 		return 0
-- 
2.6.3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: properly reset nlink of multi-linked file
If a file is linked from more than one directory and only one of the
links is corrupted, btrfs check does not reset the nlink properly.
Actually it can go into an infinite loop linking the broken file into
lost+found.

This patch fixes two parts of the code. The first fix delays the freeing
of valid (no error, found inode ref, directory index, and directory
item) backrefs. Freeing valid backrefs earlier prevents reset_nlink()
from adding back all the valid links. The second fix is obvious: passing
`ref_type' to btrfs_add_link() is just wrong. It should be `filetype'
instead. The current code can break all valid file links.

Signed-off-by: Naohiro Aota
---
 cmds-check.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 6a0b50a..11ff3fe 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -810,7 +810,8 @@ static void maybe_free_inode_rec(struct cache_tree *inode_cache,
 		if (backref->found_dir_item && backref->found_dir_index) {
 			if (backref->filetype != filetype)
 				backref->errors |= REF_ERR_FILETYPE_UNMATCH;
-			if (!backref->errors && backref->found_inode_ref) {
+			if (!backref->errors && backref->found_inode_ref &&
+			    rec->nlink == rec->found_link) {
 				list_del(&backref->list);
 				free(backref);
 			}
@@ -2392,7 +2393,7 @@ static int reset_nlink(struct btrfs_trans_handle *trans,
 	list_for_each_entry(backref, &rec->backrefs, list) {
 		ret = btrfs_add_link(trans, root, rec->ino, backref->dir,
 				     backref->name, backref->namelen,
-				     backref->ref_type, &backref->index, 1);
+				     backref->filetype, &backref->index, 1);
 		if (ret < 0)
 			goto out;
 	}
-- 
2.6.3
Linux 4.3 call traces for defective disk
I have a defective disk which produced kernel backtraces like the ones
below. Are you interested in them, what else do you need to know, and do
you prefer things inline or as attachments?

unmodified Linux 4.3, tainted with nvidia driver
disk: WDC WD2002FYPS-02W3B0

196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always  - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age Always  - 3
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age Offline - 2
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age Always  - 0
200 Multi_Zone_Error_Rate   0x0008 200 200 000 Old_age Offline - 1

I mounted the disk normally (no RAID) and copied files from it. I know I
should have mounted it read-only ... Meanwhile the disk data is really
corrupt, even after having it cool down overnight. btrfs check -sX fails
for X in 0..5. So since mounting is no longer possible, I cannot produce
new call traces. smartctl still says PASSED. The data loss is no problem.

Dec  4 08:48:08 s5 kernel: [  114.814022] ata5.00: irq_stat 0x4008
Dec  4 08:48:08 s5 kernel: [  114.814024] ata5.00: failed command: READ FPDMA QUEUED
Dec  4 08:48:08 s5 kernel: [  114.814028] ata5.00: cmd 60/08:60:07:8e:03/00:00:00:00:00/40 tag 12 ncq 4096 in
Dec  4 08:48:08 s5 kernel: [  114.814028]          res 41/40:00:0e:8e:03/00:00:00:00:00/40 Emask 0x409 (media error)
Dec  4 08:48:08 s5 kernel: [  114.814029] ata5.00: status: { DRDY ERR }
Dec  4 08:48:08 s5 kernel: [  114.814030] ata5.00: error: { UNC }
Dec  4 08:48:08 s5 kernel: [  114.822313] ata5.00: configured for UDMA/133
Dec  4 08:48:08 s5 kernel: [  114.822322] sd 4:0:0:0: [sde] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec  4 08:48:08 s5 kernel: [  114.822324] sd 4:0:0:0: [sde] tag#12 Sense Key : 0x3 [current] [descriptor]
Dec  4 08:48:08 s5 kernel: [  114.822326] sd 4:0:0:0: [sde] tag#12 ASC=0x11 ASCQ=0x4
Dec  4 08:48:08 s5 kernel: [  114.822328] sd 4:0:0:0: [sde] tag#12 CDB: opcode=0x28 28 00 00 03 8e 07 00 00 08 00
Dec  4 08:48:08 s5 kernel: [  114.822329] blk_update_request: I/O error, dev sde, sector 232974
Dec  4 08:48:08 s5 kernel: [  114.822340] ata5: EH complete
Dec  4 08:48:08 s5 kernel: [  114.822360] BTRFS: failed to read tree root on sde1

And this is one of the six backtraces I got (BTW all six are different):

Dec  3 11:39:45 s5 kernel: [ 8393.928639] ata5: link is slow to respond, please be patient (ready=0)
Dec  3 11:39:46 s5 kernel: [ 8395.160246] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  3 11:39:46 s5 kernel: [ 8395.164216] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec  3 11:39:46 s5 kernel: [ 8395.164219] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec  3 11:39:46 s5 kernel: [ 8395.164220] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec  3 11:39:46 s5 kernel: [ 8395.185378] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec  3 11:39:46 s5 kernel: [ 8395.185381] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec  3 11:39:46 s5 kernel: [ 8395.185383] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec  3 11:39:46 s5 kernel: [ 8395.190195] ata5.00: configured for UDMA/133
Dec  3 11:39:46 s5 kernel: [ 8395.204218] ata5: EH complete
Dec  3 11:39:57 s5 kernel: [ 8406.044742] ata5.00: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Dec  3 11:39:57 s5 kernel: [ 8406.044746] ata5.00: irq_stat 0x00400040, connection status changed
Dec  3 11:39:57 s5 kernel: [ 8406.044747] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec  3 11:39:57 s5 kernel: [ 8406.044749] ata5.00: failed command: FLUSH CACHE EXT
Dec  3 11:39:57 s5 kernel: [ 8406.044752] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 2
Dec  3 11:39:57 s5 kernel: [ 8406.044752]          res 40/00:0c:bf:8f:03/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Dec  3 11:39:57 s5 kernel: [ 8406.044753] ata5.00: status: { DRDY }
Dec  3 11:39:57 s5 kernel: [ 8406.044756] ata5: hard resetting link
Dec  3 11:40:03 s5 kernel: [ 8411.806856] ata5: link is slow to respond, please be patient (ready=0)
Dec  3 11:40:04 s5 kernel: [ 8413.038465] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  3 11:40:04 s5 kernel: [ 8413.043051] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec  3 11:40:04 s5 kernel: [ 8413.043054] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec  3 11:40:04 s5 kernel: [ 8413.043056] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec  3 11:40:04 s5 kernel: [ 8413.064667] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Dec  3 11:40:04 s5 kernel: [ 8413.064670] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec  3 11:40:04 s5
[PATCH 0/3] btrfs-progs: fix file restore to lost+found bug
This series addresses an issue where btrfsck restores an infinite number
of copies of the same file into the `lost+found' directory. The issue
occurs with a file which is linked from two different directories, A and
B. If the links from dir A are corrupted while the links from dir B are
still valid, btrfsck won't stop creating files in lost+found, like this:

Moving file 'file.del.51' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Trying to rebuild inode:1877
Moving file 'del' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1877
Can't get file name for inode 1876, using '1876' as fallback
Moving file '1876' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Can't get file name for inode 1876, using '1876' as fallback
Moving file '1876.1876' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
(snip)
Moving file '1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876' to 'lost+found' dir since it has no valid backref
Fixed the nlink of inode 1876
Can't get file name for inode 1876, using '1876' as fallback
Can't get file name for inode 1876, using '1876' as fallback
Can't get file name for inode 1876, using '1876' as fallback

The problem is the early release of inode backrefs. The release prevents
`reset_nlink()' from adding back valid backrefs to an inode. As a
result, the following sequence occurs:

0. btrfsck scans an FS tree
1. It finds valid links and invalid links (some links are lost)
2. All valid links are released
3. btrfsck detects found_links != nlink
4. reset_nlink() resets nlink to 0
5. No valid links are restored (thus still nlink = 0)
6. The file is restored to lost+found since nlink == 0 (now, nlink = 1)
7. btrfsck rescans the FS tree
8. It finds `found_links' = #valid_links + 1 (in lost+found) and nlink = 1
9. Again all valid links are lost, and the file is restored to lost+found

The first patch adds cleanup code to the test: it unmounts the test
directory on the failure paths. The second patch fixes the above
problem. And the last patch extends the test to check the case of a
corrupted multi-linked file.
Re: Bug/regression: Read-only mount not read-only
On 2015-12-02 18:40, Qu Wenruo wrote:
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>> On a side note, do either XFS or ext4 support removing the norecovery
>>> option from the mount flags through mount -o remount? Even if they
>>> don't, that might be a nice feature to have in BTRFS if we can safely
>>> support it.
>>
>> It's not remountable today on xfs:
>>
>> /* ro -> rw */
>> if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
>> 	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>> 		xfs_warn(mp, "ro->rw transition prohibited on norecovery mount");
>> 		return -EINVAL;
>> 	}
>>
>> not sure about ext4.
>>
>> -Eric
>
> Not remountable is very good to implement it. Makes things super easy
> to do. Or we will need to add log replay for remount time.
>
> I'd like to implement it first for the non-remountable case as a try.
>
> And for the option name, I prefer something like "notreereplay", but I
> don't consider it the best one yet.

I entirely understand wanting a simple implementation first, my only
point is that it would be a potentially useful feature to have if we
could sanely implement it.
3.16.0 Debian kernel hang
One of my test laptops started hanging on mounting the root filesystem.
I think that it had experienced an unexpected power outage prior to that
which may have caused corruption. When I tried to mount the root
filesystem the mount process would stick in D state, there would be no
disk IO, and the computer would get hot - presumably due to kernel CPU
use even though "top" didn't seem to indicate that.

When I mounted the filesystem with a 4.2.0 kernel it said "The free
space cache file (1103101952) is invalid, skip it" and then things
worked. Now that the machine is running 4.2.0 everything is fine.

I know that there are no plans to backport things to 3.16 and I don't
think the Debian people are going to be very interested in this. So this
message is an FYI for users: maybe consider not using the Debian/Jessie
kernel for BTRFS systems.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
Subvolume UUID, data corruption?
Hello

As we know, two file systems with the same UUID (as reported by e.g.
"blkid") are problematic; especially if both are mounted at the same
time, it leads to data corruption. So, copying a BTRFS partition with
e.g. dd to another one and using it immediately is bad. To prevent this,
"btrfstune -u /dev/sdaX" changes the UUID of the given partition.

However, BTRFS subvolumes have their own UUIDs, which can be viewed e.g.
with "btrfs sub list -u /mountpoint". These UUIDs are not changed by the
command above, and apparently there is no other way to do this.

My question is: Is this a problem similar to the main UUID? Can mounting
two BTRFS partitions with equal subvolume UUIDs (but different main
UUIDs) cause data corruption?

(...well, and maybe someone could explain to me what these subvol UUIDs
are for in the first place. Subvolumes already have a unique number, and
from a user's point of view, there isn't anything where the subvol UUIDs
can be used at all (?))

Thank you

PS: Apologies for sending a second mail; somehow my first try didn't
contain any text.
Re: btrfs crashing the kernel with Seagate 8TB SMR drives.
As Chris mentioned, check out the bug report here:
https://bugzilla.kernel.org/show_bug.cgi?id=93581

I have an 8TB SMR drive and the kernel was reporting drive errors.
Switching to kernel 3.16 (standard Debian Jessie kernel) fixed it for me
(for the moment). From what I read in that kernel bug report, the patch
has been submitted for kernel 4.4.

On 03.12.2015 19:07, Codebird wrote:
> I've got a nice bug for you - because I can offer you what everyone
> likes to see, a precise error message.
>
> I've got a btrfs filesystem spread over six devices, RAID1 mode. Four
> of these are Seagate 8TB archive drives - those SMR ones that a few
> others have reported failing when used with btrfs. I've had that issue
> too, and I just can't explain why, other than to say that it only
> occurs when using them on my mainboard SATA ports, not via USB dock.
> But that's not what I'm reporting - that's just the source of the
> problem that causes the crash I am reporting.
>
> The crash occurs when scrubbing, after some time and some terabytes -
> or possibly just when reading a certain place, I'm not sure - and it
> gives this helpful error left on the screen along with a system so
> unresponsive numlock won't flash:
>
> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
> BTRFS: assertion failed: f(fs_info->sb->s_flags & MS
> ---[ cut here ]
> kernel BUG at ../fs/btrfs/ctree.h:4057!
>
> Not sure if some of those 5 might be 6, as I was in a hurry to get it
> back up both times and just got a blurry photo. But it looks to me
> like there might be a chunk of code that doesn't handle a hardware
> fault - rather than cleanly return an error it's causing the kernel to
> hang entirely. I've managed to get this to happen twice now, so it's
> certainly something worth looking into. This is on SUSE tumbleweed,
> with kernel 4.3.0-2-default.
Re: compression disk space saving - what are your results?
On 2015-12-03 01:29, Duncan wrote:
> Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as excerpted:
>> On 2015-12-02 09:03, Imran Geriskovan wrote:
>>> What are your disk space savings when using btrfs with compression?
>>> [Some] posters have reported that for mostly text, compress didn't
>>> give them expected compression results and they needed to use
>>> compress-force. "compress-force" option compresses regardless of the
>>> "compressibility" of the file. "compress" option makes some inference
>>> about the "compressibility" and decides to compress or not. I wonder
>>> how that inference is done? Can anyone provide some pseudo code for it?
>>
>> I'm not certain how BTRFS does it, but my guess would be trying to
>> compress the block, then storing the uncompressed version if the
>> compressed one is bigger.
>
> No pseudocode as I'm not a dev and wouldn't want to give the wrong
> impression, but as I believe I replied recently in another thread,
> based on comments the devs have made...
>
> With compress, btrfs does a(n intended to be fast) trial compression of
> the first 128 KiB block or two and uses the result of that to decide
> whether to compress the entire file. Compress-force simply bypasses
> that first decision point, processing the file as if the test always
> succeeded and compression was chosen.
>
> If the decision to compress is made, the file is (evidently, again, not
> a dev, but filefrag results support) compressed a 128 KiB block at a
> time with the resulting size compared against the uncompressed version,
> with the smaller version stored.
>
> (Filefrag doesn't understand btrfs compression and reports individual
> extents for each 128 KiB compression block, if compressed. However, for
> many files processed with compress-force, filefrag doesn't report the
> expected size/128-KiB extents, but rather something lower. If filefrag
> -v is used, details of each "extent" are listed, and some show up as
> multiples of 128 KiB, indicating runs of uncompressable blocks that,
> unlike actually compressed blocks, filefrag can and does report
> correctly as single extents. The conclusion is thus as above, that
> btrfs is testing the compression result of each block, and not
> compressing if the "compression" ends up being negative, that is, if
> the "compressed" size is larger than the uncompressed size.)
>
>> On a side note, I really wish BTRFS would just add LZ4 support. It's a
>> lot more deterministic WRT decompression time than LZO, gets a similar
>> compression ratio, and runs faster on most processors for both
>> compression and decompression.
>
> There were patches (at least RFC level, IIRC) floating around years ago
> to add lz4... I wonder what happened to them? My impression was that a
> large deployment somewhere may actually be running them as well, making
> them well tested (and obviously well beyond preliminary RFC level) by
> now, altho that impression could well be wrong.

Hmm, I'll have to see if I can find those and rebase them. IIRC, the
argument against adding it was 'but we already have a fast compression
algorithm!', which in turn says to me they didn't try to sell it on the
most significant parts, namely that it's faster at decompression than
LZO (even when you use the lz4hc variant, which takes longer to compress
to give a (usually) better compression ratio, but decompresses just as
fast as regular lz4), and the timings are a lot more deterministic
(which is really important if you're doing real-time stuff).
Re: 3.16.0 Debian kernel hang
On 2015-12-04 05:00, Russell Coker wrote:
> One of my test laptops started hanging on mounting the root
> filesystem. I think that it had experienced an unexpected power outage
> prior to that which may have caused corruption. When I tried to mount
> the root filesystem the mount process would stick in D state, there
> would be no disk IO, and the computer would get hot - presumably due to
> kernel CPU use even though "top" didn't seem to indicate that.
>
> When I mounted the filesystem with a 4.2.0 kernel it said "The free
> space cache file (1103101952) is invalid, skip it" and then things
> worked. Now that the machine is running 4.2.0 everything is fine.
>
> I know that there are no plans to backport things to 3.16 and I don't
> think the Debian people are going to be very interested in this. So
> this message is an FYI for users, maybe consider not using the
> Debian/Jessie kernel for BTRFS systems.

I'd suggest extending that suggestion to: If you're not using an
Enterprise distro (RHEL, SLES, CentOS, OEL), then you should probably be
building your own kernel, ideally using upstream sources. Ubuntu is
notorious for picking 'stable' kernels that then fail to be marked by
kernel.org as LTS, Debian picks kernels that are multiple versions old
by the time they make a release, and I've heard similar from other
non-enterprise distros that don't inherently make you build your own
kernel. Even among ones that make you build the kernel yourself anyway,
there are issues (Gentoo for example doesn't often mark new kernels as
stable, even when they are perfectly usable for pretty much everyone).
Re: 3.16.0 Debian kernel hang
On Sat, 5 Dec 2015 12:53:07 AM Austin S Hemmelgarn wrote:
> > The only reason I'm not running Unstable kernels on my Debian systems
> > is because I run some Xen servers and upgrading Xen is problematic.
> > Linode is moving from Xen to KVM so I guess I should consider doing
> > the same. If I migrate my Xen servers to KVM I can use newer kernels
> > with less risk.
>
> That's interesting, that must be something with how they do kernel
> development in Debian, because I've never had any issues upgrading
> either Xen or Linux on any of the systems I've run Xen on, and I
> directly track mainline (with a small number of patches) for Linux, and
> stay relatively close to mainline with Xen (Gentoo doesn't have all
> that many patches on top of the regular release for Xen, aside from XSA
> patches).

I don't think that Debian does anything wrong in this regard. It's just
that my experience of Xen is that it is fragile at the best of times.
The fact that Red Hat packaged the Xen kernel in the Linux kernel
package is a major indication of Xen problems IMHO; the concept of Xen
is that it shouldn't be tied to a Linux kernel. If you haven't had Xen
issues then I think you have been lucky.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
Re: compression disk space saving - what are your results?
On 2015-12-03 07:09, Imran Geriskovan wrote:
>> On a side note, I really wish BTRFS would just add LZ4 support. It's a
>> lot more deterministic WRT decompression time than LZO, gets a similar
>> compression ratio, and runs faster on most processors for both
>> compression and decompression.
>
> Relative ratios according to
> http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
>
> Compressed size:      gzip (1) - lzo (1.4) - lz4 (1.4)
> Compression time:     gzip (5) - lzo (1)   - lz4 (0.8)
> Decompression time:   gzip (9) - lzo (4)   - lz4 (1)
> Compression memory:   gzip (1) - lzo (2)   - lz4 (20)
> Decompression memory: gzip (1) - lzo (2)   - lz4 (130). Yes, 130! Not a typo.
>
> But there is a note: "Note: lz4 it's the program using this size, the
> code for internal lz4 use very less memory."
>
> However, I could not find any better apples-to-apples comparison. If
> lz4's real memory consumption is on the order of lzo's, then it looks
> good.

AFAICT, it's similar memory consumption. I did some tests a while back
comparing the options for kernel image compression using a VM, and one
of the things I tested (although I can't for the life of me remember how
exactly, except that it involved using QEMU hooked up to GDB) was
run-time decompressor footprint. LZO really should have a smaller memory
footprint too, it's just that lzop needs to handle almost a dozen
different LZO compression formats.
Re: Bug/regression: Read-only mount not read-only
On 2015-12-02 18:51, Hugo Mills wrote:
> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>>> On a side note, do either XFS or ext4 support removing the
>>>> norecovery option from the mount flags through mount -o remount?
>>>> Even if they don't, that might be a nice feature to have in BTRFS
>>>> if we can safely support it.
>>>
>>> It's not remountable today on xfs:
>>>
>>> /* ro -> rw */
>>> if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
>>> 	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>>> 		xfs_warn(mp, "ro->rw transition prohibited on norecovery mount");
>>> 		return -EINVAL;
>>> 	}
>>>
>>> not sure about ext4.
>>>
>>> -Eric
>>
>> Not remountable is very good to implement it. Makes things super easy
>> to do. Or we will need to add log replay for remount time.
>>
>> I'd like to implement it first for the non-remountable case as a try.
>>
>> And for the option name, I prefer something like "notreereplay", but I
>> don't consider it the best one yet.
>
> Thinking out loud... no-log-replay, no-log, hard-ro, ro-log,
> really-read-only-i-mean-it-this-time-honest-guvnor
>
> Delete hyphens at your pleasure.

Personally, I think no-log-replay (with or without hyphens) is the most
concise option name. With something like this, it should be as clear as
possible what is being done.
Re: BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
On Fri, Dec 04, 2015 at 09:21:59AM +0800, Qu Wenruo wrote:
> > We do have the alignment check in kernel, but it's in the early phase
> > where we don't know if nodesize is reliable and print only a warning.
>
> This can be enhanced by the following method:

At minimum, we can promote the 4k alignment checks in
btrfs_check_super_valid from a warning to an error. The blocks must be
4k aligned, regardless of sectorsize or nodesize.

> 1) Check sectorsize first
>    Only several sector sizes are valid for current btrfs:
>    4K, 8K, 16K, 32K, 64K
>    Just five numbers, quite easy to check.

The sectorsize must be PAGE_SIZE at the moment. This will change with
Chandan's patchset though.

>    Or if anyone is going to extend supported sectorsizes, we can change
>    the check to whether the number is a power of 2 starting from 4K.
>
> 2) Check nodesize/leafsize then
>    It should be aligned to sectorsize.

This particular check is missing but is implicit because of the
sectorsize == PAGE_SIZE restriction.

>    And nodesize must match with leafsize.
>    Currently, it's done out of check_super_valid(); we can integrate it.

Yeah it's done, then I don't see why we should add it again.

> 3) Check all super root bytenrs against *sectorsize*
>    Yeah, not nodesize.
>    As some old bad convert will cause metadata extents unaligned to
>    nodesize (just before my convert rework patch), but only aligned to
>    sectorsize.
>    So only check alignment against sectorsize.

While the real check should be against the sectorsize, at the moment I
think it's covered by the 4k checks anyway. I understand why we can't
use the nodesize.

So, if we do the warning -> error, we're fine for now. Some of the
checks you suggest would be good to merge when the subpage blocksize
patchset is merged.
Re: [PATCH] Btrfs: disable online scrub repair on ro cases
Hi Liu,

[auto build test ERROR on btrfs/next]
[also build test ERROR on v4.4-rc3 next-20151203]

url: https://github.com/0day-ci/linux/commits/Liu-Bo/Btrfs-disable-online-scrub-repair-on-ro-cases/20151204-205115
base: https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: powerpc-defconfig (attached as .config)

reproduce:
    wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
    chmod +x ~/bin/make.cross
    # save the attached .config to linux build tree
    make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   fs/btrfs/scrub.c: In function 'scrub_fixup_readpage':
>> fs/btrfs/scrub.c:703:10: error: invalid type argument of '->' (have 'u64 {aka long long unsigned int}')
     if (root->fs_info->sb->s_flags & MS_RDONLY)
         ^

vim +703 fs/btrfs/scrub.c

   697		struct inode *inode = NULL;
   698		struct btrfs_fs_info *fs_info;
   699		u64 end = offset + PAGE_SIZE - 1;
   700		struct btrfs_root *local_root;
   701		int srcu_index;
   702	
 > 703		if (root->fs_info->sb->s_flags & MS_RDONLY)
   704			return -EROFS;
   705	
   706		key.objectid = root;

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
Re: 3.16.0 Debian kernel hang
On Sat, 5 Dec 2015 12:08:58 AM Austin S Hemmelgarn wrote:
> > I know that there are no plans to backport things to 3.16 and I don't
> > think the Debian people are going to be very interested in this. So
> > this message is an FYI for users, maybe consider not using the
> > Debian/Jessie kernel for BTRFS systems.
>
> I'd suggest extending that suggestion to: If you're not using an
> Enterprise distro (RHEL, SLES, CentOS, OEL), then you should probably
> be building your own kernel, ideally using upstream sources.

There are lots of ways of dealing with this. Debian development doesn't
stop. Anyone who is running a Jessie system can easily run a kernel from
Testing or Unstable (which really isn't particularly unstable). It's
generally expected that Debian user-space will work with a kernel from
+- one release of Debian. Also, every time I've tried it, Debian has
worked well with a CentOS kernel of a similar version.

The only reason I'm not running Unstable kernels on my Debian systems is
because I run some Xen servers and upgrading Xen is problematic. Linode
is moving from Xen to KVM so I guess I should consider doing the same.
If I migrate my Xen servers to KVM I can use newer kernels with less
risk.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
Re: [PATCH v2 0/5] Make btrfs-progs really compatible with any kernel version
On Fri, Dec 04, 2015 at 10:08:35AM +0800, Qu Wenruo wrote:
> Liu Bo wrote on 2015/12/03 17:44 -0800:
> > On Mon, Nov 23, 2015 at 06:56:09PM +0100, David Sterba wrote:
> >> On Mon, Nov 23, 2015 at 08:56:13PM +0800, Anand Jain wrote:
> >>> Btrfs-progs is a tool for the btrfs kernel and we hope the latest
> >>> btrfs-progs will be compatible with any set of older/newer kernels.
> >>>
> >>> So far mkfs.btrfs and btrfs-convert set the default features (for
> >>> example, skinny-metadata) even if the running kernel does not support
> >>> them, and so the mount fails on the running kernel.
> >>
> >> So the default behaviour of mkfs will try to best-guess the feature set
> >> of the currently running kernel. I think this is the most common scenario
> >> and justifies the change in default behaviour.
> >>
> >> For the other cases I'd like to introduce some human-readable shortcuts
> >> to the --features option. E.g. 'mkfs.btrfs -O compat-3.2' will pick all
> >> options supported by the unpatched mainline kernel of version 3.2. This
> >> would be present for all versions, regardless of whether there was a
> >> change in the options or not.
> >>
> >> Similarly, for convenience, add 'running' that would pick the options
> >> from the running kernel but will be explicit.
> >>
> >> A remaining option should override the 'running' behaviour and pick the
> >> latest mkfs options. Naming it 'defaults' sounds a bit ambiguous so the
> >> name is yet to be determined.
> >>
> >>> Here, this set of patches will make sure the progs understand the
> >>> kernel-supported features.
> >>>
> >>> So this patch checks whether sysfs says the feature is supported; if
> >>> not, it relies on the static kernel version which provided that feature
> >>> (skinny-metadata here in this example); next, if for some reason the
> >>> running kernel does not provide the kernel version, it falls back to
> >>> the original method of enabling the feature, with the hope that the
> >>> kernel will support it.
> >>>
> >>> Also the last patch adds a warning when we fail to read either the
> >>> sysfs features or the running kernel version.
> >>
> >> Your patchset is a good start; the additional options I've described can
> >> be added on top of that. We might need to switch the version
> >> representation from string to KERNEL_VERSION but that's an
> >> implementation detail.
> >
> > Depending on sysfs is stable, but depending on the kernel version may
> > not be: we may have a distro kernel which backports some incompat
> > features from upstream, so we have to decide based on the sysfs
> > interface.
>
> +1.
>
> Although sysfs does not always show up even for a supported kernel, e.g.
> when the btrfs module is not loaded after boot. So we need to think twice
> before choosing a fallback method.

There are several factors that we have to take into account for the default behaviour and fallback. I'm close to a final proposal, but had missed the possibility of an unloaded module removing access to sysfs, as you point out.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
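The sysfs-then-kernel-version fallback discussed in this thread can be sketched in userspace C. This is an illustrative sketch, not the actual btrfs-progs code: the function name is mine, the features directory is parameterized (on a real system it would be /sys/fs/btrfs/features, which contains one file per supported feature), and the simplified logic does not distinguish "sysfs unreadable" from "feature file absent" the way the real patchset must.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Hypothetical helper mirroring the patchset's strategy:
 * 1. ask sysfs (one file per feature under features_dir),
 * 2. else compare the running kernel version against the version
 *    that introduced the feature,
 * 3. if even the version is unavailable, optimistically assume support. */
static int kernel_supports_feature(const char *features_dir,
				   const char *feature,
				   int introduced_major, int introduced_minor)
{
	char path[512];
	struct stat st;
	struct utsname uts;
	int major, minor;

	snprintf(path, sizeof(path), "%s/%s", features_dir, feature);
	if (stat(path, &st) == 0)
		return 1;	/* sysfs exposes the feature: supported */

	if (uname(&uts) == 0 &&
	    sscanf(uts.release, "%d.%d", &major, &minor) == 2)
		return major > introduced_major ||
		       (major == introduced_major && minor >= introduced_minor);

	return 1;	/* last resort: enable and hope the kernel copes */
}
```

The ordering matters for the backport case Liu Bo raises: a distro kernel may carry a feature its version number predates, which only the sysfs check can detect.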
Re: [PATCH 04/12] btrfs: change how delay_iput is tracked in btrfs_delalloc_work
On Thu, Dec 03, 2015 at 06:25:37PM -0800, Liu Bo wrote:
> >  	struct inode *inode;
> > -	int delay_iput;
> >  	struct completion completion;
> >  	struct list_head list;
> >  	struct btrfs_work work;
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 15b29e879ffc..529a53b80ca0 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -9436,16 +9436,21 @@ static void btrfs_run_delalloc_work(struct btrfs_work *work)
> >  {
> >  	struct btrfs_delalloc_work *delalloc_work;
> >  	struct inode *inode;
> > +	int delay_iput;
> >
> >  	delalloc_work = container_of(work, struct btrfs_delalloc_work,
> >  				     work);
> >  	inode = delalloc_work->inode;
> > +	/* Lowest bit of inode pointer tracks the delayed status */
> > +	delay_iput = ((unsigned long)inode & 1UL);
> > +	inode = (struct inode *)((unsigned long)inode & ~1UL);
> > +
>
> To be quite frank, I don't like this; it's a pointer anyway,
> error-prone from a debugging point of view. Would a 'u8 delayed_iput'
> help instead?

The point was to shrink the structure. Adding the u8 will grow it by another 8 bytes; besides, the slab objects are aligned to 8 bytes by default, so the overall cost of storing the delayed information is 8 bytes:

struct btrfs_delalloc_work {
	struct inode *             inode;                /*     0     8 */
	struct completion          completion;           /*     8    32 */
	struct list_head           list;                 /*    40    16 */
	struct btrfs_work          work;                 /*    56    88 */
	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
	u8                         delay;                /*   144     1 */

	/* size: 152, cachelines: 3, members: 5 */
	/* padding: 7 */
	/* last cacheline: 24 bytes */
};

As the use of the inode pointer is limited, I don't think this would cause surprises. And it's commented where used, which should help during debugging. Abusing the low bits of pointers is nothing new; the page cache tags are implemented that way. This kind of low-level optimization is IMO acceptable.
Re: Very various speed of grep operation on btrfs partition
On 2015-12-03 14:36, Михаил Гаврилов wrote:
> Today at work I needed to search for some strings in a repository. Only
> a machine with Windows was available. I was using grep from Cygwin for
> this task and I was surprised by the speed on the NTFS partition. I
> decided to repeat this task on my home Linux workstation.
> [...snip...]
> From the results we see that the search sometimes completes instantly,
> in less than a second, and sometimes lasts 4 minutes. The /home
> partition is formatted with the BTRFS filesystem. I would be interested
> to investigate what determines the search speed, and to make the search
> always finish in less than a second. Here are my mount options:
> UUID=82df2d84-bf54-46cb-84ba-c88e93677948 /home btrfs subvolid=5,autodefrag,noatime,space_cache,inode_cache,nodatacow 0 0
> # uname -a
> Linux localhost.localdomain 4.2.6-301.fc23.x86_64+debug #1 SMP Fri Nov 20 22:07:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> How do I start investigating?

Well, what other things are accessing the filesystem at the same time? If you've got something like KDE running with the 'semantic desktop' stuff turned on, then that will seriously impact the performance of other things using that filesystem.

The other thing to keep in mind is that caching may be skewing the results. To really get a good idea of performance for something like this, you should run 'sync' followed by 'echo 3 > /proc/sys/vm/drop_caches' (you'll need to be root for the second one) prior to each run, and ideally have nothing else running on that filesystem.

On a separate note, if you're either running on a 64-bit system or have fewer than about 2^31 files on the FS, inode_cache will slow things down. It's intended for things like mail spools where you have billions of files being created and deleted over a few weeks, quickly using up the inode numbers. On almost all systems it will make things run slower, and possibly result in non-deterministic filesystem performance like what you are seeing here.

Additionally, do you have some particular reason that you absolutely _need_ nodatacow enabled for the whole FS? It usually has no impact on performance, but it removes any kind of error correction for file data (checksums can't be used safely without COW semantics). It probably has no direct impact on what you're seeing here, but it is something that really shouldn't be used in most cases at the filesystem level (it can be set on individual subvolumes or directories, and that's the recommended way to do it if you don't want to go down to the per-file level).
Re: [PATCH] Btrfs: disable online scrub repair on ro cases
Hi Liu,

[auto build test WARNING on btrfs/next]
[also build test WARNING on v4.4-rc3 next-20151203]

url:    https://github.com/0day-ci/linux/commits/Liu-Bo/Btrfs-disable-online-scrub-repair-on-ro-cases/20151204-205115
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: i386-randconfig-c0-12042053 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/uapi/linux/capability.h:16,
                    from include/linux/capability.h:15,
                    from include/linux/sched.h:15,
                    from include/linux/blkdev.h:4,
                    from fs/btrfs/scrub.c:19:
   fs/btrfs/scrub.c: In function 'scrub_fixup_readpage':
   fs/btrfs/scrub.c:703:10: error: invalid type argument of '->' (have 'u64 {aka long long unsigned int}')
     if (root->fs_info->sb->s_flags & MS_RDONLY)
             ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
     if (__builtin_constant_p((cond)) ? !!(cond) : \
                               ^
>> fs/btrfs/scrub.c:703:2: note: in expansion of macro 'if'
     if (root->fs_info->sb->s_flags & MS_RDONLY)
     ^
   fs/btrfs/scrub.c:703:10: error: invalid type argument of '->' (have 'u64 {aka long long unsigned int}')
     if (root->fs_info->sb->s_flags & MS_RDONLY)
             ^
   include/linux/compiler.h:147:40: note: in definition of macro '__trace_if'
     if (__builtin_constant_p((cond)) ? !!(cond) : \
                                           ^
>> fs/btrfs/scrub.c:703:2: note: in expansion of macro 'if'
     if (root->fs_info->sb->s_flags & MS_RDONLY)
     ^
   fs/btrfs/scrub.c:703:10: error: invalid type argument of '->' (have 'u64 {aka long long unsigned int}')
     if (root->fs_info->sb->s_flags & MS_RDONLY)
             ^
   include/linux/compiler.h:158:16: note: in definition of macro '__trace_if'
      __r = !!(cond); \
               ^
>> fs/btrfs/scrub.c:703:2: note: in expansion of macro 'if'
     if (root->fs_info->sb->s_flags & MS_RDONLY)
     ^

vim +/if +703 fs/btrfs/scrub.c

   687	}
   688	
   689	static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *fixup_ctx)
   690	{
   691		struct page *page = NULL;
   692		unsigned long index;
   693		struct scrub_fixup_nodatasum *fixup = fixup_ctx;
   694		int ret;
   695		int corrected = 0;
   696		struct btrfs_key key;
   697		struct inode *inode = NULL;
   698		struct btrfs_fs_info *fs_info;
   699		u64 end = offset + PAGE_SIZE - 1;
   700		struct btrfs_root *local_root;
   701		int srcu_index;
   702	
 > 703		if (root->fs_info->sb->s_flags & MS_RDONLY)
   704			return -EROFS;
   705	
   706		key.objectid = root;
   707		key.type = BTRFS_ROOT_ITEM_KEY;
   708		key.offset = (u64)-1;
   709	
   710		fs_info = fixup->root->fs_info;
   711		srcu_index = srcu_read_lock(&fs_info->subvol_srcu);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 04/12] btrfs: change how delay_iput is tracked in btrfs_delalloc_work
On 12/04/15 13:36, David Sterba wrote:
[snip]
> As the use of the inode pointer is limited, I don't think this would
> cause surprises. And it's commented where used, which should help
> during debugging.

When I read through those bits (mostly pondering portability) I was wondering whether it might make sense to provide thin wrap/unwrap functions for the tag bit instead of relying on open code and comments only. Just an idea; not sure if it's worth the trouble. The code itself is functional and works fine as it is: I'm running it right now.

-h
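The thin wrap/unwrap helpers suggested above could look something like this. This is a hedged userspace sketch, not kernel code: the helper names are hypothetical, and it relies on the same assumption as the patch, namely that the pointed-to type is aligned to at least 2 bytes so bit 0 of a real pointer is always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers that tag the lowest bit of a pointer with a
 * boolean flag, in the spirit of the delay_iput-in-inode-pointer trick.
 * Only valid for pointers whose alignment leaves bit 0 free. */
static inline void *tag_ptr(void *ptr, int flag)
{
	return (void *)((uintptr_t)ptr | (flag ? 1UL : 0UL));
}

static inline void *untag_ptr(void *ptr)
{
	/* Mask off the tag to recover the real pointer. */
	return (void *)((uintptr_t)ptr & ~(uintptr_t)1);
}

static inline int ptr_tag(void *ptr)
{
	/* Read back the flag stored in bit 0. */
	return (int)((uintptr_t)ptr & 1);
}
```

With helpers like these, the worker would read roughly `delay_iput = ptr_tag(work->inode); inode = untag_ptr(work->inode);`, keeping the bit manipulation in one named, commented place instead of open-coded masks.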
Re: Subvolume UUID, data corruption?
On Fri, Dec 04, 2015 at 01:05:28PM +0100, S.J wrote:
> Hello
>
> As we know, two filesystems with the same UUID (as reported by e.g.
> "blkid") are problematic; especially if both are mounted at the same
> time, it leads to data corruption. So copying a BTRFS partition with
> e.g. dd to another and using it immediately is bad. To prevent this,
> "btrfstune -u /dev/sdaX" changes the UUID of the given partition.
>
> However, BTRFS subvolumes have their own UUIDs, which can be viewed
> e.g. with "btrfs sub list -u /mountpoint". These UUIDs are not changed
> by the command above, and apparently there is no other way to do this.
>
> My question is: Is this a problem similar to the main UUID? Can
> mounting two BTRFS partitions with equal subvolume UUIDs (but different
> main UUIDs) cause data corruption?

I don't think it'll cause problems. The UUIDs on subvols are only really used internally to that filesystem, so the kernel doesn't have a chance to get confused. The main thing that could be confused is send/receive, but that's a matter of possibly losing some validation (thus allowing you to do something that will fail) rather than causing active damage, as in the duplicate-FS-UUID case.

> (...well, and maybe someone could explain to me what these subvol UUIDs
> are for in the first place. Subvolumes already have a unique number,
> and from a user's p.o.v. there isn't anything where the subvol UUIDs
> can be used at all (?))

The subvol UUIDs are used to identify them through send/receive operations. There are three main UUID fields on a subvol: the actual UUID (u), the Received_UUID (r) and the Parent_UUID (p), and these are used to identify whether an incremental send could function correctly when received. (I can give you chapter and verse on how they're used if you like, but that's a bit excessive just for answering your question here.)

Hugo.

> Thank you
>
> PS: Apologies for sending a second mail; somehow my first try didn't
> contain any text.

-- 
Hugo Mills             | Do not meddle in the affairs of system
hugo@... carfax.org.uk | administrators, for they are subtle, and quick to
http://carfax.org.uk/  | anger.
PGP: E2AB1DE4          |
Re: 3.16.0 Debian kernel hang
On 2015-12-04 08:42, Russell Coker wrote:
> On Sat, 5 Dec 2015 12:08:58 AM Austin S Hemmelgarn wrote:
>>> I know that there are no plans to backport things to 3.16 and I don't
>>> think the Debian people are going to be very interested in this. So
>>> this message is a FYI for users: maybe consider not using the
>>> Debian/Jessie kernel for BTRFS systems.
>> I'd suggest extending that suggestion to: if you're not using an
>> Enterprise distro (RHEL, SLES, CentOS, OEL), then you should probably
>> be building your own kernel, ideally using upstream sources.
> There are lots of ways of dealing with this. Debian development doesn't
> stop. Anyone who is running a Jessie system can easily run a kernel
> from Testing or Unstable (which really isn't particularly unstable).
> It's generally expected that Debian user-space will work with a kernel
> from +- one release of Debian. Also, every time I've tried it, Debian
> has worked well with a CentOS kernel of a similar version.

Well yes, that does usually work, but that doesn't mean that it keeps up with mainline very well. Back when I used Debian on a regular basis, I ran the 'unstable' kernels, and they still lagged behind mainline by at least a minor version, and often more than that. And there have been cases where things got horribly broken in mainline due to lack of proper vetting of code (the most recent example being the insanity with the clustered MD code, which broke non-clustered soft RAID for at least two major releases), which prevents them from safely keeping up to date with mainline.

> The only reason I'm not running Unstable kernels on my Debian systems
> is because I run some Xen servers and upgrading Xen is problematic.
> Linode is moving from Xen to KVM so I guess I should consider doing the
> same. If I migrate my Xen servers to KVM I can use newer kernels with
> less risk.

That's interesting; that must be something with how they do kernel development in Debian, because I've never had any issues upgrading either Xen or Linux on any of the systems I've run Xen on, and I directly track mainline (with a small number of patches) for Linux, and stay relatively close to mainline with Xen (Gentoo doesn't have all that many patches on top of the regular release for Xen, aside from XSA patches).
Re: btrfs crashing the kernel with Seagate 8TB SMR drives.
I did suspect that NCQ may be involved, but I had no clear evidence, until I noticed that my drives had also incremented the 'end to end error' count in SMART, which does match accounts of the NCQ issue.

That suggests there are two interlinked issues: the issue with those Seagate drives and NCQ, combined with btrfs causing a kernel lock-up under certain error circumstances when it would be more appropriate to remount ro. Looks like the NCQ issue is already being addressed, but I did uncover a new and unusual error condition that btrfs needs to handle, and looking at the patch, it's a trivial thing to fix, so bothering the mailing list with it has made btrfs better in a tiny way.

I don't usually report errors, assuming that people far more capable than I am are already on top of them, but when I saw one that gave a description right down to the line number I thought it might be something that could be looked into very easily.

I'm still impressed with the resilience of btrfs though: after all this abuse of crashing during rebalancing, corrupted filesystem structures and out-of-order commands, all my data is still undamaged. No conventional RAID could have endured that.

Thanks for the patch, but I'd rather not fiddle with the kernel and have to repeat the process every time a new version comes out. I'll just disable NCQ until the fix is mainlined and SUSE incorporates it.

On 04/12/15 15:21, Robert Krig wrote:
> As Chris mentioned, check out the bug report here:
> https://bugzilla.kernel.org/show_bug.cgi?id=93581
> I have an 8TB SMR drive and the kernel was reporting drive errors.
> Switching to kernel 3.16 (standard Debian Jessie kernel) fixed it for
> me (for the moment). From what I read in that kernel bug report, the
> patch has been submitted for kernel 4.4.
> On 03.12.2015 19:07, Codebird wrote:
>> I've got a nice bug for you - because I can offer you what everyone
>> likes to see, a precise error message. I've got a btrfs filesystem
>> spread over six devices, RAID1 mode.
>> Four of these are Seagate 8TB archive drives - those SMR ones that a
>> few others have reported failing when used with btrfs. I've had that
>> issue too, and I just can't explain why, other than to say that it
>> only occurs when using them on my mainboard SATA ports, not via a USB
>> dock. But that's not what I'm reporting - that's just the source of
>> the problem that causes the crash I am reporting.
>> The crash occurs when scrubbing, after some time and some terabytes -
>> or possibly just when reading a certain place, I'm not sure - and it
>> leaves this helpful error on the screen along with a system so
>> unresponsive numlock won't flash:
>> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
>> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
>> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
>> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
>> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO failure
>> BTRFS: assertion failed: f(fs_info->sb->s_flags & MS
>> ---[ cut here ]
>> kernel BUG at ../fs/btrfs/ctree.h:4057!
>> Not sure if some of those 5s might be 6s, as I was in a hurry to get
>> it back up both times and just got a blurry photo. But it looks to me
>> like there might be a chunk of code that doesn't handle a hardware
>> fault: rather than cleanly return an error it's causing the kernel to
>> hang entirely. I've managed to get this to happen twice now, so it's
>> certainly something worth looking into.
>> This is on SUSE Tumbleweed, with kernel 4.3.0-2-default.
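For reference, NCQ is commonly disabled per device by writing 1 to the drive's queue_depth attribute in sysfs (for the drive above, that path would be something like /sys/block/sdg/device/queue_depth). The sketch below is illustrative only, not part of any btrfs tool: the helper name is mine, and the attribute path is parameterized so it can be pointed at any writable file.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative helper: write a queue depth (1 effectively disables NCQ)
 * to a sysfs-style attribute file, then read it back to confirm.
 * Returns the depth read back, or -1 on error. */
static int set_queue_depth(const char *attr_path, int depth)
{
	FILE *f = fopen(attr_path, "w");
	int readback = -1;

	if (!f)
		return -1;
	fprintf(f, "%d\n", depth);
	fclose(f);

	f = fopen(attr_path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &readback) != 1)
		readback = -1;
	fclose(f);
	return readback;
}
```

Note this only lasts until reboot; making it persistent would need a udev rule or boot script, which is outside the scope of this sketch.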
Re: 3.16.0 Debian kernel hang
On 2015-12-04 09:26, Russell Coker wrote:
> On Sat, 5 Dec 2015 12:53:07 AM Austin S Hemmelgarn wrote:
>>> The only reason I'm not running Unstable kernels on my Debian systems
>>> is because I run some Xen servers and upgrading Xen is problematic.
>>> Linode is moving from Xen to KVM so I guess I should consider doing
>>> the same. If I migrate my Xen servers to KVM I can use newer kernels
>>> with less risk.
>> That's interesting; that must be something with how they do kernel
>> development in Debian, because I've never had any issues upgrading
>> either Xen or Linux on any of the systems I've run Xen on, and I
>> directly track mainline (with a small number of patches) for Linux,
>> and stay relatively close to mainline with Xen (Gentoo doesn't have
>> all that many patches on top of the regular release for Xen, aside
>> from XSA patches).
> I don't think that Debian does anything wrong in this regard. It's just
> that my experience of Xen is that it is fragile at the best of times.
> The fact that Red Hat packaged the Xen kernel in the Linux kernel
> package is a major indication of Xen problems IMHO; the concept of Xen
> is that it shouldn't be tied to a Linux kernel.

In the case of Red Hat, that's probably the way it's done because that's originally what was needed to make things work. Early versions of Xen very much did need a special version of Linux running as Domain 0. Coupling things like that also simplifies testing for the developers at Red Hat, as they then only need to test one combination instead of a big matrix of features. Less to test means they can test more thoroughly, which means they can provide a better guarantee that things will work without intervention right out of the box, which is important for enterprise distros. Xen is supposed to be decoupled from the version of the Domain 0 kernel, and in most of my experience with it, they do a pretty good job. 90% of the issues I've heard of personally have been with patched versions put together by Linux distros, not with an upstream release.

> If you haven't had Xen issues then I think you have been lucky.

I have personally had issues using Debian as Domain 0 and keeping Xen up to date myself, but all of those issues vanished when I switched to Gentoo for that purpose (well, they vanished when I switched to NetBSD, but haven't resurfaced since I switched from that to Gentoo Linux after about a week of pulling my hair out fighting with BSD). I'm admittedly not doing anything other than small purpose-built PV domains for service isolation in most cases (although I do use a dedicated PV domain for testing kernel patches from time to time), but that really shouldn't have any impact.
Re: [PATCH 0/3] btrfs-progs: fix file restore to lost+found bug
On 12/04/2015 01:37 PM, Naohiro Aota wrote:
> This series addresses an issue where btrfsck restores an infinite
> number of copies of the same file into the `lost+found' directory. The
> issue occurs on a file which is linked from two different directories A
> and B. If the links from dir A are corrupted and the links from dir B
> are kept valid, btrfsck won't stop creating files in lost+found, like
> this:
>
> ---
> Moving file 'file.del.51' to 'lost+found' dir since it has no valid backref
> Fixed the nlink of inode 1876
> Trying to rebuild inode:1877
> Moving file 'del' to 'lost+found' dir since it has no valid backref
> Fixed the nlink of inode 1877
> Can't get file name for inode 1876, using '1876' as fallback
> Moving file '1876' to 'lost+found' dir since it has no valid backref
> Fixed the nlink of inode 1876
> Can't get file name for inode 1876, using '1876' as fallback
> Moving file '1876.1876' to 'lost+found' dir since it has no valid backref
> Fixed the nlink of inode 1876
> (snip)
> Moving file '1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876' to 'lost+found' dir since it has no valid backref
> Fixed the nlink of inode 1876
> Can't get file name for inode 1876, using '1876' as fallback
> Can't get file name for inode 1876, using '1876' as fallback
> Can't get file name for inode 1876, using '1876' as fallback
> ---
>
> The problem is the early release of inode backrefs. The release
> prevents `reset_nlink()' from adding valid backrefs back to the inode.
> As a result, the following happens:
>
> 0. btrfsck scans a FS tree
> 1. It finds valid links and invalid links (some links are lost)
> 2. All valid links are released
> 3. btrfsck detects found_links != nlink
> 4. reset_nlink() resets nlink to 0
> 5. No valid links are restored (thus still nlink = 0)
> 6. The file is restored to lost+found since nlink == 0 (now, nlink = 1)
> 7. btrfsck rescans the FS tree
> 8. It finds `found_links' = #valid_links + 1 (in lost+found) and nlink = 1
> 9. Again all valid links are lost, and it restores to lost+found

Right, that's one case I missed in the repair code. Thanks for the fix.

> The first patch adds clean-up code to the test: it umounts the test
> directory on the failure path. The second patch fixes the above
> problem. And the last patch extends the test to check the case of a
> multiple-linked file being corrupted.

But I only see the first 2 patches on the mailing list... The last test case seems to be missing?

Thanks,
Qu
Re: Subvolume UUID, data corruption?
On Fri, 2015-12-04 at 13:07 +0000, Hugo Mills wrote:
> I don't think it'll cause problems.

Is there any guaranteed behaviour when btrfs encounters two filesystems (i.e. not talking about the subvols now) with the same UUID?

Given that it's long-standing behaviour that people could clone filesystems (dd, etc.) and this just worked™, btrfs should at least handle such a case gracefully.

For example, when more than one block device with a btrfs of the same UUID is already known, it should refuse to mount any of them. And if one is already known and another device pops up, it should refuse to mount the newcomer and continue to use the already mounted one normally.

Cheers,
Chris.
Re: [PATCH v2 0/5] Make btrfs-progs really compatible with any kernel version
David,

> the possibility of unloaded module that would remove the access to
> sysfs, as you point out.

Kindly note, the patch below made /dev/btrfs-control a static node:

  commit 578454ff7eab61d13a26b568f99a89a2c9edc881
  Author: Kay Sievers
  Date:   Thu May 20 18:07:20 2010 +0200

      driver core: add devname module aliases to allow module on-demand auto-loading

And here the function check_or_load_btrfs_ko(), in PATCH v2 2/5, will take care of this problem:

+
+int check_or_load_btrfs_ko()
+{
+	int fd;
+
+	/*
+	 * open will load btrfs kernel module if its not loaded,
+	 * and if the kernel has CONFIG auto load set?
+	 */
+	fd = open("/dev/btrfs-control", O_RDONLY);
+	if (fd < 0)
+		return -errno;
+
+	close(fd);
+	return 0;
+}
+

Since the static minor number for /dev/btrfs-control is now mapped to the btrfs kernel module, it ensures btrfs is loaded when /dev/btrfs-control is accessed. Further, the /dev/btrfs-control node is created by udevd by reading modules.devname, which is supplied/updated either by the distro or at compilation. Systems without udev should IMO run mknod for btrfs-control in their install script, which I guess is a must anyway.

# ls -li /dev/btrfs-control
7338 crw-rw 1 root disk 10, 234 Dec 5 10:45 /dev/btrfs-control

# cat modules.devname | egrep btrfs
btrfs btrfs-control c10:234

# cat ./include/linux/miscdevice.h | egrep BTRFS
#define BTRFS_MINOR 234

So IMO this is not a real problem.

Thanks, Anand
Re: attacking btrfs filesystems via UUID collisions? (was: Subvolume UUID, data corruption?)
Thinking a bit more about that, I came to the conclusion that it's actually security-relevant that btrfs deals gracefully with filesystems having the same UUID.

Getting to know someone else's filesystem's UUID may be more easily possible than one might think. It's usually not considered secret and is, for example, included in debug reports (e.g. several Debian packages do this).

The only thing an attacker then needs to do is somehow make another filesystem with that UUID available on his victim's system. The simplest way is via a USB stick when he has local access. Thanks to some stupid desktop environments, chances aren't too bad that the system will even auto-mount the stick.

If btrfs doesn't handle this gracefully, the attacker may damage or destroy the original filesystem, or, if things get awkwardly corrupted (and data is written to the fake btrfs), even get data out of such a system (despite any screen locks or dm-crypt).

Cheers,
Chris.
Re: [PATCH 06/15] btrfs: Cleanup num_tolerated_disk_barrier_failures
Hi Anand,

Would you please push patches 1~6 of your hot spare patchset to Chris first?

In my opinion, it will take some time before details like whether to do hot-spare in kernel or in user-space are settled. And these 6 patches are quite independent of the hot spare patchset, so it would be OK to push them into mainline in this or the next merge window.

Thanks,
Qu

On 11/09/2015 06:56 PM, Anand Jain wrote:

From: Qu Wenruo

As we use the per-chunk degradable check, the global num_tolerated_disk_barrier_failures is of no use now. So clean it up.

Signed-off-by: Qu Wenruo
[Btrfs: resolve conflict to apply 'btrfs: Cleanup num_tolerated_disk_barrier_failures']
Signed-off-by: Anand Jain
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 56 --
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -
 4 files changed, 77 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a86051e..dedd3e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d3303f9..d10ef2e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2965,8 +2965,6 @@ retry_root_backup:
 		printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
@@ -3498,60 +3496,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return 0;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
 	struct list_head *head;

diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index adeb318..6dc5fd3 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 			 void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a5262bf..33ad42e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1782,9 +1782,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path, u64 devid)
 		free_fs_devices(cur_devices);
Re: BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()!
On 12/04/2015 09:12 PM, David Sterba wrote:
> On Fri, Dec 04, 2015 at 09:21:59AM +0800, Qu Wenruo wrote:
>> We do have the alignment check in kernel, but it's in the early phase
>> where we don't know if nodesize is reliable, and it prints only a
>> warning. This can be enhanced by the following method:
>
> At minimum, we can promote the 4k alignment checks in
> btrfs_check_super_valid from a warning to an error. The blocks must be
> 4k aligned, regardless of sectorsize or nodesize.
>
>> 1) Check sectorsize first
>>    Only several sector sizes are valid for current btrfs:
>>    4K, 8K, 16K, 32K, 64K
>>    Just five numbers, quite easy to check.
>
> The sectorsize must be PAGE_SIZE at the moment. This will change with
> Chandan's patchset though.

PAGE_SIZE would be good enough. Or, if anyone is going to extend the supported sector sizes, we can change the check to whether the number is a power of 2 starting from 4K.

>> 2) Check nodesize/leafsize then
>>    It should be aligned to sectorsize.
>
> This particular check is missing but is implicit because of the
> sectorsize == PAGE_SIZE restriction.

But we still need to validate nodesize/leafsize against sectorsize. Current btrfs already uses a large nodesize by default. For example, a 20K nodesize passes a 4K page size check but is still wrong. (And I was also wrong in my previous mail: it must not only be aligned to sectorsize, it also needs to be a power of 2.)

>> And nodesize must match leafsize.
>
> Currently, it's done outside of check_super_valid(); we can integrate
> it. Yeah, it's done, so I don't see why we should add it again.

I just want to move it to check_super_valid(), as it's better to put validation check code together, and that's why we have check_super_valid().

>> 3) Check all super root bytenrs against *sectorsize*
>>    Yeah, not nodesize. Some old bad converts will cause metadata
>>    extents unaligned to nodesize (just before my convert rework
>>    patch), but only aligned to sectorsize. So only check alignment
>>    against sectorsize.
>
> While the real check should be against the sectorsize, at the moment I
> think it's covered by the 4k checks anyway. I understand why we can't
> use the nodesize.

4K is good enough for the x86 family but can't catch all problems on 64K page sizes like PPC64 or AArch64. So it's still better to change the check to at least page size, even though we don't have subpage size support yet.

> So, if we do the warning -> error promotion, we're fine for now. Some
> of the checks you suggest would be good to merge when the subpage
> blocksize patchset is merged.

Right, a more accurate check is only needed after the subpage patchset.

Thanks,
Qu
[PATCH V2] Btrfs: disable online scrub repair on ro cases
This disables the repair process in read-only cases, as it can make the system unresponsive on the ASSERT() in repair_io_failure(). This can happen when scrub is running and a hardware error pops up; we should fall back gracefully on ro mounts instead of becoming unresponsive.

Reported-by: Codebird
Signed-off-by: Liu Bo
---
v2: Get @fs_info from a real pointer instead of a confusing-name u64 root.

 fs/btrfs/scrub.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2907a77..cb8a4e0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -682,11 +682,14 @@ static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *fixup_ctx)
 	struct btrfs_root *local_root;
 	int srcu_index;
 
+	fs_info = fixup->root->fs_info;
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
 	key.objectid = root;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 	key.offset = (u64)-1;
 
-	fs_info = fixup->root->fs_info;
 	srcu_index = srcu_read_lock(&fs_info->subvol_srcu);
 	local_root = btrfs_read_fs_root_no_name(fs_info, &key);
-- 
2.5.0
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/5] Make btrfs-progs really compatible with any kernel version
On Fri, Dec 04, 2015 at 11:57:55AM +0800, Qu Wenruo wrote:
>
>
> Liu Bo wrote on 2015/12/03 18:53 -0800:
> >On Fri, Dec 04, 2015 at 10:08:35AM +0800, Qu Wenruo wrote:
> >>
> >>
> >>Liu Bo wrote on 2015/12/03 17:44 -0800:
> >>>On Mon, Nov 23, 2015 at 06:56:09PM +0100, David Sterba wrote:
> >>>>On Mon, Nov 23, 2015 at 08:56:13PM +0800, Anand Jain wrote:
> >>>>>Btrfs-progs is a tool for the btrfs kernel and we hope latest btrfs-progs
> >>>>>be compatible w any set of older/newer kernels.
> >>>>>
> >>>>>So far mkfs.btrfs and btrfs-convert sets the default features, for eg,
> >>>>>skinny-metadata even if the running kernel does not supports it, and
> >>>>>so the mount fails on the running.
> >>>>
> >>>>So the default behaviour of mkfs will try to best guess the feature set
> >>>>of currently running kernel. I think this is the most common scenario
> >>>>and justifies the change in default behaviours.
> >>>>
> >>>>For the other cases I'd like to introduce some human-readable shortcuts
> >>>>to the --features option. Eg. 'mkfs.btrfs -O compat-3.2' will pick all
> >>>>options supported by the unpatched mainline kernel of version 3.2. This
> >>>>would be present for all version, regardless if there was a change in the
> >>>>options or not.
> >>>>
> >>>>Similarly for convenience, add 'running' that would pick the options
> >>>>from running kernel but will be explicit.
> >>>>
> >>>>A remaining option should override the 'running' behaviour and pick the
> >>>>latest mkfs options. Naming it 'defaults' sounds a bit ambiguous so the
> >>>>name is yet to be determined.
> >>>>
> >>>>>Here in this set of patches will make sure the progs understands the
> >>>>>kernel supported features.
> >>>>>
> >>>>>So in this patch, checks if sysfs tells whether the feature is
> >>>>>supported if not, then it will relay on static kernel version which
> >>>>>provided that feature (skinny-metadata here in this example), next
> >>>>>if for some reason the running kernel does not provide the kernel
> >>>>>version, then it will fall back to the original method to enable
> >>>>>the feature with a hope that kernel will support it.
> >>>>>
> >>>>>Also the last patch adds a warning when we fail to read either
> >>>>>sysfs features or the running kernel version.
> >>>>
> >>>>Your patchset is a good start, the additional options I've described can
> >>>>be added on top of that. We might need to switch the version
> >>>>representation from string to KERNEL_VERSION but that's an
> >>>>implementation detail.
> >>>
> >>>Depending on sysfs is stable but depending on kernel version may be not,
> >>>we may have a distro kernel which backports some incompat features from
> >>>upstream, then we have to decide based on sysfs interface.
> >>
> >>+1.
> >>
> >>Although sysfs does not always show up even for supported kernel, e.g btrfs
> >>modules is not loaded after boot.
> >>So we need to consider twice before choosing a fallback method.
> >>
> >>>
> >>>However, this brings another problems, for very old kernels, they don't
> >>>have sysfs, do you have any suggestions for that?
> >>
> >>Other fs, like xfs/ext* doesn't even have sysfs feature interface, only
> >>release announcement mentioning default behavior change.
> >>And I don't see many users complaining about it.
> >>
> >>Here is the example of xfsprogs changed its default feature recently:
> >>In 10th, June, 2015, xfsprogs v3.2.3 is released, with new default feature
> >>of enabling CRC for fs.
> >>The first supported kernel is 3.15, which is release in 8th Jun, 2014.
> >>Almost one year ago.
> >
> >It's the same thing, if you use a earlier version (before v5) xfs and a
> >v5 xfsprogs, you are not going to mount it.
> >
> >>
> >>On the other hand, the sysfs feature is introduced at the end of year 2013.
> >>It's already over 2 years.
> >>
> >>So just forget the extra minor case of super old kernel would be good
> >>enough.
> >
> >Sorry we're not able to do that since most users won't keep up upgrading
> >their kernels to the latest one, instead they use the stable one they think.
> >
> >The fact is that btrfs has way more incompatible features than either
> >ext4 or xfs,
> >and no complaint on ext4/xfs from them won't solve our btrfs issue anyway.
> >
> >The problem is much more serious for enterprise users which are sort of
> >conservative, they would backport what they need, if they use
> >btrfs they will experience the painful things.
>
> Only if enterprise really think btrfs is stable enough.
> For this point, xfs is considered more stable than btrfs, but v5 xfs recent
> change doesn't introduce such facility to do that compatibility check in
> xfsprogs.

Xfs on the kernel side obviously refuses to mount if you create an
incompatible feature with a recent xfsprogs but try to mount it with an
older kernel.

STATIC int
xfs_mount_validate_sb()
{
	...
	if (xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_UNKNOWN)) {
		xfs_warn(mp,