Systemd 219's journald now sets the FS_NOCOW file flag on its journal files, possibly breaking RAID repairs.
Systemd 219 now sets the special FS_NOCOW file flag for its journal files[1]. This unfortunately breaks the ability to repair the journal on RAID 1/5/6 btrfs volumes, should a bad sector happen to appear there. Is this something that can be configured in systemd? And is btrfs someday going to fix the fragmentation problem, making this option redundant?

[1] http://lists.freedesktop.org/archives/systemd-devel/2015-February/028447.html

* journald now sets the special FS_NOCOW file flag for its journal files. This should improve performance on btrfs, by avoiding heavy fragmentation when journald's write pattern is used on COW file systems. It degrades btrfs' data integrity guarantees for the files to the same levels as for ext3/ext4, however. This should be OK, though, as journald does its own data integrity checks and all its objects are checksummed on disk. Also, journald should handle btrfs disk-full events a lot more gracefully now, by processing SIGBUS errors, and not relying on fallocate() anymore.

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
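For readers who want to see this flag for themselves: FS_NOCOW is the same thing chattr/lsattr call the 'C' attribute. A minimal sketch, assuming nothing beyond standard e2fsprogs tools; the paths are illustrative (journald's real files live under /var/log/journal), and the flag only sticks on filesystems that support it, set while the file is still empty:

```shell
# Sketch: setting and inspecting the NOCOW ('C') attribute from userspace.
# Uses a temp file for the demo; journald sets the same flag via ioctl.
d=$(mktemp -d)
f="$d/demo.journal"
touch "$f"                       # NOCOW must be set while the file is empty
if chattr +C "$f" 2>/dev/null; then
    status="nocow-set"
    lsattr "$f"                  # a 'C' in the attribute column marks NOCOW
else
    # e.g. tmpfs, or chattr unavailable: the flag simply cannot be applied
    status="unsupported"
    echo "NOCOW not supported on this filesystem"
fi
rm -rf "$d"
```

On btrfs this is how one could opt individual files out of (or back into) COW behaviour by hand; whether journald itself will expose a switch for it is exactly the open question of this thread.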
Re: price to pay for nocow file bit?
On 8/1/2015 3:30 μμ, Lennart Poettering wrote: On Wed, 07.01.15 15:10, Josef Bacik (jba...@fb.com) wrote: On 01/07/2015 12:43 PM, Lennart Poettering wrote: Heya! Currently, systemd-journald's disk access patterns (appending to the end of files, then updating a few pointers in the front) result in awfully fragmented journal files on btrfs, which has a pretty negative effect on performance when accessing them.

I've been wondering if mount -o autodefrag would deal with this problem, but I haven't had the chance to look into it.

Hmm, I am kinda interested in a solution that I can just implement in systemd/journald now and that will then just make things work for people suffering from the problem. I mean, I can hardly make systemd patch the mount options of btrfs just because I place a journal file on some fs... Is autodefrag supposed to become a default one day?

Anyway, given the pros and cons, I have now changed journald to set the nocow bit on newly created journal files. When files are rotated (and we hence know we will never ever write to them again), we try to unset the bit again, and a defrag ioctl is invoked right after. btrfs currently silently ignores that we unset the bit, and leaves it set, but I figure I should try to unset it anyway, in case it learns to honor that one day. After all, after rotating the files there's no reason to treat them specially anymore...

Can this behaviour be optional? I don't mind some fragmentation if I can keep having checksums and the ability for RAID 1 to repair those files.

I'll keep an eye on this, and see if I still get user complaints about it. Should autodefrag become the default eventually, we can get rid of this code in journald again.

One question regarding the btrfs defrag ioctl: playing around with it, it appears to be asynchronous; the defrag request is simply queued and the ioctl returns immediately. Which is great for my use case. However, I was wondering if it was always async like this?
I googled a bit, and found reports that defrag might take a while, but I am not sure if those reports were about the ioctl itself taking that long, or about the effect of the defrag actually hitting the disk... Lennart
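The rotate-then-defrag scheme Lennart describes can be sketched in shell. This is an illustration only: the directory and glob are assumptions (journald's real rotated files follow its own naming), and journald does this internally via ioctls rather than these commands:

```shell
# Illustrative sketch of "unset NOCOW on rotated files, then defrag them".
# Path and glob are assumptions for illustration, not journald's actual naming.
for f in /var/log/journal/*/*.journal; do
    [ -e "$f" ] || continue                 # glob may match nothing
    chattr -C "$f" 2>/dev/null || true      # try to drop NOCOW; btrfs may silently ignore this
    btrfs filesystem defragment "$f" 2>/dev/null || true  # request is queued asynchronously
done
status="done"
echo "$status"
```

The `|| true` guards mirror the thread's observation that the unset may be ignored: the cleanup is best-effort, and failure of either step is not treated as fatal.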
Re: btrfs scrub status misreports as interrupted
On 10/12/2014 9:28 μμ, Marc Joliet wrote: On Wed, 10 Dec 2014 10:51:15 +0800, Anand Jain (anand.j...@oracle.com) wrote: Is there any relevant log in the dmesg? Not in my case; at least, nothing that made it into the syslog. Same with me, no messages at all.
Re: btrfs scrub status misreports as interrupted
I've got the exact same problem, with a 4-drive RAID1. Kernel 3.18-git and btrfs-progs from git, all built yesterday.

On 22/11/2014 2:13 μμ, Marc Joliet wrote: Hi all, While I haven't gotten any "scrub already running" type errors any more, I do get one strange case of state misreporting. When running scrub on /home (btrfs RAID10), after 3 of 4 drives have completed, the 4th drive (sdb) will report as interrupted, despite still running:

# btrfs scrub status -d /home
scrub status for 472c9290-3ff2-4096-9c47-0612d3a52cef
scrub device /dev/sda (id 1) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3380 seconds
  total bytes scrubbed: 252.86GiB with 0 errors
scrub device /dev/sdb (id 2) status
  scrub started at Sat Nov 22 11:57:34 2014, interrupted after 3698 seconds, not running
  total bytes scrubbed: 217.50GiB with 0 errors
scrub device /dev/sdc (id 3) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3013 seconds
  total bytes scrubbed: 252.85GiB with 0 errors
scrub device /dev/sdd (id 4) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 2994 seconds
  total bytes scrubbed: 252.85GiB with 0 errors

The funny thing is, the time will still update as the scrub keeps going:

# btrfs scrub status -d /home
scrub status for 472c9290-3ff2-4096-9c47-0612d3a52cef
scrub device /dev/sda (id 1) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3380 seconds
  total bytes scrubbed: 252.86GiB with 0 errors
scrub device /dev/sdb (id 2) status
  scrub started at Sat Nov 22 11:57:34 2014, interrupted after 4136 seconds, not running
  total bytes scrubbed: 239.44GiB with 0 errors
scrub device /dev/sdc (id 3) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3013 seconds
  total bytes scrubbed: 252.85GiB with 0 errors
scrub device /dev/sdd (id 4) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 2994 seconds
  total bytes scrubbed: 252.85GiB with 0 errors

This has happened a few times, and when sdb finally finishes, the status is then reported correctly as finished:

# btrfs scrub status -d /home
scrub status for 472c9290-3ff2-4096-9c47-0612d3a52cef
scrub device /dev/sda (id 1) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3380 seconds
  total bytes scrubbed: 252.86GiB with 0 errors
scrub device /dev/sdb (id 2) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 4426 seconds
  total bytes scrubbed: 252.88GiB with 0 errors
scrub device /dev/sdc (id 3) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 3013 seconds
  total bytes scrubbed: 252.85GiB with 0 errors
scrub device /dev/sdd (id 4) history
  scrub started at Sat Nov 22 11:57:34 2014 and finished after 2994 seconds
  total bytes scrubbed: 252.85GiB with 0 errors

Kernel and btrfs-progs version:

# uname -a
Linux marcec 3.16.7-gentoo #1 SMP PREEMPT Fri Oct 31 22:45:54 CET 2014 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
# btrfs --version
Btrfs v3.17.1

Should I open a report on bugzilla?
Re: Poll: time to switch skinny-metadata on by default?
On 21/10/2014 2:02 μμ, Austin S Hemmelgarn wrote: On 2014-10-21 05:29, Duncan wrote: David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted: On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote: I'd like to make it default with the 3.17 release of btrfs-progs. Please let me know if you have objections.

For the record, 3.17 will not change the defaults. The timing of the poll was very bad for getting enough feedback before the release. Let's keep it open for now.

FWIW my own results agree with yours: I've had no problem with skinny-metadata here, and it has been my default for a couple of backup-and-new-mkfs.btrfs generations now. As you know, there were some problems with it in the first kernel cycle or two after it was introduced as an option, and I waited awhile until they died down before trying it here, but as I said, no problems since I switched it on, and I've been running it awhile now. So defaulting to skinny-metadata looks good from here. =:^)

Same here, I've been using it on all my systems since I switched from 3.15 to 3.16, and have had no issues whatsoever.

I have been using skinny-metadata for years, and only once had an issue with it. It was with scrub and was fixed by Liu Bo[1], so I think skinny-metadata is mature enough to be a default.

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg34493.html

-- Konstantinos Skarlatos
Re: Undelete files / directory
On 1/9/2014 7:27 μμ, Marc MERLIN wrote: On Sat, Aug 30, 2014 at 11:26:52AM -1000, Jean-Denis Girard wrote: So I commented out the break on line 238 of btrfs-find-root so that it continues even if it thinks it went past the fs size, reran the command, and I finally got a list of blocks to try!

Thanks for that report. Can a developer review this and see if it should be made an option or removed entirely? I think that is the best way to proceed, or maybe even better, make a brute-force option for btrfs restore that does something like my for loop, recovering what it can through the filesystem. Until then, can we make this into a concise set of instructions so we can post it on the wiki? Marc

Then as you suggested I did:

for i in `awk '{print $3}' root.txt`
do
    echo $i
    btrfs restore -v -f $i --path-regex '^/(|jdg(|/tmp(|/.*)))$' \
        ../x220_home.img .
done

And I now have back my ~2800 photos (~13 GB). Many thanks to those who helped! I am glad I could help!

Best regards, Jean-Denis Girard

Le 30/08/2014 10:12, Jean-Denis Girard a écrit : Le 28/08/2014 21:40, Konstantinos Skarlatos a écrit : On 28/8/2014 8:04 μμ, Jean-Denis Girard wrote: Hi Chris, Thanks for your detailed answer. Le 28/08/2014 06:25, Chris Murphy a écrit :

9. btrfs-find-root /dev/sdc
Super think's the tree root is at 29917184, chunk root 20987904
Well block 4194304 seems great, but generation doesn't match, have=2, want=9 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=9 level 0
Well block 29376512 seems great, but generation doesn't match, have=4, want=9 level 0
Well block 29474816 seems great, but generation doesn't match, have=5, want=9 level 0
Well block 29556736 seems great, but generation doesn't match, have=6, want=9 level 0
Well block 29736960 seems great, but generation doesn't match, have=7, want=9 level 0
Well block 29900800 seems great, but generation doesn't match, have=8, want=9 level 0

Hi all, I did a successful btrfs restore a few months ago, saving all of my deleted files except 2 (so I lost about 1GB on a 4TB filesystem). Here is what I did (this is from memory and from my .zsh_history file, so I may be missing something):

btrfs-find-root /dev/sdd -o 5 > b1.txt

I think the -o 5 option is quite important here.

Thanks for the reply, but for some reason btrfs-find-root does not work on this file system. Here is what I get:

[jdg@tiare tmp]$ btrfs-find-root x220_home.img -o 5
Super think's the tree root is at 115230801920, chunk root 131072
Went past the fs size, exiting
[jdg@tiare tmp]$

I can mount the file system and access the files, though obviously not the deleted directory.
Regards, Jean-Denis Girard

-- Konstantinos Skarlatos
Re: Undelete files / directory
On 28/8/2014 8:04 μμ, Jean-Denis Girard wrote: Hi Chris, Thanks for your detailed answer. Le 28/08/2014 06:25, Chris Murphy a écrit :

9. btrfs-find-root /dev/sdc
Super think's the tree root is at 29917184, chunk root 20987904
Well block 4194304 seems great, but generation doesn't match, have=2, want=9 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=9 level 0
Well block 29376512 seems great, but generation doesn't match, have=4, want=9 level 0
Well block 29474816 seems great, but generation doesn't match, have=5, want=9 level 0
Well block 29556736 seems great, but generation doesn't match, have=6, want=9 level 0
Well block 29736960 seems great, but generation doesn't match, have=7, want=9 level 0
Well block 29900800 seems great, but generation doesn't match, have=8, want=9 level 0

Hi all, I did a successful btrfs restore a few months ago, saving all of my deleted files except 2 (so I lost about 1GB on a 4TB filesystem). Here is what I did (this is from memory and from my .zsh_history file, so I may be missing something):

btrfs-find-root /dev/sdd -o 5 > b1.txt

I think the -o 5 option is quite important here. After that, I ran this:

for i in `awk '{print $3}' b1.txt`; do echo $i; btrfs restore /dev/sdd /storage/A3/ -Dv -f $i; done

I think I did that in order to brute-force a correct offset. I also did this, in order to find the offset that gave the largest number of files:

for i in `awk '{print $3}' b1.txt`; do echo $i; btrfs restore /dev/sdd /storage/A3/ -Dv -f $i | wc -l; done

Then I did some test restores using various addresses:

btrfs restore /dev/sdd /storage/A3/B1/ -vD -f 2149617336320
btrfs restore /dev/sdd /storage/A3/B1/ -vD -f 1607682736128
btrfs restore /dev/sdd /storage/A3/B1/ -vD -f 2688721551360

and then I finally did the restore using the offset that looked best:

btrfs restore /dev/sdd /storage/A3/B1/ -v -f 2688721551360

I hope this helps, good luck!
Here is what the command returns:

[root@x220 ~]# btrfs-find-root /dev/mapper/home
Super think's the tree root is at 115230801920, chunk root 131072
Went past the fs size, exiting
[root@x220 ~]#

I just tried with the latest btrfs-progs (from git), and it returns exactly the same. The btrfs partition is on top of dm-crypt; could that be a problem? Thanks, Jean-Denis Girard

-- Konstantinos Skarlatos
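The per-offset dry-run loops used in this thread can be wrapped up so the best candidate root is picked automatically. Below is a sketch: the offset extraction is demonstrated against a canned sample of btrfs-find-root output (so it runs anywhere), while the restore commands themselves, which need a real damaged filesystem, are left commented; device and destination paths in the comments are illustrative.

```shell
# Extract candidate root offsets (3rd field of the "Well block ..." lines),
# exactly as the awk one-liners in the thread do. Sample output is canned.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Well block 4194304 seems great, but generation doesn't match, have=2, want=9
Well block 4243456 seems great, but generation doesn't match, have=3, want=9
EOF
offsets=$(awk '/^Well block/ {print $3}' "$sample")
echo "$offsets"
rm -f "$sample"

# On a real filesystem one would then score each offset by how many entries a
# dry-run (-D) restore would recover, and keep the best-scoring one:
# best=0; best_off=
# for off in $offsets; do
#     n=$(btrfs restore /dev/sdd /mnt/out/ -Dv -f "$off" | wc -l)
#     [ "$n" -gt "$best" ] && { best=$n; best_off=$off; }
# done
# btrfs restore /dev/sdd /mnt/out/ -v -f "$best_off"
```

This is the same "find the offset that gave the largest number of files" idea from the thread, just automated instead of eyeballing the counts.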
Re: Significance of high number of mails on this list?
On 22/8/2014 6:40 πμ, Shriramana Sharma wrote: Hello people. Thank you for your detailed replies, especially Duncan. In essence, I plan on using BTRFS for my production data -- mainly programs/documents I write in connection with my academic research. I'm not a professional sysadmin and I'm not running a business server. I'm just managing my own data, and as I have mentioned, my chief reason for looking at BTRFS is the ease of snapshots and backups using send/receive. It is clear now that snapshots are by and large stable but send/receive is not. But, IIUC, even if send/receive fails I still have the older data, which is not overwritten due to COW and atomic operations, and I can always retry send/receive again. Is this correct? If yes, then I guess I can take the plunge but ensure I have daily backups (which BTRFS itself should help me do easily).

I would stay with rsync for a while, because there is always the possibility of a bug that corrupts both your primary filesystem and your backup one, or of send propagating corruption from one filesystem to another. (Or maybe I am too paranoid; it would be good if we could have the opinion of a btrfs developer on this.)

I would also suggest lsyncd if rsync runs become slow due to too many files and directories, or if you have something like my use case, where I have filesystems with millions of files and my backup servers are a few km away, reachable over relatively slow wireless links. Finally, be sure to use the --inplace option of rsync.

-- Konstantinos Skarlatos
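A minimal shape of the rsync-based backup suggested above. The source, destination, and cp fallback are illustrative; in real use the destination would be a remote host or a separately mounted filesystem:

```shell
# Demonstrate an --inplace mirror between two directories.
src=$(mktemp -d); dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
if command -v rsync >/dev/null 2>&1; then
    # -a: preserve metadata; --inplace: update files in place rather than
    # rewriting them whole (friendlier to a COW destination, per the thread);
    # --delete: mirror removals too
    rsync -a --inplace --delete "$src/" "$dst/"
else
    cp -a "$src/." "$dst/"   # fallback so the sketch still runs without rsync
fi
result=$(cat "$dst/file.txt")
echo "$result"
rm -rf "$src" "$dst"
```

For a remote target the destination would simply become something like backuphost:/backups/data/ (hostname illustrative); the options stay the same.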
Re: Significance of high number of mails on this list?
On 22/8/2014 12:58 μμ, Filipe David Manana wrote: On Fri, Aug 22, 2014 at 8:35 AM, Duncan 1i5t5.dun...@cox.net wrote: Konstantinos Skarlatos posted on Fri, 22 Aug 2014 09:56:55 +0300 as excerpted: I would stay with rsync for a while, because there is always the possibility of a bug that corrupts both your primary filesystem and your backup one, or send propagating corruption from one filesystem to another (Or maybe I am too paranoid, it would be good if we could have the opinion of a btrfs developer on this)

No claim to be a dev, btrfs or otherwise, here, but I believe in this case you /are/ being too paranoid. Both btrfs send and receive only deal with data/metadata they know how to deal with. If it's corrupt in some way or if they don't understand it, they don't send/write it, they fail.

Most of the time yes, however we have at least one known bug, affecting 3.14.x only, where send silently corrupts file data (replaces valid data with zeroes) at the destination: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=766b5e5ae78dd04a93a275690a49e23d7dcb1f39 The fix landed in 3.15, but hasn't been backported to 3.14.x yet (adding Chris to cc).

I didn't know about this one, but bugs like this are exactly the reason somebody should be paranoid and not rush to use new features, especially when they concern their only backup of an experimental filesystem.

IOW, if it works without error it's as guaranteed to be golden as these things get. The problem is that it doesn't always work without error in the first place; sometimes it /does/ fail. In that instance you can always try again, as the existing data/metadata shouldn't be damaged, but if it keeps failing you may have to try something else, rsync, etc. -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master.
Richard Stallman

-- Konstantinos Skarlatos
Re: Ideas for a feature implementation
On 13/8/2014 2:01 μμ, David Pottage wrote: On 12/08/14 12:00, Konstantinos Skarlatos wrote: Maybe help with Andrea Mazzoleni's New RAID library supporting up to six parities? It seems to be a great feature for btrfs. https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31735.html

That would be very cool, but at present vanilla RAID 5 or 6 does not work properly, so I think getting that fully working would be a better idea. (Unless it would make more sense to merge the whole lot into one piece of work where RAID 5 or 6 are just a special case of arbitrary parity-level support.) At present, you can write RAID 5 or 6 data, but if anything goes wrong, btrfs cannot use the parity information to help you get your data back, so in general you are better off with RAID 1 or 10. Also, I don't think the I/O is done in parallel, so you get no speed advantage from having multiple discs either.

Yeah, that's one of the features I am waiting to get finished, because I already have 5 multi-disk systems that I would prefer to migrate to RAID5/6 from the RAID1/JBOD they are now. I don't know what the best sequencing is; I just think that these are great patches/features and it's a pity for them to languish.

-- Konstantinos Skarlatos
Re: Ideas for a feature implementation
On 10/8/2014 10:21 μμ, Vimal A R wrote: Hello, I came across the to-do list at https://btrfs.wiki.kernel.org/index.php/Project_ideas and would like to know if this list is up to date. I am looking for a project idea for my undergraduate degree which can be completed in around 3-4 months. Are there any suggestions and ideas to help me further?

Maybe help with Andrea Mazzoleni's New RAID library supporting up to six parities? It seems to be a great feature for btrfs. https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31735.html

Thank you, Vimal

-- Konstantinos Skarlatos
Re: mount time of multi-disk arrays
On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: Hello List, can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount? I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4-drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and the mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish.

I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well.

My hardware setup contains:
- Intel Core i7 4770
- Kernel 3.15.2-1-ARCH
- 32GB RAM
- dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
- dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)

Thanks in advance, André-Sebastian Liebe

--
# btrfs fi sh
Label: 'apc01_pool0'  uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
	Total devices 5 FS bytes used 14.21TiB
	devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
	devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
	devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
	devid 4 size 3.64TiB used 2.86TiB path /dev/sde
	devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
Btrfs v3.14.2-dirty

# btrfs fi df /data/pool0/
Data, single: total=14.28TiB, used=14.19TiB
System, RAID1: total=8.00MiB, used=1.54MiB
Metadata, RAID1: total=26.00GiB, used=20.20GiB
unknown, single: total=512.00MiB, used=0.00

-- Konstantinos Skarlatos
Re: mount time of multi-disk arrays
On 7/7/2014 6:48 μμ, Duncan wrote: Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as excerpted: On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount? I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4-drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and the mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish.

I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well.

I don't have a direct answer, as my btrfs devices are all SSD, but...

a) Btrfs, like some other filesystems, is designed not to need a pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a quick scan at mount time. However, that isn't always as quick as it might be, for a number of reasons:

a1) Btrfs is still a relatively immature filesystem and certain operations are not yet optimized. In particular, multi-device btrfs operations tend to still use a first-working-implementation type of algorithm instead of one well optimized for parallel operation, and thus often serialize access to multiple devices where a more optimized algorithm would parallelize operations across multiple devices at the same time. That will come, but it's not there yet.

a2) Certain operations such as orphan cleanup can delay mount as well. (Orphans are files that were deleted while they were in use and thus weren't fully deleted at the time; if they were still in use at unmount (remount-read-only), cleanup is done at mount time.)

a3) Inode_cache mount option: don't use this unless you can explain exactly WHY you are using it, preferably backed up with benchmark numbers, etc. It's useful only on 32-bit, generally high-file-activity server systems, and has general-case problems, including long mount times and possible overflow issues, that make it inappropriate for normal use. Unfortunately there's a lot of people out there using it that shouldn't be, and I even saw it listed on at least one distro (not mine!) wiki. =:^(

a4) The space_cache mount option OTOH *IS* appropriate for normal use (and is in fact enabled by default these days), but particularly after improper shutdowns it can require rebuilding at mount time -- altho this should happen /after/ mount; the system will just be busy for some minutes, until the space cache is rebuilt. But the IO from a space_cache rebuild on one filesystem could slow down the mounting of filesystems that mount after it, as well as the boot-time launching of other post-mount launched services.

If you're seeing the time go up dramatically with the addition of more filesystem devices, however, and you do /not/ have inode_cache active, I'd guess it's mainly the not-yet-optimized multi-device operations.

b) As with any systemd-launched unit, however, there are systemd configuration mechanisms for working around specific unit issues, including timeout issues. Of course most systems continue to use fstab and let systemd auto-generate the mount units, and in fact that is recommended, but either with fstab or directly created mount units, there's a timeout configuration option that can be set.

b1) The general systemd *.mount unit [Mount] section option appears to be TimeoutSec=. As is usual with systemd times, the default unit is seconds, or pass the unit(s, like 5min 20s).

b2) I don't see it /specifically/ stated, but with a bit of reading between the lines, the corresponding fstab option appears to be either x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the case). You may also want to try x-systemd.device-timeout=, which /is/ specifically mentioned, altho that appears to be specifically the timeout for the device to appear, NOT for the filesystem to mount after it does.

b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages for more, that being what the above is based on.

Thanks for your detailed answer. A mount unit with a larger timeout works fine; maybe we should tell distro maintainers to up the limit for btrfs to 5 minutes or so? In my experience, mount time definitely grows as the filesystem grows older, and times out after the snapshot count gets to more than 500-1000. I guess that's something that can be optimized in the future, but I believe stability is a much more urgent need now...

So it might take a bit of experimentation to find the exact command, but based on the above anyway, it /should/ be pretty easy to tell systemd to wait a bit longer for that filesystem. When you find the right invocation, please reply with it here, as I'm sure there's others who will benefit as well. FWIW, I'm still on reiserfs for my spinning
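For the fstab route, an entry could look like the following. The UUID and mountpoint are copied from the array shown earlier in the thread purely as an example, and the exact option name and casing were uncertain in the discussion, so treat this as a sketch to verify against the systemd.mount (5) manpage:

```
# /etc/fstab -- illustrative entry with a lengthened mount timeout
UUID=066141c6-16ca-4a30-b55c-e606b90ad0fb  /data/pool0  btrfs  rw,relatime,compress,x-systemd.device-timeout=5min  0 0
```

As noted above, the device-timeout variant covers the wait for the device to appear; the mount itself may need the TimeoutSec= setting in a real mount unit instead.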
Re: mount time of multi-disk arrays
On 7/7/2014 5:24 μμ, André-Sebastian Liebe wrote: On 07/07/2014 03:54 PM, Konstantinos Skarlatos wrote: On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote: Hello List, can anyone tell me how much time is acceptable and assumable for a multi-disk btrfs array with classical hard disk drives to mount? I'm having a bit of trouble with my current systemd setup, because it couldn't mount my btrfs raid anymore after adding the 5th drive. With the 4-drive setup it failed to mount once in a few times. Now it fails every time because the default timeout of 1m 30s is reached and the mount is aborted. My last 10 manual mounts took between 1m57s and 2m12s to finish.

I have the exact same problem, and have to manually mount my large multi-disk btrfs filesystems, so I would be interested in a solution as well.

Hi Konstantinos, you can work around this by manually creating a systemd mount unit:

- First, review the autogenerated systemd mount unit (systemctl show your-mount-unit.mount). You can get the unit name by issuing 'systemctl' and looking for your failed mount.
- Then take the needed values (After, Before, Conflicts, RequiresMountsFor, Where, What, Options, Type, WantedBy) and put them into a new systemd mount unit file (possibly under /usr/lib/systemd/system/your-mount-unit.mount).
- Now add TimeoutSec with a large enough value below [Mount].
- If you later want to automount your raid, add the WantedBy under [Install].
- Now issue a 'systemctl daemon-reload' and look for error messages in the syslog.
- If there are no errors, you can enable your manual mount entry with 'systemctl enable your-mount-unit.mount' and safely comment out your old fstab entry (so systemd no longer generates the autogenerated unit).
-- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- [Unit] Description=Mount /data/pool0 After=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device systemd-journald.socket local-fs-pre.target system.slice -.mount Before=umount.target Conflicts=umount.target RequiresMountsFor=/data /dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb [Mount] Where=/data/pool0 What=/dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb Options=rw,relatime,skip_balance,compress Type=btrfs TimeoutSec=3min [Install] WantedBy=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device -- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- 8 --- Hi André, This unit file works for me, thank you for creating it! Can somebody put it on the wiki? My hardware setup contains a - Intel Core i7 4770 - Kernel 3.15.2-1-ARCH - 32GB RAM - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm) - dev 5 is a 4TB Wstern Digital WDC WD40EFRX (5400rpm) Thanks in advance André-Sebastian Liebe -- # btrfs fi sh Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb Total devices 5 FS bytes used 14.21TiB devid1 size 3.64TiB used 2.86TiB path /dev/sdd devid2 size 3.64TiB used 2.86TiB path /dev/sdc devid3 size 3.64TiB used 2.86TiB path /dev/sdf devid4 size 3.64TiB used 2.86TiB path /dev/sde devid5 size 3.64TiB used 2.88TiB path /dev/sdb Btrfs v3.14.2-dirty # btrfs fi df /data/pool0/ Data, single: total=14.28TiB, used=14.19TiB System, RAID1: total=8.00MiB, used=1.54MiB Metadata, RAID1: total=26.00GiB, used=20.20GiB unknown, single: total=512.00MiB, used=0.00 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Konstantinos Skarlatos -- André-Sebastian Liebe -- Konstantinos Skarlatos -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
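If copying the whole autogenerated unit feels heavy, newer systemd versions can apply the same timeout through a drop-in override instead. A sketch, under the assumption that drop-ins cover mount units on your systemd version; the unit name data-pool0.mount follows from the /data/pool0 mount point:

```ini
# /etc/systemd/system/data-pool0.mount.d/timeout.conf
# Drop-in: only the changed setting is listed here; everything
# else still comes from the autogenerated mount unit.
[Mount]
TimeoutSec=3min
```

After creating the file, run 'systemctl daemon-reload' as in the steps above.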
Re: btrfs data dup on single device?
On 25/6/2014 5:41 PM, Christoph Anton Mitterer wrote: On Wed, 2014-06-25 at 08:47 +0100, Hugo Mills wrote: This has variously been possible and not over the last few years. I think it's finally come down on the side of not. I think that would really be a loss... :( The question is, why? Well, imagine you have some computer which can only have one disk drive (laptop, etc.) and you still want at least some kind of redundancy against bit-rot errors. IMO, btrfs should support most flavours out there:
- n-way duplicates on the same device (and not just DUP with n=2). For the same device there is also erasure coding, where you lose, let's say, 10% capacity, and have the benefit of recovering from the most probable disk errors that don't take the whole disk with them: bad sectors.
- n-way mirrors on multiple devices (i.e. what we have right now with RAID1, plus up to classic RAID1 with copies on each device)
- RAID5/6, n-way striped+parity with n≥2
- stacked layouts (RAID10 as e.g. MD has it, ... RAID50, 60)
And terminology should really be re-worked... IMHO it's very bad to use the term RAID1 if it's not what classic RAID1 does. Cheers, Chris.
-- Konstantinos Skarlatos
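The capacity trade-off mentioned here can be made concrete with a little arithmetic (a rough sketch; the function names are mine, and real erasure-code overhead depends on the chosen data/parity split):

```python
def replication_overhead(n_copies: int) -> float:
    """Fraction of raw capacity spent on redundancy with n-way copies."""
    return (n_copies - 1) / n_copies

def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Overhead of an erasure code storing m parity blocks per k data blocks."""
    return parity_blocks / (data_blocks + parity_blocks)

# DUP (2 copies) spends half the raw space on redundancy, while a
# 10+1 parity scheme survives any single lost block for roughly 9%
# overhead -- the ballpark 10% figure mentioned above.
assert replication_overhead(2) == 0.5
assert abs(erasure_overhead(10, 1) - 1 / 11) < 1e-12
```

This is why erasure coding is attractive against bad sectors: the redundancy cost shrinks as the data/parity ratio grows, whereas n-way copies always cost (n-1)/n of the space.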
Re: frustrations with handling of crash reports
On 19/6/2014 12:22 AM, Duncan wrote: Konstantinos Skarlatos posted on Wed, 18 Jun 2014 16:23:04 +0300 as excerpted: I guess that btrfs developers have put these BUG_ONs in so that they get reports from users when btrfs gets into these unexpected situations. But if most of these reports are ignored or not resolved, then maybe there is no use for these BUG_ONs and they should be replaced with something milder. Keep in mind that if a system panics, then the only way to get logs from it is with a serial or netconsole, so BUG_ON really makes it much harder for users to know what happened and send reports, and only the most technical and determined users will manage to send reports here. In terms of the BUG_ONs, they've been converting them to WARN_ONs recently, exactly due to the point you and Marc have made. Not being a dev and simply going by the patch-flow I've seen, as btrfs has been basically behaving itself so far here[1], I had /thought/ that was more or less done (perhaps some really bad BUG_ONs left, but only a few, and basically only where the kernel couldn't be sure it was in a logical enough state to continue writing to other filesystems too, so a BUG_ON being logical in that case), but based on you guys' comments there's apparently more to go. So at least for BUG_ONs they agree. I guess it's simply a matter of getting them all converted. That's good to hear. But we should have a way to recover from these kinds of problems: first of all having btrfs report the exact location, disk and file name that is affected, then making scrub fix it or at least report about it, and finally making fsck work for this. My filesystem, which consistently kernel panics when a specific logical address is read, passes scrub without anything bad reported. What's the use of scrub if it can't deal with this? Though at least in Marc's case, he's running kernels a couple of versions back in some cases, and they may still have BUG_ONs already replaced in the most current kernel.
As for experimental, they've been toning down and removing the warnings recently. Yes, the on-device format may come with some level of compatibility guarantee now, so I do agree with that bit, but IMO that warning should be replaced with more explicit "the on-device format is now stable but the code is not yet entirely so, so keep your backups and be prepared to use them, and run current kernels" language, and that's not happening; they're mostly just toning it down without the still explicit warnings, ATM. --- [1] Btrfs (so far) behaving itself here: possibly because my filesystems are relatively small, I don't use snapshots much, and I prefer several smaller independent filesystems rather than doing subvolumes, thus keeping the number of eggs in a single basket small. Plus, with small filesystems on SSD, I can balance reasonably regularly, and I do full fresh mkfs.btrfs rounds every few kernels as well to take advantage of newer features, which may well have the result of killing smaller problems that aren't yet showing up before they get big enough to cause real issues. Anyway, I'm not complaining! =:^) Well, my use case is about 25 filesystems on rotating disks, 20 of them on single disks, and the rest multiple-disk filesystems, either raid1 or single. I have many subvolumes and in some cases thousands of snapshots, but no databases, systemd and the like on them. Of course I have everything backed up. /nag mode on: I believe that after all those years of development I shouldn't still be forced to do mkfs every 6 months or so, when I use no new features. /nag mode off
-- Konstantinos Skarlatos
btrfs-transacti:516 blocked 120 seconds on 3.16-rc1
I am not sure whether this is related to the other reports of lockups etc. on 3.16-rc1, so I am sending it. Full dmesg attached. This happened after some heavy IO on a multi-disk btrfs filesystem.

[69932.966704] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[69932.966837] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.966921] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[69932.967051] btrfs-transacti D 0001 0 516 2 0x
[69932.967060] 8801f422fac0 0046 880203f3bd20 000145c0
[69932.967069] 8801f422ffd8 000145c0 880203f3bd20 8801f422fa30
[69932.967076] a062e392 8800cda63300 8802010b1e60 0c73d192
[69932.967083] Call Trace:
[69932.967133] [a062e392] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[69932.967146] [8119937a] ? kmem_cache_alloc_trace+0x1fa/0x220
[69932.967155] [814fd759] schedule+0x29/0x70
[69932.967179] [a05c8571] cache_block_group+0x121/0x390 [btrfs]
[69932.967187] [810b0990] ? __wake_up_sync+0x20/0x20
[69932.967212] [a05d16fa] find_free_extent+0x5fa/0xc80 [btrfs]
[69932.967243] [a0606f00] ? free_extent_buffer+0x10/0xa0 [btrfs]
[69932.967269] [a05d1f52] btrfs_reserve_extent+0x62/0x140 [btrfs]
[69932.967298] [a05ed388] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[69932.967328] [a05f52b0] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[69932.967353] [a05d4a97] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[69932.967380] [a05e2b5d] commit_cowonly_roots+0x18d/0x240 [btrfs]
[69932.967408] [a05e4c87] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[69932.967435] [a05e0835] transaction_kthread+0x1e5/0x250 [btrfs]
[69932.967462] [a05e0650] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[69932.967471] [8108c97b] kthread+0xdb/0x100
[69932.967478] [8108c8a0] ? kthread_create_on_node+0x180/0x180
[69932.967486] [8150137c] ret_from_fork+0x7c/0xb0
[69932.967493] [8108c8a0] ? kthread_create_on_node+0x180/0x180
[69932.967505] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
-- Konstantinos Skarlatos

[ 995.654816] BTRFS info (device sdh): force zlib compression
[ 995.654827] BTRFS info (device sdh): disk space caching is enabled
[ 995.654832] BTRFS: has skinny extents
[ 995.785405] BTRFS: bdev /dev/sda errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
[69932.966704] INFO: task btrfs-transacti:516 blocked for more than 120 seconds.
[69932.966837] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.966921] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[69932.967051] btrfs-transacti D 0001 0 516 2 0x
[69932.967060] 8801f422fac0 0046 880203f3bd20 000145c0
[69932.967069] 8801f422ffd8 000145c0 880203f3bd20 8801f422fa30
[69932.967076] a062e392 8800cda63300 8802010b1e60 0c73d192
[69932.967083] Call Trace:
[69932.967133] [a062e392] ? add_delayed_tree_ref+0x102/0x1b0 [btrfs]
[69932.967146] [8119937a] ? kmem_cache_alloc_trace+0x1fa/0x220
[69932.967155] [814fd759] schedule+0x29/0x70
[69932.967179] [a05c8571] cache_block_group+0x121/0x390 [btrfs]
[69932.967187] [810b0990] ? __wake_up_sync+0x20/0x20
[69932.967212] [a05d16fa] find_free_extent+0x5fa/0xc80 [btrfs]
[69932.967243] [a0606f00] ? free_extent_buffer+0x10/0xa0 [btrfs]
[69932.967269] [a05d1f52] btrfs_reserve_extent+0x62/0x140 [btrfs]
[69932.967298] [a05ed388] __btrfs_prealloc_file_range+0xe8/0x380 [btrfs]
[69932.967328] [a05f52b0] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
[69932.967353] [a05d4a97] btrfs_write_dirty_block_groups+0x5c7/0x700 [btrfs]
[69932.967380] [a05e2b5d] commit_cowonly_roots+0x18d/0x240 [btrfs]
[69932.967408] [a05e4c87] btrfs_commit_transaction+0x4f7/0xa40 [btrfs]
[69932.967435] [a05e0835] transaction_kthread+0x1e5/0x250 [btrfs]
[69932.967462] [a05e0650] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[69932.967471] [8108c97b] kthread+0xdb/0x100
[69932.967478] [8108c8a0] ? kthread_create_on_node+0x180/0x180
[69932.967486] [8150137c] ret_from_fork+0x7c/0xb0
[69932.967493] [8108c8a0] ? kthread_create_on_node+0x180/0x180
[69932.967505] INFO: task kworker/u16:15:30882 blocked for more than 120 seconds.
[69932.967625] Not tainted 3.16.0-rc1-ge99cfa2 #1
[69932.967707] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[69932.967835] kworker/u16:15 D 0 30882 2 0x
[69932.967867] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[69932.967871] 88003e537858 0046 8801fc599e90 000145c0
[69932.967878
Re: commit 762380a block: add notion of a chunk size for request merging stops io on btrfs
On 18/6/2014 5:11 AM, Jens Axboe wrote: On 2014-06-17 14:35, Konstantinos Skarlatos wrote: Hi all, with 3.16-rc1 rsync stops writing to my btrfs filesystem and stays in a D+ state. git bisect showed that the problematic commit is:

762380ad9322951cea4ce9d24864265f9c66a916 is the first bad commit
commit 762380ad9322951cea4ce9d24864265f9c66a916
Author: Jens Axboe ax...@fb.com
Date: Thu Jun 5 13:38:39 2014 -0600

block: add notion of a chunk size for request merging

Some drivers have different limits on what size a request should optimally be, depending on the offset of the request. Similar to dividing a device into chunks. Add a setting that allows the driver to inform the block layer of such a chunk size. The block layer will then prevent merging across the chunks. This is needed to optimally support NVMe with a non-zero stripe size.

Signed-off-by: Jens Axboe ax...@fb.com

That's odd, it should not have any effect since nobody enables stripe sizes in the kernel. I'll double check, perhaps it's not always being cleared. Ah wait, does the attached help? Yes, it works! I recompiled at commit 762380ad9322951cea4ce9d24864265f9c66a916 with your patch and it looks OK. Rebooted back to the unpatched kernel and the bug showed up again immediately. The funny thing is that the problem only showed on my (multi-disk) btrfs filesystem; /, which is on ext4, seems to work fine.
Re: frustrations with handling of crash reports
On 17/6/2014 9:27 PM, Marc MERLIN wrote: On Tue, Jun 17, 2014 at 07:59:57AM -0700, Marc MERLIN wrote: It is also OK to answer "Any FS created or used before kernel 3.x can be corrupted due to bugs we fixed in 3.y; thank you for your report, but it's not a good use of our time to investigate this" (although newer kernels should not just crash with BUG(xxx) on unexpected data, they should remount the FS read-only). I was thinking about this some more, and I know I have no right to tell others what to do, so take this as a mere suggestion :) How about doing a release with cleanups and stabilization and better state reporting when things go wrong? This would give a known-good version for users who have actual data, and backups that can take many hours or days to restore (never mind downtime). A few things I was thinking about:
1) Wouldn't it be a good time to replace all the BUG_ON statements with appropriate error handling? Unexpected data can happen; the kernel shouldn't crash on that. At the very least it should remount read-only and maybe give a wiki link to the user on what to do next (some bug reporting and recovery page).
2) In unexpected cases, output basic information on the filesystem, or printk instructions to the user on how to gather data that would be sent to the list to be reviewed. This would include information on how old the filesystem is, when it's possible to detect, and the instruction page could say "sorry, anything older than X we don't want to hear about, we already fixed corruption bugs since then".
3) Getting printk data on an end-user machine when it just started refusing to write to disk can be challenging and cause useful debug info to be lost. Things I am thinking about:
a) make sure most btrfs bugs do not just hang the kernel
b) recommend that users send kernel syslog messages to an ext4 partition
How does that sound? I 100% agree with this.
I also have a problem where btrfs decides to BUG_ON and force a kernel panic because it has found an unexpected type of metadata. Although in my case I was luckier and had help and test patches from Liu Bo, I am still of the opinion that btrfs should not take down a whole system because it found something unexpected. I guess that btrfs developers have put these BUG_ONs in so that they get reports from users when btrfs gets into these unexpected situations. But if most of these reports are ignored or not resolved, then maybe there is no use for these BUG_ONs and they should be replaced with something milder. Keep in mind that if a system panics, then the only way to get logs from it is with a serial or netconsole, so BUG_ON really makes it much harder for users to know what happened and send reports, and only the most technical and determined users will manage to send reports here. So I can guess that the real number of kernel panics due to btrfs is much higher, and most people are unable to report them, because they _never know_ that it was btrfs that caused their crash. I know btrfs is still experimental, but it has been in the kernel since 2009-01-09, so I think most users have some expectation of stability after something has been 5.5 years in the mainline kernel. So my suggestion is basically the same as Marc's: these BUG_ONs should be replaced with something that does not crash the system and gives out as much info as possible, so that users do not have to come here and ask for a debugging patch. After all, btrfs is still experimental, right? :) Furthermore, these problems should either remount the fs as read-only, or try to make the file that is implicated read-only, and report the filename, so users can delete it and continue with their lives without having to mkfs every few months. Or even make fsck able to fix these, and not choke on a few-TB filesystem because it wants to use ridiculous amounts of RAM.
In general, btrfs must get _much_ better at reporting what happened, which file was implicated and, if it is a multiple-disk fs, the disk where the problem is and the sector where it occurred. PS. I am not a kernel developer, so please be kind if I have said something completely wrong :) Thanks, Marc
-- Konstantinos Skarlatos
Re: kernel BUG at fs/btrfs/ctree.h:2456
On 5/6/2014 1:59 AM, Konstantinos Skarlatos wrote: Hi, I get this after doing a few runs of rsync on my btrfs filesystem. Kernel: 3.15.0-rc8. The filesystem has 6x2TB disks, data is raid 0, the fs was created with skinny metadata, and mount options are noatime, compress-force=zlib. No quota, defrag or any of the new features is being used. Attached is a full dmesg capture via netconsole. Adding some more info:

$ btrfs fi df /storage/btrfs
Data, single: total=8.89TiB, used=8.43TiB
System, RAID1: total=32.00MiB, used=992.00KiB
Metadata, RAID1: total=69.00GiB, used=66.75GiB
unknown, single: total=512.00MiB, used=112.00KiB

$ btrfs fi show
Label: none uuid: bde3c349-9e08-45bb-8517-b9a6dda81e88
Total devices 6 FS bytes used 8.50TiB
devid 1 size 3.64TiB used 3.02TiB path /dev/sdf
devid 2 size 1.82TiB used 1.20TiB path /dev/sda
devid 3 size 1.82TiB used 1.20TiB path /dev/sdb
devid 4 size 1.82TiB used 1.20TiB path /dev/sdc
devid 5 size 1.82TiB used 1.20TiB path /dev/sdd
devid 6 size 1.82TiB used 1.20TiB path /dev/sdh
Btrfs v3.14.2-dirty

$ btrfs su li /storage/btrfs -q | grep parent_uuid - | wc -l
22
$ btrfs su li /storage/btrfs -q | grep -v parent_uuid - | wc -l
5855

So the filesystem has 22 subvolumes and 5855 snapshots. No VM images or databases are stored here; everything comes and goes with rsync, as this is a backup server.

[ 855.493495] BTRFS info (device sdc): force zlib compression
[ 855.498427] BTRFS info (device sdc): disk space caching is enabled
[ 855.503348] BTRFS: has skinny extents
[27199.947244] [ cut here ]
[27199.952216] kernel BUG at fs/btrfs/ctree.h:2456!
[27199.957188] invalid opcode: [#1] PREEMPT SMP
[27199.962184] Modules linked in: netconsole radeon kvm_amd snd_hda_codec_hdmi ttm drm_kms_helper drm kvm r8169 microcode evdev snd_hda_intel snd_hda_controller mac_hid edac_core snd_hda_codec snd_hwdep edac_mce_amd snd_pcm pcspkr snd_timer snd serio_raw i2c_algo_bit k10temp hwmon sp5100_tco i2c_piix4 i2c_core soundcore mii wmi shpchp button acpi_cpufreq processor ext4 crc16 mbcache jbd2 crc32c_generic btrfs xor raid6_pq sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi atkbd libps2 pata_jmicron ahci libahci ohci_pci libata ohci_hcd ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio
[27199.990017] CPU: 1 PID: 7953 Comm: rsync Not tainted 3.15.0-rc8-gfad01e8 #1
[27199.995748] Hardware name: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H/GA-890GPA-UD3H, BIOS FD 07/23/2010
[27200.001584] task: 880202928000 ti: 8800129e task.ti: 8800129e
[27200.007439] RIP: 0010:[a0594017] [a0594017] lookup_inline_extent_backref+0x407/0x5d0 [btrfs]
[27200.013445] RSP: 0018:8800129e3a90 EFLAGS: 00010283
[27200.019397] RAX: 0038 RBX: 88002ef9af00 RCX: 8800129e3a40
[27200.025312] RDX: 8800 RSI: 36b7 RDI: 88002ef9af00
[27200.031119] RBP: 8800129e3b28 R08: 4000 R09: 8800129e3a50
[27200.036801] R10: R11: 0003 R12: 00b8
[27200.042377] R13: 0038 R14: 36b7 R15: 399c
[27200.047899] FS: 7f543bbec700() GS:88020fc4() knlGS:
[27200.053379] CS: 0010 DS: ES: CR0: 8005003b
[27200.058751] CR2: 015b6fd8 CR3: 00018a0da000 CR4: 07e0
[27200.064060] Stack:
[27200.069215] 0c14cecab000 88019abf1480 0327 8800129e3b68
[27200.074437] 399c 00b8 88002ef9af00 000d00b8
[27200.079585] 8800ce8fc800 b000a0594e27 00a80c14ceca 0020
[27200.084647] Call Trace:
[27200.089554] [a0595265] insert_inline_extent_backref+0x55/0xe0 [btrfs]
[27200.094467] [a0595386] __btrfs_inc_extent_ref+0x96/0x200 [btrfs]
[27200.099290] [a059c0f9] __btrfs_run_delayed_refs+0x819/0x1240 [btrfs]
[27200.104035] [a058979d] ? btrfs_put_tree_mod_seq+0x10d/0x150 [btrfs]
[27200.108676] [a05a091b] btrfs_run_delayed_refs.part.52+0x7b/0x260 [btrfs]
[27200.113241] [a05a0b17] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[27200.117675] [a05b1be3] __btrfs_end_transaction+0x243/0x380 [btrfs]
[27200.122031] [a05b1d30] btrfs_end_transaction+0x10/0x20 [btrfs]
[27200.126275] [a05bb31e] btrfs_truncate+0x23e/0x330 [btrfs]
[27200.130452] [a05bbe48] btrfs_setattr+0x228/0x2e0 [btrfs]
[27200.134549] [811c6781] notify_change+0x221/0x380
[27200.138641] [811a9006] do_truncate+0x66/0x90
[27200.142715] [811ad159] ? __sb_start_write+0x49/0xf0
[27200.146795] [811a937b] do_sys_ftruncate.constprop.10+0x10b/0x160
[27200.150927] [811a940e] SyS_ftruncate+0xe/0x10
[27200.155104] [814f56a9] system_call_fastpath+0x16/0x1b
[27200.159297] Code: 48 39 45 10 74 74 0f 87 28 01 00 00
Re: kernel BUG at fs/btrfs/ctree.h:2456
On 5/6/2014 10:05 AM, Liu Bo wrote: Hi Konstantinos, On Thu, Jun 05, 2014 at 09:28:16AM +0300, Konstantinos Skarlatos wrote: On 5/6/2014 1:59 AM, Konstantinos Skarlatos wrote: Hi, I get this after doing a few runs of rsync on my btrfs filesystem. Kernel: 3.15.0-rc8. The filesystem has 6x2TB disks, data is raid 0, the fs was created with skinny metadata, and mount options are noatime, compress-force=zlib. No quota, defrag or any of the new features is being used. Attached full dmesg capture via netconsole. Adding some more info. Can you reproduce it? Or does everything become good after a hard reboot? Looks like this is an 'impossible' case from code analysis. -liubo I recompiled my kernel with CONFIG_BTRFS_DEBUG=y. After a few minutes of scrub and rsync, I got this:

[ 264.271695] BTRFS info (device sda): force zlib compression
[ 264.276668] BTRFS info (device sda): disk space caching is enabled
[ 264.282950] BTRFS: has skinny extents
[ 363.412708] BTRFS: checking UUID tree
[ 1115.402092] BTRFS: checksum/header error at logical 4003307880448 on dev /dev/sda, sector 66783040: metadata node (level -1) in tree 18446744073709551615
[ 1115.406251] [ cut here ]
[ 1115.408251] kernel BUG at fs/btrfs/ctree.h:2456!
[ 1115.410257] invalid opcode: [#1] PREEMPT SMP
[ 1115.412291] Modules linked in: netconsole kvm_amd radeon kvm ttm drm_kms_helper snd_hda_codec_hdmi serio_raw drm k10temp edac_core evdev mac_hid microcode hwmon edac_mce_amd r8169 mii i2c_algo_bit pcspkr snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore wmi shpchp sp5100_tco i2c_piix4 i2c_core button acpi_cpufreq processor ext4 crc16 mbcache jbd2 crc32c_generic btrfs xor raid6_pq sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi atkbd libps2 ahci pata_jmicron libahci libata ohci_pci ohci_hcd ehci_pci ehci_hcd scsi_mod xhci_hcd usbcore i8042 serio usb_common
[ 1115.423444] CPU: 2 PID: 101 Comm: kworker/u16:6 Not tainted 3.15.0-rc8-g54539cd #1
[ 1115.425705] Hardware name: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H/GA-890GPA-UD3H, BIOS FD 07/23/2010
[ 1115.428023] Workqueue: btrfs-btrfs-scrub normal_work_helper [btrfs]
[ 1115.430296] task: 880203451e90 ti: 88020313 task.ti: 88020313
[ 1115.432548] RIP: 0010:[a04437ac] [a04437ac] tree_backref_for_extent+0x1cc/0x1d0 [btrfs]
[ 1115.435433] RSP: 0018:880203133b40 EFLAGS: 00010283
[ 1115.438253] RAX: 0019 RBX: 2c05 RCX: 880203133af0
[ 1115.441083] RDX: 8800 RSI: 2c0e RDI: 88017e90efc0
[ 1115.443858] RBP: 880203133b88 R08: 4000 R09: 880203133b00
[ 1115.446573] R10: R11: 0002 R12: 88017e90efc0
[ 1115.449230] R13: 2be4 R14: 880203133bc0 R15: 2c0e
[ 1115.451835] FS: 7f9a2c9f1700() GS:88020fc8() knlGS:
[ 1115.454422] CS: 0010 DS: ES: CR0: 8005003b
[ 1115.456960] CR2: 7f05da035000 CR3: 0001ddbc CR4: 07e0
[ 1115.459467] Stack:
[ 1115.461894] 880203133bbf 880203133bd0 2bfc 2c0e
[ 1115.464345] fffe 0021 a0460834 88017e90efc0
[ 1115.466760] 880202cbc000 880203133c60 a043ab5c
[ 1115.469138] Call Trace:
[ 1115.471447] [a043ab5c] scrub_print_warning+0x28c/0x2d0 [btrfs]
[ 1115.473737] [a03de746] ? btrfs_csum_data+0x16/0x20 [btrfs]
[ 1115.475975] [a043de94] scrub_handle_errored_block+0x974/0xae0 [btrfs]
[ 1115.478176] [a043e228] scrub_bio_end_io_worker+0x228/0x810 [btrfs]
[ 1115.480327] [a0414b77] normal_work_helper+0x77/0x350 [btrfs]
[ 1115.482438] [810821c8] process_one_work+0x168/0x450
[ 1115.484518] [81082c02] worker_thread+0x132/0x3e0
[ 1115.486601] [81082ad0] ? manage_workers.isra.23+0x2d0/0x2d0
[ 1115.488693] [8108908b] kthread+0xdb/0x100
[ 1115.490770] [81088fb0] ? kthread_create_on_node+0x180/0x180
[ 1115.492881] [814f573c] ret_from_fork+0x7c/0xb0
[ 1115.494997] [81088fb0] ? kthread_create_on_node+0x180/0x180
[ 1115.497129] Code: ff 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 80 00 00 00 00 48 83 c4 20 b8 fe ff ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54
[ 1115.501825] RIP [a04437ac] tree_backref_for_extent+0x1cc/0x1d0 [btrfs]
[ 1115.504103] RSP 880203133b40
[ 1115.514902] ---[ end trace 54741a57d59e0263 ]---
[ 1115.516654] BUG: unable to handle kernel paging request at ffd8
[ 1115.518247] IP: [810896f0] kthread_data+0x10/0x20
[ 1115.519811] PGD 1814067 PUD 1816067 PMD 0
[ 1115.521277] Oops: [#2] PREEMPT SMP
[ 1115.522735] Modules linked in: netconsole kvm_amd radeon kvm ttm drm_kms_helper snd_hda_codec_hdmi serio_raw drm k10temp
Re: send/receive and bedup
On 21/5/2014 3:58 AM, Chris Murphy wrote: On May 20, 2014, at 4:56 PM, Konstantinos Skarlatos k.skarla...@gmail.com wrote: On 21/5/2014 1:37 AM, Mark Fasheh wrote: On Tue, May 20, 2014 at 01:07:50AM +0300, Konstantinos Skarlatos wrote: Duperemove will be shipping as supported software in a major SUSE release so it will be bug fixed, etc. as you would expect. At the moment I'm very busy trying to fix qgroup bugs so I haven't had much time to add features, or handle external bug reports, etc. Also I'm not very good at advertising my software, which would be why it hasn't really been mentioned on list lately :) I would say that the state it's in is that I've gotten the feature set to a point which feels reasonable, and I've fixed enough bugs that I'd appreciate folks giving it a spin and providing reasonable feedback. Well, after having good results with duperemove with a few gigs of data, I tried it on a 500GB subvolume. After it scanned all files, it was stuck at 100% of one CPU core for about 5 hours, and still hadn't done any deduping. My CPU is an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, so I guess that's not the problem. So I guess the speed of duperemove drops dramatically as data volume increases. Yeah, I doubt it's your CPU. Duperemove is right now targeted at smaller data sets (a few VMs, ISO images, etc.) than you threw at it, as you undoubtedly have figured out. It will need a bit of work before it can handle entire file systems. My guess is that it was spending an enormous amount of time finding duplicates (it has a very thorough check that could probably be optimized). It finished after 9 or so hours, so I agree it was checking for duplicates. It does a few GB in just seconds, so time probably scales exponentially with data size. I'm going to guess it ran out of memory. I wonder what happens if you take an SSD and specify a humongous swap partition on it. Like 4x, or more, the amount of installed memory. Just tried it again, with 32GiB of swap added on an SSD.
My test files are 633GiB.

duperemove -rv /storage/test
19537.67s user 183.86s system 89% cpu 6:06:56.96 total

Duperemove was using about 1GiB of RAM, had one core at 100%, and I think swap was not touched at all. This same trick has been mentioned on the XFS list for use with xfs_repair when memory requirements exceed system memory, and is immensely faster. Chris Murphy
Re: ditto blocks on ZFS
On 20/5/2014 5:07 AM, Russell Coker wrote: On Mon, 19 May 2014 23:47:37 Brendan Hide wrote: This is extremely difficult to measure objectively. Subjectively ... see below. [snip] *What other failure modes* should we guard against? I know I'd sleep a /little/ better at night knowing that a double disk failure on a raid5/1/10 configuration might ruin a ton of data along with an obscure set of metadata in some long tree paths - but not the entire filesystem. My experience is that most disk failures that don't involve extreme physical damage (e.g. dropping a drive on concrete) don't involve totally losing the disk. Much of the discussion about RAID failures concerns entirely failed disks, but I believe that is due to RAID implementations such as Linux software RAID that will entirely remove a disk when it gives errors. I have a disk which had ~14,000 errors, of which ~2000 were corrected by duplicate metadata. If two disks with that problem were in a RAID-1 array, then duplicate metadata would be a significant benefit. The other use-case/failure mode - where you are somehow unlucky enough to have sets of bad sectors/bitrot on multiple disks that simultaneously affect the only copies of the tree roots - is an extremely unlikely scenario. As unlikely as it may be, the scenario is a very painful consequence in spite of VERY little corruption. That is where the peace-of-mind/bragging rights come in. http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html The NetApp research on latent errors on drives is worth reading. On page 12 they report latent sector errors on 9.5% of SATA disks per year. So if you lose one disk entirely, the risk of having errors on a second disk is higher than you would want for RAID-5. While losing the root of the tree is unlikely, losing a directory in the middle that has lots of subdirectories is a risk. Seeing the results of that paper, I think erasure coding is a better solution.
Instead of having many copies of metadata or data, we could do erasure coding using something like zfec[1], which is being used by Tahoe-LAFS, increasing their size by, let's say, 5-10%, and be quite safe even from multiple contiguous bad sectors. [1] https://pypi.python.org/pypi/zfec I can understand why people wouldn't want ditto blocks to be mandatory. But why are people arguing against them as an option? As an aside, I'd really like to be able to set RAID levels by subtree. I'd like to use RAID-1 with ditto blocks for my important data and RAID-0 for unimportant data.
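Single-parity XOR is the simplest member of this family of codes (zfec generalizes it to recovering from m losses out of k+m blocks). A minimal sketch of how one parity block lets you rebuild any single lost data block; illustrative only, not how btrfs or zfec actually lay data out:

```python
def xor_parity(blocks):
    """XOR equal-sized blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def rebuild_missing(surviving, parity):
    """Recover the single missing data block: XOR of survivors and parity."""
    return xor_parity(list(surviving) + [parity])

# Three "sectors" of data plus one parity sector: 25% overhead here,
# and the overhead shrinks as more data blocks share one parity block.
data = [b"sector-0........", b"sector-1........", b"sector-2........"]
parity = xor_parity(data)
# Pretend sector 1 went bad: it is recoverable from the rest.
assert rebuild_missing([data[0], data[2]], parity) == data[1]
```

Storing one parity block per k data blocks costs 1/(k+1) of the space, which is where the 5-10% figure above comes from, while still surviving any single bad sector in the group.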
Re: send/receive and bedup
On 21/5/2014 1:37 πμ, Mark Fasheh wrote: On Tue, May 20, 2014 at 01:07:50AM +0300, Konstantinos Skarlatos wrote: Duperemove will be shipping as supported software in a major SUSE release so it will be bug fixed, etc as you would expect. At the moment I'm very busy trying to fix qgroup bugs so I haven't had much time to add features, or handle external bug reports, etc. Also I'm not very good at advertising my software which would be why it hasn't really been mentioned on list lately :) I would say that state that it's in is that I've gotten the feature set to a point which feels reasonable, and I've fixed enough bugs that I'd appreciate folks giving it a spin and providing reasonable feedback. Well, after having good results with duperemove with a few gigs of data, i tried it on a 500gb subvolume. After it scanned all files, it is stuck at 100% of one cpu core for about 5 hours, and still hasn't done any deduping. My cpu is an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, so i guess thats not the problem. So I guess the speed of duperemove drops dramatically as data volume increases. Yeah I doubt it's your CPU. Duperemove is right now targeted at smaller data sets (a few VMS, iso images, etc) than you threw it at as you undoubtedly have figured out. It will need a bit of work before it can handle entire file systems. My guess is that it was spending an enormous amount of time finding duplicates (it has a very thorough check that could probably be optimized). It finished after 9 or so hours, so I agree it was checking for duplicates. It does a few GB in just seconds, so time probably scales exponentially with data size. For what it's worth, handling larger data sets is the type of work I want to be doing on it in the future. I can help with testing :) I would also suggest that you publish in this list any changes that you do, so that your program becomes better known among btrfs users. Or even a new announcement mail or a page in the btrfs wiki. 
Finally, I would like to request the ability to do file-level dedup with a reflink. That has the advantage of consuming very little metadata compared to block-level dedup. It could be done with a two-pass dedup: first comparing all the same-sized files, and after that doing your normal block-level dedup. By the way, does anybody have a good program/script that can do file-level dedup with reflinks and checksum comparison? Kind regards, Konstantinos Skarlatos --Mark -- Mark Fasheh
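The two-pass idea suggested here (group by file size first, then confirm with a checksum) can be sketched like this; a hypothetical helper, not an existing tool, and the final reflink step, e.g. via cp --reflink=always or the clone ioctl, is deliberately left out:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(paths):
    """Return groups of paths whose contents are byte-identical.

    Pass 1 buckets by size (cheap, no reads); pass 2 hashes only files
    that share a size. Each resulting group could then be collapsed
    into reflinked copies of a single file.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            h = hashlib.sha256()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

The size pre-filter is what keeps the metadata and IO cost low: most files are eliminated without ever being read, which is exactly the advantage over block-level dedup described above.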
Re: send/receive and bedup
On 19/5/2014 7:01 μμ, Brendan Hide wrote: On 19/05/14 15:00, Scott Middleton wrote: On 19 May 2014 09:07, Marc MERLIN m...@merlins.org wrote: On Wed, May 14, 2014 at 11:36:03PM +0800, Scott Middleton wrote: I read so much about BtrFS that I mistook Bedup for Duperemove. Duperemove is actually what I am testing. I'm currently using programs that find files that are the same, and hardlink them together: http://marc.merlins.org/perso/linux/post_2012-05-01_Handy-tip-to-save-on-inodes-and-disk-space_-finddupes_-fdupes_-and-hardlink_py.html hardlink.py actually seems to be the fastest (memory and CPU) one even though it's in Python. I can get others to run out of RAM on my 8GB server easily :( Interesting app. An issue with hardlinking (with the backups use-case, this problem isn't likely to happen) is that if you modify a file, all the hardlinks get changed along with it - including the ones that you don't want changed. @Marc: Since you've been using btrfs for a while now, I'm sure you've already considered whether or not a reflink copy is the better/worse option. Bedup should be better, but last I tried I couldn't get it to work. It's been updated since then, I just haven't had the chance to try it again. Please post what you find out, or if you have a hardlink maker that's better than the ones I found :) Thanks for that. I may be completely wrong in my approach. I am not looking for a file-level comparison. Bedup worked fine for that. I have a lot of virtual images and shadow protect images where only a few megabytes may be the difference, so a file-level hash and comparison doesn't really achieve my goals. I thought duperemove might be on a lower level. https://github.com/markfasheh/duperemove Duperemove is a simple tool for finding duplicated extents and submitting them for deduplication. 
When given a list of files it will hash their contents on a block-by-block basis and compare those hashes to each other, finding and categorizing extents that match each other. When given the -d option, duperemove will submit those extents for deduplication using the btrfs-extent-same ioctl. It defaults to 128k but you can make it smaller. I hit a hurdle though. The 3TB HDD I used seemed OK when I did a long SMART test, but seems to die every few hours. Admittedly it was part of a failed mdadm RAID array that I pulled out of a client's machine. The only other copy I have of the data is the original mdadm array that was recently replaced with a new server, so I am loath to use that HDD yet. At least for another couple of weeks! I am still hopeful duperemove will work. Duperemove does look exactly like what you are looking for. The last traffic on the mailing list regarding it was in August last year. It looks like it was pulled into the main kernel repository on September 1st. The last commit to the duperemove application was on April 20th this year. Maybe Mark (cc'd) can provide further insight on its current status. I have been testing duperemove and it seems to work just fine, in contrast with bedup, which I have been unable to install/compile because of a mess with Python versions. I have 2 questions about duperemove: 1) Can it use existing filesystem csums instead of calculating its own? 2) Can it be included in btrfs-progs so that it becomes a standard feature of btrfs? Thanks
Re: send/receive and bedup
On 19/5/2014 8:38 μμ, Mark Fasheh wrote: On Mon, May 19, 2014 at 06:01:25PM +0200, Brendan Hide wrote: On 19/05/14 15:00, Scott Middleton wrote: On 19 May 2014 09:07, Marc MERLIN m...@merlins.org wrote: Thanks for that. I may be completely wrong in my approach. I am not looking for a file level comparison. Bedup worked fine for that. I have a lot of virtual images and shadow protect images where only a few megabytes may be the difference. So a file level hash and comparison doesn't really achieve my goals. I thought duperemove may be on a lower level. https://github.com/markfasheh/duperemove Duperemove is a simple tool for finding duplicated extents and submitting them for deduplication. When given a list of files it will hash their contents on a block by block basis and compare those hashes to each other, finding and categorizing extents that match each other. When given the -d option, duperemove will submit those extents for deduplication using the btrfs-extent-same ioctl. It defaults to 128k but you can make it smaller. I hit a hurdle though. The 3TB HDD I used seemed OK when I did a long SMART test but seems to die every few hours. Admittedly it was part of a failed mdadm RAID array that I pulled out of a clients machine. The only other copy I have of the data is the original mdadm array that was recently replaced with a new server, so I am loathe to use that HDD yet. At least for another couple of weeks! I am still hopeful duperemove will work. Duperemove does look exactly like what you are looking for. The last traffic on the mailing list regarding that was in August last year. It looks like it was pulled into the main kernel repository on September 1st. I'm confused - you need to avoid a file scan completely? Duperemove does do that just to be clear. In your mind, what would be the alternative to that sort of a scan? By the way, if you know exactly where the changes are you could just feed the duplicate extents directly to the ioctl via a script. 
I have a small tool in the duperemove repository that can do that for you ('make btrfs-extent-same'). The last commit to the duperemove application was on April 20th this year. Maybe Mark (cc'd) can provide further insight on its current status. Duperemove will be shipping as supported software in a major SUSE release so it will be bug fixed, etc as you would expect. At the moment I'm very busy trying to fix qgroup bugs, so I haven't had much time to add features or handle external bug reports. Also I'm not very good at advertising my software, which would be why it hasn't really been mentioned on list lately :) I would say the state that it's in is that I've gotten the feature set to a point which feels reasonable, and I've fixed enough bugs that I'd appreciate folks giving it a spin and providing reasonable feedback. Well, after having good results with duperemove on a few gigs of data, I tried it on a 500 GB subvolume. After it scanned all files, it is stuck at 100% of one CPU core for about 5 hours, and still hasn't done any deduping. My CPU is an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, so I guess that's not the problem. So I guess the speed of duperemove drops dramatically as data volume increases. There's a TODO list which gives a decent idea of what's on my mind for possible future improvements. I think what I most want to do right now is some sort of (optional) writeout to a file of what was done during a run. The idea is that you could feed that data back to duperemove to improve the speed of subsequent runs. My priorities may change depending on feedback from users, of course. I also at some point want to rewrite some of the duplicate extent finding code, as it got messy and could be a bit faster. 
--Mark -- Mark Fasheh
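For the "feed the duplicate extents directly to the ioctl via a script" idea mentioned above, the ioctl's argument struct can be packed by hand. The sketch below targets the later FIDEDUPERANGE uapi interface (the successor of the btrfs-extent-same ioctl that duperemove uses); the constant and field layout follow the Linux uapi headers as I understand them, and actually issuing the call requires both files to live on btrfs:

```python
import struct

# FIDEDUPERANGE = _IOWR(0x94, 54, struct file_dedupe_range) on Linux
# (assumption based on the uapi headers).
FIDEDUPERANGE = 0xC0189436

def pack_dedupe_args(src_offset, src_length, dest_fd, dest_offset):
    """Pack struct file_dedupe_range with a single destination entry.

    Header (24 bytes): u64 src_offset, u64 src_length, u16 dest_count,
                       u16 reserved1, u32 reserved2
    Entry (32 bytes):  s64 dest_fd, u64 dest_offset, u64 bytes_deduped,
                       s32 status, u32 reserved
    """
    header = struct.pack("=QQHHI", src_offset, src_length, 1, 0, 0)
    entry = struct.pack("=qQQiI", dest_fd, dest_offset, 0, 0, 0)
    return header + entry

# Issuing the call would look roughly like this (needs a btrfs file pair;
# the kernel compares the ranges byte-by-byte before sharing extents):
#   args = pack_dedupe_args(0, 128 * 1024, dst.fileno(), 0)
#   result = fcntl.ioctl(src.fileno(), FIDEDUPERANGE, args)
#   bytes_deduped = struct.unpack_from("=Q", result, 24 + 16)[0]
```

The kernel's own byte comparison is what makes this safe to drive from a script: a hash collision in the userspace scan cannot corrupt data, since non-identical ranges are simply refused.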
Re: [PATCH] Btrfs-progs: fsck: add an option to check data csums
On 8/5/2014 4:26 πμ, Wang Shilong wrote: This patch adds an option '--check-data-csum' to verify data csums. fsck won't check data csums unless users specify this option explicitly.

Can this option be added to btrfs restore as well? I think it would be a good thing if users could tell restore to only recover non-corrupt files.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 Documentation/btrfs-check.txt |   2 +
 cmds-check.c                  | 122 --
 2 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/Documentation/btrfs-check.txt b/Documentation/btrfs-check.txt
index 485a49c..bc10755 100644
--- a/Documentation/btrfs-check.txt
+++ b/Documentation/btrfs-check.txt
@@ -30,6 +30,8 @@ try to repair the filesystem.
 create a new CRC tree.
 --init-extent-tree::
 create a new extent tree.
+--check-data-csum::
+check data csums.
 
 EXIT STATUS
 ---

diff --git a/cmds-check.c b/cmds-check.c
index 103efc5..b53d49c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -53,6 +53,7 @@ static LIST_HEAD(delete_items);
 static int repair = 0;
 static int no_holes = 0;
 static int init_extent_tree = 0;
+static int check_data_csum = 0;
 
 struct extent_backref {
 	struct list_head list;
@@ -3634,6 +3635,106 @@ static int check_space_cache(struct btrfs_root *root)
 	return error ? -EINVAL : 0;
 }
 
+static int read_extent_data(struct btrfs_root *root, char *data,
+			    u64 logical, u64 len, int mirror)
+{
+	u64 offset = 0;
+	struct btrfs_multi_bio *multi = NULL;
+	struct btrfs_fs_info *info = root->fs_info;
+	struct btrfs_device *device;
+	int ret = 0;
+	u64 read_len;
+	unsigned long bytes_left = len;
+
+	while (bytes_left) {
+		read_len = bytes_left;
+		device = NULL;
+		ret = btrfs_map_block(&info->mapping_tree, READ,
+				      logical + offset, &read_len, &multi,
+				      mirror, NULL);
+		if (ret) {
+			fprintf(stderr, "Couldn't map the block %llu\n",
+				logical + offset);
+			goto error;
+		}
+		device = multi->stripes[0].dev;
+
+		if (device->fd == 0)
+			goto error;
+
+		if (read_len > root->sectorsize)
+			read_len = root->sectorsize;
+		if (read_len > bytes_left)
+			read_len = bytes_left;
+
+		ret = pread64(device->fd, data + offset, read_len,
+			      multi->stripes[0].physical);
+		if (ret != read_len)
+			goto error;
+		offset += read_len;
+		bytes_left -= read_len;
+		kfree(multi);
+		multi = NULL;
+	}
+	return 0;
+error:
+	kfree(multi);
+	return -EIO;
+}
+
+static int check_extent_csums(struct btrfs_root *root, u64 bytenr,
+			      u64 num_bytes, unsigned long leaf_offset,
+			      struct extent_buffer *eb)
+{
+	u64 offset = 0;
+	u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
+	char *data;
+	u32 crc;
+	unsigned long tmp;
+	char result[csum_size];
+	char out[csum_size];
+	int ret = 0;
+	__s64 cmp;
+	int mirror;
+	int num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
+					  bytenr, num_bytes);
+
+	BUG_ON(num_bytes % root->sectorsize);
+	data = malloc(root->sectorsize);
+	if (!data)
+		return -ENOMEM;
+
+	while (offset < num_bytes) {
+		mirror = 0;
+again:
+		ret = read_extent_data(root, data, bytenr + offset,
+				       root->sectorsize, mirror);
+		if (ret)
+			goto out;
+
+		crc = ~(u32)0;
+		crc = btrfs_csum_data(NULL, (char *)data, crc,
+				      root->sectorsize);
+		btrfs_csum_final(crc, result);
+
+		tmp = leaf_offset + offset / root->sectorsize * csum_size;
+		read_extent_buffer(eb, out, tmp, csum_size);
+		cmp = memcmp(out, result, csum_size);
+		if (cmp) {
+			fprintf(stderr,
+				"mirror: %d range bytenr: %llu, len: %d checksum mismatch\n",
+				mirror, bytenr + offset, root->sectorsize);
+			if (mirror < num_copies - 1) {
+				mirror += 1;
+				goto again;
+			}
+		}
+		offset += root->sectorsize;
+	}
+out:
+	free(data);
+	return ret;
+}
+
 static int check_extent_exists(struct btrfs_root *root, u64 bytenr,
 			       u64 num_bytes)
 {
@@ -3771,6 +3872,8
Test results for [RFC PATCH v10 00/16] Online(inband) data deduplication
Hello, here are the test results from my testing of the latest patches of btrfs dedup.

TL;DR: I rsynced 10 separate copies of a 3.8GB folder with 138 RAW photographs (23-36MiB each) to a btrfs volume with dedup enabled. On the first try the copy was very slow, and a sync after it took over 10 minutes to complete. For the next copies sync was much faster, but still took up to one minute to complete. The copy itself was quite slow until the fifth try, when it went from 8MB/sec to 22-40MB/sec. Each copy after the first consumed about 60-65MiB of metadata, or 120-130MiB of free space due to metadata being DUP.

Obvious question: can dedup recognize that 2 files are the same and dedup them on a file level, saving much more space in the process? In any case I am very thankful for the work being done here, and I am willing to help in any way I can.

AMD Phenom(tm) II X4 955 Processor
MemTotal: 8 GB
Hard Disk: Seagate Barracuda 7200.12 [160 GB]
kernel: 3.14.0-1-git

$ mkfs.btrfs /dev/loop0 -f
$ mount /storage/btrfs_dedup
$ mount | grep dedup
$ btrfs dedup enable /storage/btrfs_dedup
$ btrfs dedup on /storage/btrfs_dedup
$ for i in {01..10}; do
    time rsync -a /storage/btrfs/costas/Photo_library/2014/ /storage/btrfs_dedup/copy$i/ --stats
    time btrfs fi sync /storage/btrfs_dedup/
    df /storage/btrfs_dedup/
    btrfs fi df /storage/btrfs_dedup
  done
$ time umount /storage/btrfs_dedup

/root/btrfs.img on /storage/btrfs_dedup type btrfs (rw,noatime,nodiratime,space_cache)

sent 4,017,134,246 bytes  received 2,689 bytes  8,274,226.44 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  21.85s user 45.04s system 13% cpu 8:05.48 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.36s system 0% cpu 10:43.27 total
/dev/loop1 46080 4119 40173 10% /storage/btrfs_dedup
Data, single: total=4.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=143.45MiB

sent 4,017,134,246 bytes  received 2,689 bytes  8,956,827.06 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  21.29s user 42.32s system 14% cpu 7:28.74 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.01s system 0% cpu 4.173 total
/dev/loop1 46080 4250 40173 10% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=208.72MiB

sent 4,017,134,246 bytes  received 2,689 bytes  9,691,524.57 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  20.95s user 31.69s system 12% cpu 6:54.90 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.00s system 0% cpu 3.254 total
/dev/loop1 46080 4371 40172 10% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=269.39MiB

sent 4,017,134,246 bytes  received 2,689 bytes  9,037,428.43 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  20.54s user 36.70s system 12% cpu 7:23.93 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.01s system 0% cpu 5.578 total
/dev/loop1 46080 4497 40172 11% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=331.98MiB

sent 4,017,134,246 bytes  received 2,689 bytes  29,004,598.81 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  22.30s user 13.01s system 25% cpu 2:18.15 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.01s system 0% cpu 23.447 total
/dev/loop1 46080 4617 40172 11% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=391.91MiB

sent 4,017,134,246 bytes  received 2,689 bytes  39,971,511.79 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  21.60s user 11.85s system 33% cpu 1:39.74 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.01s system 0% cpu 32.178 total
/dev/loop1 46080 4747 40171 11% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=456.48MiB

sent 4,017,134,246 bytes  received 2,689 bytes  32,009,059.24 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  25.68s user 13.94s system 31% cpu 2:04.42 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.01s system 0% cpu 29.313 total
/dev/loop1 46080 4870 40171 11% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=518.09MiB

sent 4,017,134,246 bytes  received 2,689 bytes  30,782,658.51 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  21.84s user 12.63s system 26% cpu 2:10.20 total
btrfs fi sync /storage/btrfs_dedup/  0.00s user 0.00s system 0% cpu 41.074 total
/dev/loop1 46080 4990 40171 12% /storage/btrfs_dedup
Data, single: total=5.01GiB, used=3.74GiB
Metadata, DUP: total=1.00GiB, used=578.16MiB

sent 4,017,134,246 bytes  received 2,689 bytes  22,379,592.95 bytes/sec
rsync -a /storage/btrfs/costas/Photo_library/2014/ --stats  28.57s user
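A back-of-envelope reading of the numbers above (taking 62.5 MiB as the midpoint of the reported 60-65 MiB per-copy metadata growth; this is an editorial estimate derived from the log, not a separate measurement):

```python
# Each fully deduplicated copy of the 3.74 GiB data set added roughly
# 60-65 MiB of metadata; with the DUP metadata profile every metadata
# byte is stored twice on disk.
data_bytes = 3.74 * 2**30          # logical size of one copy
meta_per_copy = 62.5 * 2**20       # midpoint of the observed 60-65 MiB
raw_meta = 2 * meta_per_copy       # DUP profile keeps two copies

overhead = raw_meta / data_bytes
print(f"raw metadata per deduped copy: {raw_meta / 2**20:.0f} MiB "
      f"({overhead:.1%} of the logical data size)")
```

So each extra copy costs on the order of 3% of its logical size in raw metadata, which is consistent with the "120-130MiB of free space" figure in the summary and explains why block-level dedup of whole-file duplicates is noticeably more metadata-hungry than a single file-level reflink would be.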
Re: [RFC PATCH v10 00/16] Online(inband) data deduplication
On 10/4/2014 6:48 πμ, Liu Bo wrote: Hello, This is the 10th attempt at in-band data dedupe, based on the Linux _3.14_ kernel. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data.[1] This patch set is also related to Content based storage in project ideas[2]; it introduces inband data deduplication for btrfs, dedup/dedupe for short.

* PATCH 1 is a speed-up improvement, which is about dedup and quota.
* PATCH 2-5 is the preparation work for dedup implementation.
* PATCH 6 shows how we implement dedup feature.
* PATCH 7 fixes a backref walking bug with dedup.
* PATCH 8 fixes a free space bug of dedup extents on error handling.
* PATCH 9 adds the ioctl to control dedup feature.
* PATCH 10 targets delayed refs' scalability problem of deleting refs, which is uncovered by the dedup feature.
* PATCH 11-16 fixes bugs of dedupe including race bug, deadlock, abnormal transaction abortion and crash.
* btrfs-progs patch (PATCH 17) offers all details about how to control the dedup feature on the progs side.

I've tested this with xfstests by adding an inline dedup 'enable on' in xfstests' mount and scratch_mount.

***NOTE*** Known bugs:
* Mounting with option flushoncommit and enabling the dedupe feature will end up with _deadlock_.

TODO:
* a bit-to-bit comparison callback.

All comments are welcome!

Hi Liu, thanks for doing this work. I tested your previous patches a few months ago, and will now test the new ones. One question about memory requirements: are they in the same league as ZFS dedup (i.e. needing tens of GB of RAM for multi-TB filesystems) or are they more reasonable? Thanks

[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v10:
- fix a typo in the subject line.
- update struct 'btrfs_ioctl_dedup_args' on the kernel side to fix 'Inappropriate ioctl for device'.

v9:
- fix a deadlock and a crash reported by users.
- fix the metadata ENOSPC problem with dedup again.

v8:
- fix the race crash of dedup ref again.
- fix the metadata ENOSPC problem with dedup.

v7:
- rebase onto the latest btrfs
- break a big patch into smaller ones to make reviewers happy.
- kill mount options of dedup and use ioctl method instead.
- fix two crashes due to the special dedup ref

For former patch sets:
v6: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27512
v5: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27257
v4: http://thread.gmane.org/gmane.comp.file-systems.btrfs/25751
v3: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25433
v2: http://comments.gmane.org/gmane.comp.file-systems.btrfs/24959

Liu Bo (16):
  Btrfs: disable qgroups accounting when quota_enable is 0
  Btrfs: introduce dedup tree and relatives
  Btrfs: introduce dedup tree operations
  Btrfs: introduce dedup state
  Btrfs: make ordered extent aware of dedup
  Btrfs: online(inband) data dedup
  Btrfs: skip dedup reference during backref walking
  Btrfs: don't return space for dedup extent
  Btrfs: add ioctl of dedup control
  Btrfs: improve the delayed refs process in rm case
  Btrfs: fix a crash of dedup ref
  Btrfs: fix deadlock of dedup work
  Btrfs: fix transactin abortion in __btrfs_free_extent
  Btrfs: fix wrong pinned bytes in __btrfs_free_extent
  Btrfs: use total_bytes instead of bytes_used for global_rsv
  Btrfs: fix dedup enospc problem

 fs/btrfs/backref.c           |   9 +
 fs/btrfs/ctree.c             |   2 +-
 fs/btrfs/ctree.h             |  86 ++
 fs/btrfs/delayed-ref.c       |  26 +-
 fs/btrfs/delayed-ref.h       |   3 +
 fs/btrfs/disk-io.c           |  37 +++
 fs/btrfs/extent-tree.c       | 235 +---
 fs/btrfs/extent_io.c         |  22 +-
 fs/btrfs/extent_io.h         |  16 ++
 fs/btrfs/file-item.c         | 244 +
 fs/btrfs/inode.c             | 635 ++-
 fs/btrfs/ioctl.c             | 167
 fs/btrfs/ordered-data.c      |  44 ++-
 fs/btrfs/ordered-data.h      |  13 +-
 fs/btrfs/qgroup.c            |   3 +
 fs/btrfs/relocation.c        |   3 +
 fs/btrfs/transaction.c       |  41 +++
 fs/btrfs/transaction.h       |   1 +
 include/trace/events/btrfs.h |   3 +-
 include/uapi/linux/btrfs.h   |  12 +
 20 files changed, 1471
insertions(+), 131 deletions(-)
Re: [RFC PATCH] Btrfs: send, add calculate data size flag to allow for progress estimation
On 4/4/2014 6:20 μμ, Filipe David Borba Manana wrote: This new send flag makes send calculate first the amount of new file data (in bytes) the send root has relatively to the parent root, or for the case of a non-incremental send, the total amount of file data we will send through the send stream. In other words, it computes the sum of the lengths of all write and clone operations that will be sent through the send stream. This data size value is sent in a new command, named BTRFS_SEND_C_TOTAL_DATA_SIZE, that immediately follows a BTRFS_SEND_C_SUBVOL or BTRFS_SEND_C_SNAPSHOT command, and precedes any command that changes a file or the filesystem hierarchy. Upon receiving a write or clone command, the receiving end can increment a counter by the data length of that command and therefore report progress by comparing the counter's value with the data size value received in the BTRFS_SEND_C_TOTAL_DATA_SIZE command. The approach is simple, before the normal operation of send, do a scan in the file system tree for new inodes and file extent items, just like in send's normal operation, and keep incrementing a counter with new inodes' size and the size of file extents that are going to be written or cloned. This is actually a simpler and more lightweight tree scan/processing than the one we do when sending the changes, as it doesn't process inode references nor does any lookups in the extent tree for example. 
After modifying btrfs-progs to understand this new command and report progress, here's an example (the -o flag tells btrfs send to pass the new flag to the kernel's send ioctl):

$ btrfs send -o /mnt/sdd/base | btrfs receive /mnt/sdc
At subvol /mnt/sdd/base
At subvol base
About to receive 9211507211 bytes
Subvolume/snapshot /mnt/sdc//base, progress 24.73%, 2278015008 bytes received (9211507211 total bytes)

$ btrfs send -o -p /mnt/sdd/base /mnt/sdd/incr | btrfs receive /mnt/sdc
At subvol /mnt/sdd/incr
At snapshot incr
About to receive 9211747739 bytes
Subvolume/snapshot /mnt/sdc//incr, progress 63.42%, 5843024211 bytes received (9211747739 total bytes)

Hi, as a user of send I can say that this feature is very useful. Is it possible to add a current speed indication (MB/sec)?

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 fs/btrfs/send.c            | 194 +
 fs/btrfs/send.h            |   1 +
 include/uapi/linux/btrfs.h |  13 ++-
 3 files changed, 175 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c81e0d9..fa378c7 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -81,7 +81,13 @@ struct clone_root {
 #define SEND_CTX_MAX_NAME_CACHE_SIZE 128
 #define SEND_CTX_NAME_CACHE_CLEAN_SIZE (SEND_CTX_MAX_NAME_CACHE_SIZE * 2)
 
+enum btrfs_send_phase {
+	SEND_PHASE_STREAM_CHANGES,
+	SEND_PHASE_COMPUTE_DATA_SIZE,
+};
+
 struct send_ctx {
+	enum btrfs_send_phase phase;
 	struct file *send_filp;
 	loff_t send_off;
 	char *send_buf;
@@ -116,6 +122,7 @@ struct send_ctx {
 	u64 cur_inode_last_extent;
 
 	u64 send_progress;
+	u64 total_data_size;
 
 	struct list_head new_refs;
 	struct list_head deleted_refs;
@@ -687,6 +694,8 @@ static int send_rename(struct send_ctx *sctx,
 {
 	int ret;
 
+	ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
 	verbose_printk("btrfs: send_rename %s -> %s\n", from->start, to->start);
 
 	ret = begin_cmd(sctx, BTRFS_SEND_C_RENAME);
@@ -711,6 +720,8 @@ static int send_link(struct send_ctx *sctx,
 {
 	int ret;
 
+	ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
 	verbose_printk("btrfs: send_link %s -> %s\n", path->start, lnk->start);
 
 	ret = begin_cmd(sctx, BTRFS_SEND_C_LINK);
@@ -734,6 +745,8 @@ static int send_unlink(struct send_ctx *sctx, struct fs_path *path)
 {
 	int ret;
 
+	ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
 	verbose_printk("btrfs: send_unlink %s\n", path->start);
 
 	ret = begin_cmd(sctx, BTRFS_SEND_C_UNLINK);
@@ -756,6 +769,8 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path *path)
 {
 	int ret;
 
+	ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
 	verbose_printk("btrfs: send_rmdir %s\n", path->start);
 
 	ret = begin_cmd(sctx, BTRFS_SEND_C_RMDIR);
@@ -2286,6 +2301,9 @@ static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size)
 	int ret = 0;
 	struct fs_path *p;
 
+	if (sctx->phase == SEND_PHASE_COMPUTE_DATA_SIZE)
+		return 0;
+
 	verbose_printk("btrfs: send_truncate %llu size=%llu\n", ino, size);
 
 	p = fs_path_alloc();
@@ -2315,6 +2333,8 @@ static int send_chmod(struct send_ctx *sctx, u64 ino, u64 gen, u64 mode)
 	int ret = 0;
 	struct fs_path *p;
 
+	ASSERT(sctx->phase != SEND_PHASE_COMPUTE_DATA_SIZE);
+
 	verbose_printk("btrfs: send_chmod %llu mode=%llu\n", ino, mode);
 
 	p = fs_path_alloc();
@@
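The receiver-side accounting the patch description outlines reduces to a running counter compared against the value carried by the TOTAL_DATA_SIZE command; a minimal sketch of that arithmetic (a hypothetical helper, not the actual btrfs-progs code):

```python
def progress_pct(bytes_received, total_data_size):
    """Percentage complete, given the running sum of write/clone payload
    sizes seen so far and the total announced by the
    BTRFS_SEND_C_TOTAL_DATA_SIZE command."""
    if total_data_size == 0:      # empty stream: nothing left to receive
        return 100.0
    # Clamp at 100 in case the stream carries slightly more payload than
    # the precomputed estimate.
    return min(100.0, 100.0 * bytes_received / total_data_size)

# The first example transfer above: 2278015008 of 9211507211 bytes.
print(f"{progress_pct(2278015008, 9211507211):.2f}%")  # 24.73%
```

A speed indication, as asked for above, would only need the same counter sampled twice: the delta in bytes received divided by the delta in wall-clock time between samples.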
help with btrfs device delete of a disk with errors (resent from subscribed mail)
I am trying to delete a device (devid 5, /dev/sdg) that has some read errors from a multi-device filesystem:

Label: none  uuid: f379d9aa-ddfd-4b4e-84c1-cd93d4592862
	Total devices 6  FS bytes used 7.11TiB
	devid 1 size 1.82TiB used 1.21TiB path /dev/sda
	devid 2 size 1.82TiB used 1.23TiB path /dev/sdb
	devid 3 size 1.82TiB used 1.23TiB path /dev/sdc
	devid 4 size 1.82TiB used 1.23TiB path /dev/sdd
	devid 5 size 0.00 used 1.12TiB path /dev/sdg
	devid 6 size 1.82TiB used 1.23TiB path /dev/sdh

$ btrfs fi df /storage/btrfs2
Data, RAID0: total=7.07TiB, used=7.07TiB
Data, single: total=8.00MiB, used=7.94MiB
System, RAID1: total=8.00MiB, used=416.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=81.00GiB, used=35.02GiB
Metadata, single: total=8.00MiB, used=0.00

btrfs: bdev /dev/sdg errs: wr 0, rd 510, flush 0, corrupt 0, gen 0

Device delete works fine until it gets to a block group that has a read error; then it crashes and remounts the filesystem as read-only. Via btrfs inspect-internal logical-resolve I have found the file that corresponds to that block group, and deleted it. After that, btrfs inspect-internal logical-resolve returns: ioctl ret=-1, error: No such file or directory. When I retry the device delete operation it still tries to relocate that same block group and crashes... Is there something else I can do to skip that block group and continue the device delete? 
my kernel is linux-3.13.0-rc6-git

[2279324.794890] btrfs: found 55688 extents
[2279325.525990] btrfs: relocating block group 7349792145408 flags 9
[2279360.657953] btrfs: found 64189 extents
[2279367.861713] ------------[ cut here ]------------
[2279367.861753] WARNING: CPU: 1 PID: 29088 at fs/btrfs/extent-tree.c:1597 lookup_inline_extent_backref+0x1d9/0x5c0 [btrfs]()
[2279367.861758] Modules linked in: sha256_generic btrfs raid6_pq crc32c libcrc32c radeon xor snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep pcspkr ttm snd_pcm snd_page_alloc snd_timer snd drm_kms_helper edac_core sp5100_tco i2c_piix4 serio_raw k10temp soundcore edac_mce_amd drm evdev i2c_algo_bit r8169 i2c_core mii wmi shpchp button acpi_cpufreq processor ext4 crc16 mbcache jbd2 ata_generic pata_acpi sd_mod hid_generic usbhid hid ohci_pci ehci_pci ohci_hcd xhci_hcd pata_jmicron ehci_hcd ahci libahci libata scsi_mod usbcore usb_common
[2279367.861839] CPU: 1 PID: 29088 Comm: btrfs Tainted: G W 3.13.0-rc6-git #1
[2279367.861845] Hardware name: Gigabyte Technology Co., Ltd. GA-890GPA-UD3H/GA-890GPA-UD3H, BIOS FD 07/23/2010
[2279367.861849] 0009 8800827f96e8 814f5648
[2279367.861858] 8800827f9720 81061b5d 8801fad0be10
[2279367.861866] 8800c92b1500 0009 8800827f9730
[2279367.861873] Call Trace:
[2279367.861886] [814f5648] dump_stack+0x4d/0x6f
[2279367.861897] [81061b5d] warn_slowpath_common+0x7d/0xa0
[2279367.861905] [81061c3a] warn_slowpath_null+0x1a/0x20
[2279367.861929] [a09229d9] lookup_inline_extent_backref+0x1d9/0x5c0 [btrfs]
[2279367.861954] [a0923e15] insert_inline_extent_backref+0x55/0xd0 [btrfs]
[2279367.861978] [a0923f27] __btrfs_inc_extent_ref+0x97/0x200 [btrfs]
[2279367.862003] [a092b016] run_clustered_refs+0xb46/0x1180 [btrfs]
[2279367.862027] [a091a63d] ? generic_bin_search.constprop.34+0x8d/0x1a0 [btrfs]
[2279367.862054] [a092f3f0] btrfs_run_delayed_refs+0xe0/0x550 [btrfs]
[2279367.862083] [a093fdee] btrfs_commit_transaction+0x4e/0x9a0 [btrfs]
[2279367.862104] [a09acd6f] prepare_to_merge+0x1d2/0x1ed [btrfs]
[2279367.862131] [a098d613] relocate_block_group+0x393/0x640 [btrfs]
[2279367.862156] [a098da62] btrfs_relocate_block_group+0x1a2/0x2f0 [btrfs]
[2279367.862184] [a0965568] btrfs_relocate_chunk.isra.28+0x68/0x760 [btrfs]
[2279367.862207] [a091d066] ? btrfs_search_slot+0x496/0x970 [btrfs]
[2279367.862237] [a095b01b] ? release_extent_buffer+0x2b/0xd0 [btrfs]
[2279367.862265] [a096082f] ? free_extent_buffer+0x4f/0xb0 [btrfs]
[2279367.862294] [a0967df9] btrfs_shrink_device+0x1e9/0x420 [btrfs]
[2279367.862322] [a096ab58] btrfs_rm_device+0x328/0x800 [btrfs]
[2279367.862330] [8118b192] ? __kmalloc_track_caller+0x32/0x250
[2279367.862358] [a0974ed0] btrfs_ioctl+0x2250/0x2d90 [btrfs]
[2279367.862366] [811b350f] ? user_path_at_empty+0x5f/0x90
[2279367.862374] [814ff9c4] ? __do_page_fault+0x2c4/0x5b0
[2279367.862382] [811650b7] ? vma_link+0xb7/0xc0
[2279367.862389] [811b58a0] do_vfs_ioctl+0x2e0/0x4c0
[2279367.862397] [811b5b01] SyS_ioctl+0x81/0xa0
[2279367.862404] [814ffcbe] ? do_page_fault+0xe/0x10
[2279367.862412] [81503aad] system_call_fastpath+0x1a/0x1f
Btrfs send 4-5 times slower than rsync on local
Hello, I am using btrfs send to copy a snapshot to another btrfs filesystem on the same machine, and it reaches a maximum speed of 30-35 MByte/sec. Incredibly, rsync is much faster, at 120-140 MB/sec. The source btrfs is a 5x2TB RAID 0 and the target is 1x4TB. Mount options: rw,noatime,compress-force=zlib,space_cache. The kernel is linux-3.13.0-rc6-git and btrfs tools were built from git at about the same time linux-3.13.0-rc6 was released. Finally, is there a way to resume an interrupted send?
Re: [RFC PATCH v8 00/14] Online(inband) data deduplication
Sorry for the spam, I just mixed up the order of your patches. They now apply cleanly to 3.13 git. Thanks

On 2/1/2014 4:32 μμ, Konstantinos Skarlatos wrote: Hello, I am trying to test your patches and they do not apply to the latest 3.12 source or 3.13 git. Am I doing something wrong?

---logs for 3.12---
Hunk #1 succeeded at 59 with fuzz 2 (offset 1 line).
patching file init/Kconfig
Hunk #1 succeeded at 1085 (offset 96 lines).
Hunk #2 succeeded at 1096 (offset 96 lines).
patching file fs/btrfs/ctree.h
Hunk #1 FAILED at 3692.
1 out of 1 hunk FAILED -- saving rejects to file fs/btrfs/ctree.h.rej
patching file fs/btrfs/extent-tree.c
Hunk #1 FAILED at 5996.
Hunk #2 FAILED at 6023.
2 out of 2 hunks FAILED -- saving rejects to file fs/btrfs/extent-tree.c.rej
patching file fs/btrfs/file-item.c
Hunk #1 FAILED at 887.
Hunk #2 succeeded at 765 with fuzz 2 (offset -151 lines).
Hunk #3 FAILED at 978.
Hunk #4 FAILED at 1061.
Hunk #5 FAILED at 1094.
4 out of 5 hunks FAILED -- saving rejects to file fs/btrfs/file-item.c.rej
patching file fs/btrfs/inode.c
Hunk #1 FAILED at 969.
Hunk #2 FAILED at 2364.
2 out of 2 hunks FAILED -- saving rejects to file fs/btrfs/inode.c.rej

---logs for 3.13---
Hunk #1 succeeded at 59 with fuzz 2 (offset 1 line).
patching file init/Kconfig
Hunk #1 succeeded at 1078 (offset 89 lines).
Hunk #2 succeeded at 1089 (offset 89 lines).
patching file fs/btrfs/ctree.h
Hunk #1 FAILED at 3692.
1 out of 1 hunk FAILED -- saving rejects to file fs/btrfs/ctree.h.rej
patching file fs/btrfs/extent-tree.c
Hunk #1 FAILED at 5996.
Hunk #2 FAILED at 6023.
2 out of 2 hunks FAILED -- saving rejects to file fs/btrfs/extent-tree.c.rej
patching file fs/btrfs/file-item.c
Hunk #1 FAILED at 887.
Hunk #2 succeeded at 768 with fuzz 2 (offset -148 lines).
Hunk #3 FAILED at 978.
Hunk #4 FAILED at 1061.
Hunk #5 FAILED at 1094.
4 out of 5 hunks FAILED -- saving rejects to file fs/btrfs/file-item.c.rej
patching file fs/btrfs/inode.c
Hunk #1 FAILED at 969.
Hunk #2 FAILED at 2364.
2 out of 2 hunks FAILED -- saving rejects to file fs/btrfs/inode.c.rej
On 30/12/2013 10:12 AM, Liu Bo wrote: Hello, Here is the New Year patch bomb :-) Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data.[1] This patch set is also related to Content based storage in the project ideas[2]; it introduces inband data deduplication for btrfs ("dedup"/"dedupe" for short).
PATCH 1 is a hang fix with deduplication on, but it is also useful without dedup in practical use.
PATCH 2 and 3 target delayed refs' scalability problems, which are uncovered by the dedup feature.
PATCH 4 is a speed-up improvement, concerning dedup and quota.
PATCH 5-8 are the preparation work for the dedup implementation.
PATCH 9 shows how we implement the dedup feature.
PATCH 10 fixes a backref-walking bug with dedup.
PATCH 11 fixes a free space bug of dedup extents on error handling.
PATCH 12 adds the ioctl to control the dedup feature.
PATCH 13 fixes the metadata ENOSPC problem with dedup which has been there WAY TOO LONG.
PATCH 14 fixes a race bug on dedup writes.
There is also a btrfs-progs patch (PATCH 15) which gives all the details about how to control the dedup feature.
I've tested this with xfstests by adding an inline dedup 'enable on' in xfstests' mount and scratch_mount.
TODO: * a bit-to-bit comparison callback.
All comments are welcome!
[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage
v8: - fix the race crash of dedup ref again. - fix the metadata ENOSPC problem with dedup.
v7: - rebase onto the latest btrfs - break a big patch into smaller ones to make reviewers happy. - kill the mount options of dedup and use the ioctl method instead.
- fix two crashes due to the special dedup ref
For former patch sets:
v6: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27512
v5: http://thread.gmane.org/gmane.comp.file-systems.btrfs/27257
v4: http://thread.gmane.org/gmane.comp.file-systems.btrfs/25751
v3: http://comments.gmane.org/gmane.comp.file-systems.btrfs/25433
v2: http://comments.gmane.org/gmane.comp.file-systems.btrfs/24959
Liu Bo (14):
  Btrfs: skip merge part for delayed data refs
  Btrfs: improve the delayed refs process in rm case
  Btrfs: introduce a head ref rbtree
  Btrfs: disable qgroups accounting when quata_enable is 0
  Btrfs: introduce dedup tree and relatives
  Btrfs: introduce dedup tree operations
  Btrfs: introduce dedup state
  Btrfs: make ordered extent aware of dedup
  Btrfs: online(inband) data dedup
  Btrfs: skip dedup reference during backref walking
  Btrfs: don't return space for dedup extent
  Btrfs: add ioctl of dedup control
  Btrfs: fix dedupe 'ENOSPC' problem
  Btrfs: fix a crash of dedup ref
 fs/btrfs/backref.c | 9 +
 fs/btrfs/ctree.c | 2 +-
 fs/btrfs/ctree.h | 86 ++
 fs/btrfs/delayed-ref.c | 161
Re: [PATCH] BTRFS-PROG: recursively subvolume snapshot and delete
On 26/11/2013 7:44 PM, Goffredo Baroncelli wrote:
On 2013-11-26 16:12, Konstantinos Skarlatos wrote:
On 25/11/2013 11:23 PM, Goffredo Baroncelli wrote: Hi all, is nobody interested in these new features?
Is this ZFS-style recursive snapshotting? If yes, I am interested, and thanks for your great work :)
No, it is not the same. My recursive snapshotting is not atomic like the ZFS one; every individual subvolume snapshot is atomic, but each snapshot is taken at a different time. For my use case that is not a problem, but others may disagree. BR G.Baroncelli
On 2013-11-16 18:09, Goffredo Baroncelli wrote: Hi All, the following patches implement recursive snapshotting and deletion of a subvolume. To snapshot recursively you must pass the -R switch:
# btrfs subvolume create sub1
Create subvolume './sub1'
# btrfs subvolume create sub1/sub2
Create subvolume 'sub1/sub2'
# btrfs subvolume snapshot -R sub1 sub1-snap
Create a snapshot of 'sub1' in './sub1-snap'
Create a snapshot of 'sub1/sub2' in './sub1-snap/sub2'
To recursively delete subvolumes, you must also pass the '-R' switch:
# btrfs subvolume create sub1
Create subvolume './sub1'
# btrfs subvolume create sub1/sub2
Create subvolume 'sub1/sub2'
# btrfs subvolume delete -R sub1
Delete subvolume '/root/sub1/sub2'
Delete subvolume '/root/sub1'
Some caveats:
1) the recursive behaviour needs the root capability, because of how the subvolumes are discovered
2) it is not possible to recursively snapshot a subvolume in read-only mode, because when a subvolume is snapshotted, its nested subvolumes appear as directories in the snapshot. These directories are removed before snapshotting the nested subvolumes, which is incompatible with a read-only subvolume.
BR G.Baroncelli -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btrfs-tools build instructions for Centos
Hello, in https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories, I used the Fedora instructions on CentOS. The problem is that lzo2-devel is named lzo-devel on CentOS, so if somebody follows the Fedora instructions and doesn't notice that lzo2-devel is missing, the btrfs-progs build will fail with /usr/bin/ld: cannot find -llzo2. The solution is to install lzo-devel instead. Can this be added to the wiki?
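For reference, the CentOS variant of the wiki's Fedora instructions would look roughly like this. This is a sketch: the exact dependency list and repository URL come from the wiki page itself, and the package names other than lzo-devel are assumptions based on the Fedora list.

```shell
# Dependency install on CentOS -- note lzo-devel, not Fedora's lzo2-devel
# (package names other than lzo-devel are assumed; check the wiki's Fedora list).
yum install gcc make git libuuid-devel libacl-devel zlib-devel lzo-devel

# Then build as on Fedora, using the repository URL given on the wiki page:
git clone <btrfs-progs repository from the wiki page>
cd btrfs-progs
make && make install
```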
Re: [PATCH] BTRFS-PROG: recursively subvolume snapshot and delete
On 25/11/2013 11:23 PM, Goffredo Baroncelli wrote: Hi all, is nobody interested in these new features?
Is this ZFS-style recursive snapshotting? If yes, I am interested, and thanks for your great work :)
On 2013-11-16 18:09, Goffredo Baroncelli wrote: Hi All, the following patches implement recursive snapshotting and deletion of a subvolume. To snapshot recursively you must pass the -R switch:
# btrfs subvolume create sub1
Create subvolume './sub1'
# btrfs subvolume create sub1/sub2
Create subvolume 'sub1/sub2'
# btrfs subvolume snapshot -R sub1 sub1-snap
Create a snapshot of 'sub1' in './sub1-snap'
Create a snapshot of 'sub1/sub2' in './sub1-snap/sub2'
To recursively delete subvolumes, you must also pass the '-R' switch:
# btrfs subvolume create sub1
Create subvolume './sub1'
# btrfs subvolume create sub1/sub2
Create subvolume 'sub1/sub2'
# btrfs subvolume delete -R sub1
Delete subvolume '/root/sub1/sub2'
Delete subvolume '/root/sub1'
Some caveats:
1) the recursive behaviour needs the root capability, because of how the subvolumes are discovered
2) it is not possible to recursively snapshot a subvolume in read-only mode, because when a subvolume is snapshotted, its nested subvolumes appear as directories in the snapshot. These directories are removed before snapshotting the nested subvolumes, which is incompatible with a read-only subvolume.
BR G.Baroncelli
Dedup on read-only snapshots
According to https://github.com/g2p/bedup/tree/wip/dedup-syscall, the clone call is considered a write operation and won't work on read-only snapshots. Is this fixed in newer kernels?
btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!
Hi all, I have two multi-disk btrfs filesystems on an Arch Linux 3.4.0 system. After a power failure, both filesystems refuse to mount:
[ 10.402284] Btrfs loaded
[ 10.402714] device fsid 1e7c18a4-02d6-44b1-8eaf-c01378009cd3 devid 4 transid 65282 /dev/sdc
[ 10.403108] btrfs: force zlib compression
[ 10.403130] btrfs: enabling inode map caching
[ 10.403152] btrfs: disk space caching is enabled
[ 10.403377] btrfs: failed to read the system array on sdc
[ 10.403557] btrfs: open_ctree failed
[ 10.431763] device fsid 7f7be913-e359-400f-8bdb-7ef48aad3f03 devid 2 transid 3916 /dev/sdb
[ 10.432180] btrfs: force zlib compression
[ 10.433040] btrfs: enabling inode map caching
[ 10.433892] btrfs: disk space caching is enabled
[ 10.434930] btrfs: failed to read the system array on sdb
[ 10.435945] btrfs: open_ctree failed
fstab:
UUID=1e7c18a4-02d6-44b1-8eaf-c01378009cd3 /storage/btrfs btrfs noatime,compress-force=zlib,space_cache,inode_cache 0 0
UUID=7f7be913-e359-400f-8bdb-7ef48aad3f03 /storage/btrfs2 btrfs noatime,compress-force=zlib,space_cache,inode_cache 0 0
The funny thing is that if I run btrfsck for one second on the first filesystem and then kill it with ctrl-c, both filesystems can then be mounted without any problems! I have had this problem for many months, probably on all 3.x kernels and maybe somewhat older ones, with all git btrfs tools since at least late last year.
[root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdb
Label: none  uuid: 7f7be913-e359-400f-8bdb-7ef48aad3f03
	Total devices 2 FS bytes used 1.54TB
	devid 1 size 1.82TB used 1.04TB path /dev/sda
	devid 2 size 1.82TB used 1.04TB path /dev/sdb
Btrfs Btrfs v0.19
[root@linuxserver ~/btrfs-progs]# btrfs fi show /dev/sdf
Label: none  uuid: 1e7c18a4-02d6-44b1-8eaf-c01378009cd3
	Total devices 4 FS bytes used 4.33TB
	devid 5 size 1.82TB used 1.82TB path /dev/sdg
	devid 4 size 1.82TB used 1.82TB path /dev/sdc
	devid 3 size 1.82TB used 1.79TB path /dev/sdf
	devid 1 size 1.82TB used 1.82TB path /dev/sdd
Btrfs Btrfs v0.19
Re: btrfs filesystems can only be mounted after an unclean shutdown if btrfsck is run and immediately killed!
On Friday, 8 June 2012 11:28:39 AM, Tomasz Torcz wrote:
On Fri, Jun 08, 2012 at 11:26:21AM +0300, Konstantinos Skarlatos wrote: Hi all, I have two multi-disk btrfs filesystems on an Arch Linux 3.4.0 system. After a power failure, both filesystems refuse to mount.
A multi-device filesystem has to be fully discovered first by btrfs device scan. This is typically done from udev rules. Also, dracut has done it in the initramfs for quite a long time.
(Added cc to btrfs list) You are right, I had forgotten to enable it (Arch Linux has a new rc.conf option for that); I will reboot in a few minutes to test it. Maybe it would be prudent to give a better error message when such a thing happens, or even have mount run btrfs device scan itself when it detects that a multi-device filesystem is being mounted?
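The manual equivalent of the udev-rule discovery described above is a two-step sequence; this is an illustrative setup fragment only (it requires root and an actual multi-device btrfs filesystem, with the mount points taken from the fstab quoted earlier in this thread):

```shell
# Register all btrfs member devices with the kernel first, so the
# multi-device filesystem is fully discovered...
btrfs device scan

# ...then the mounts from fstab succeed instead of failing in open_ctree.
mount /storage/btrfs
mount /storage/btrfs2
```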
Re: cross-subvolume cp --reflink
On Sunday, 1 April 2012 8:07:54 PM, Norbert Scheibner wrote:
On Sun, 01 Apr 2012 19:45:13 +0300, Konstantinos Skarlatos wrote:
That's my point. This poor man's dedupe would solve my problems here very well. I don't need a ZFS-style variant of dedupe. I could implement such a file-based dedupe with userland tools and would be happy.
Do you have any scripts that can search a btrfs filesystem for dupes and replace them with cp --reflink?
Nothing really working and well tested. After I learned about the missing cp --reflink feature I stopped developing the script any further. I use btrfs for my backups. Once a day I rsync --delete --inplace the complete system to a subvolume, snapshot it, and delete some temp files in the snapshot.
In my setup I rsync --inplace many servers and workstations, 4-6 times a day, into a 12TB btrfs volume, each one into its own subvolume. After every backup a new read-only snapshot is created. I have many cross-subvolume duplicate files (OS files, programs, many huge media files that are copied locally from the servers to the workstations, etc.), so a good dedupe script could save lots of space and allow me to keep snapshots for much longer.
In addition to that I wanted to shrink file duplicates. What the script should do:
1. md5sum every file
2. If the checksums are identical, compare the files
3. If 2 or more files are really identical:
- move one to a temp dir
- cp --reflink the second to the position and name of the first
- do a chown --reference, chmod --reference and touch --reference to copy owner, file mode bits and time from the original to the reflink copy, and then delete the original in the temp dir
Everything could be done with bash. Also thinkable is the use of a database for the md5sums, which could be used for other purposes in the future.
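The three steps above can be sketched in bash roughly as follows. This is a minimal sketch, not the script discussed in the thread: it assumes filenames without newlines or md5sum escaping, and it uses cp --reflink=auto so that it degrades to a plain copy on non-COW filesystems (on btrfs the copy shares extents with the original).

```shell
#!/bin/bash
# Dedup sketch: replace byte-identical duplicates under $1 with reflink
# copies of the first file seen with the same content.
dedup_dir() {
    local dir=$1 hash path tmp
    declare -A first                  # checksum -> first path seen with it
    while read -r hash path; do
        if [ -n "${first[$hash]}" ]; then
            # Step 2: checksums match -- verify the contents byte-for-byte.
            if cmp -s "${first[$hash]}" "$path"; then
                # Step 3: park the duplicate, reflink-copy the original into
                # its place, then restore owner, mode and times from the
                # parked copy before deleting it.
                tmp=$path.dedup.$$
                mv "$path" "$tmp"
                cp --reflink=auto "${first[$hash]}" "$path"
                chown --reference="$tmp" "$path"
                chmod --reference="$tmp" "$path"
                touch --reference="$tmp" "$path"
                rm "$tmp"
            fi
        else
            first[$hash]=$path
        fi
    done < <(find "$dir" -type f -exec md5sum {} +)   # Step 1: checksum every file
}
```

A database of checksums, as suggested above, would replace the in-memory associative array so that repeated runs don't rehash unchanged files.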
Re: cross-subvolume cp --reflink
On 1/4/2012 9:39 PM, Norbert Scheibner wrote:
On Sun, 01 Apr 2012 19:22:42 +0200, Klaus A. Kreil wrote:
I am just an interested reader on the btrfs list and so far have never posted or sent a message to the list, but I do have a dedup bash script that searches for duplicates underneath a directory (provided as an argument) and hard links identical files. It works very well on an ext3 filesystem, but I guess the basics should be the same for a btrfs filesystem.
Thanks for the nice script, it works fine here! I just added a du -sh $1 line at the beginning and at the end to see how much space it saves.
Everyone feel free to correct me here, but: at the moment there is a little problem with the maximum number of hard links in a directory, so I wouldn't use them wherever possible, to avoid any thinkable problems in the near future. Also, hard linking two files means that changing one file changes the other one, which is either something you don't want to happen or something that could be done in better ways. The cp --reflink method on a COW filesystem is a much smarter method.
That's true, cp --reflink is much better. Also, am I wrong that btrfs has a limitation on the number of hard links that can only be fixed with a disk format change?
Plus, hard links across subvolumes match the case of hard links across devices on a traditional fs, which is forbidden. Plus, hard links in my opinion should really be substituted by soft links, because hard links are not transparent at first sight and cannot be copied as such. So no, I'd rather want the patch that allows cross-subvolume cp --reflink in the kernel, and I will wait for that to happen. Greetings, Norbert
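The write-isolation difference called out above can be demonstrated on any filesystem (cp --reflink=auto degrades to a plain copy off btrfs, but the observable behaviour of the two approaches is the same):

```shell
#!/bin/bash
tmp=$(mktemp -d)
echo data > "$tmp/a"
ln "$tmp/a" "$tmp/b"                 # hard link: b is the same inode as a
echo changed > "$tmp/b"              # writing through b...
cat "$tmp/a"                         # ...also changed a: prints "changed"
cp --reflink=auto "$tmp/a" "$tmp/c"  # reflink copy (plain copy off btrfs)
echo other > "$tmp/c"                # c diverges on write...
cat "$tmp/a"                         # ...and a is untouched: still "changed"
rm -r "$tmp"
```

This is exactly why a reflink-based dedup is safe where a hard-link-based one silently couples unrelated files together.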
Re: cross-subvolume cp --reflink
On 1/4/2012 9:11 PM, Norbert Scheibner wrote:
On Sun, 01 Apr 2012 20:19:24 +0300, Konstantinos Skarlatos wrote:
I use btrfs for my backups. Once a day I rsync --delete --inplace the complete system to a subvolume, snapshot it, and delete some temp files in the snapshot.
In my setup I rsync --inplace many servers and workstations, 4-6 times a day, into a 12TB btrfs volume, each one into its own subvolume. After every backup a new read-only snapshot is created. I have many cross-subvolume duplicate files (OS files, programs, many huge media files that are copied locally from the servers to the workstations, etc.), so a good dedupe script could save lots of space and allow me to keep snapshots for much longer.
So the script should be optimized not to try to deduplicate the whole fs every time, but only the newly written files. You could take such a file list from the rsync output or from the btrfs subvolume find-new command.
A cron task with btrfs subvolume find-new would be ideal, I think.
Even without the reflink patch, you could use such a bash script inside one subvolume, after the rsync and before the snapshot. I don't know how much space it saves for you in this situation, but it's worth a try and a good way to develop such a script, because before you write anything to disc you can see how many duplicates are there and how much space could be freed. MfG Norbert
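A hypothetical cron-driven incremental pass along these lines might look as follows. This is a sketch only: it requires root and a btrfs subvolume, the paths are invented, filenames with spaces are not handled, and the awk field positions are assumptions about find-new's output (file lines beginning with "inode ... path", terminated by a "transid marker was N" line).

```shell
# Hypothetical incremental pass: list only files created or changed since
# the generation recorded by the previous run, then feed them to a dedup
# script. SUBVOL and STATE are assumed paths, not from the thread.
SUBVOL=/backup/host1
STATE=/var/lib/dedup/lastgen
last=$(cat "$STATE" 2>/dev/null || echo 0)
out=$(btrfs subvolume find-new "$SUBVOL" "$last")
# Extract the changed paths (assumed last field of "inode ..." lines)...
printf '%s\n' "$out" | awk '/^inode /{print $NF}' > /tmp/changed-files
# ...and record the new generation for the next run (assumed marker line).
printf '%s\n' "$out" | awk '/^transid marker/{print $4}' > "$STATE"
```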
Re: [PATCH 0/2] btrfs: allow cross-subvolume BTRFS_IOC_CLONE
On 22/12/2011 2:24 PM, Chris Samuel wrote:
Christoph, On Sat, 2 Apr 2011 12:40:11 AM, Chris Mason wrote:
Excerpts from Christoph Hellwig's message of 2011-04-01 09:34:05 -0400: I don't think it's a good idea to introduce any user-visible operations over subvolume boundaries. Currently we don't have any operations over mount boundaries, which is pretty fundamental to the unix filesystem semantics. If you want to change this please come up with a clear description of the semantics and post it to linux-fsdevel for discussion. That of course requires a clear description of btrfs subvolumes, which is still completely missing.
The subvolume is just a directory tree that can be snapshotted, and has its own private inode number space. reflink across subvolumes is no different from copying a file from one subvolume to another at the VFS level. The src and destination are different files and different inodes; they just happen to share data extents.
Were Chris Mason's points above enough to sway your opposition to this functionality/patch? There is demand for the ability to move data between subvolumes without needing to copy the extents themselves; it has cropped up again on the list in recent days. It seems a little hard (and counterintuitive) to enforce a wasteful copying of data between different parts of the same filesystem that happen to be on different subvolumes, when the operation is permitted within a single subvolume of the same filesystem. I don't dispute the comment about documentation on subvolumes though; there is a short discussion of them on the btrfs wiki in the sysadmin's guide, but not really a lot of detail. :-) All the best, Chris
I too want cp --reflink across subvolumes. Please make this feature available to us, as it's a poor man's dedupe and would give big space savings for many use cases.
Status of dedupe in btrfs
Hello everyone, I was reading this article on Slashdot about dedupe [1] and I was wondering about the status of the (offline) dedupe patches for btrfs. Are they applicable to a recent kernel? Do the userspace tools support them? Kind regards
[1] http://sk.slashdot.org/story/12/01/04/1955248/ask-slashdot-freeopen-deduplication-software
Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7
Hello all, I have two machines with btrfs that give me the "blocked for more than 120 seconds" message. After that I cannot write anything to disk, I am unable to unmount the btrfs filesystem, and I can only reboot with sysrq-trigger. It always happens when I write many files with rsync over the network. When I used 3.2rc6 it happened randomly on both machines after 50-500GB of writes. With rc7 it happens after much fewer writes, probably 10GB or so, but only on machine 1 for the time being. Machine 2 has not crashed yet after 200GB of writes and I am still testing it.
machine 1: btrfs on a 6TB sparse file, mounted as loop, on an xfs filesystem that lies on a 10TB md raid5. mount options: compress=zlib,compress-force
machine 2: btrfs over md raid5 (4x2TB) = 5.5TB filesystem. mount options: compress=zlib,compress-force
pastebins:
machine1: 3.2rc7 http://pastebin.com/u583G7jK 3.2rc6 http://pastebin.com/L12TDaXa
machine2: 3.2rc6 http://pastebin.com/khD0wGXx 3.2rc7 (not crashed yet)
Re: Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7
Well, now machine 2 has just crashed too... http://pastebin.com/gvfUm0az
On Wednesday, 28 December 2011 9:26:07 PM, Konstantinos Skarlatos wrote: Hello all, I have two machines with btrfs that give me the "blocked for more than 120 seconds" message. After that I cannot write anything to disk, I am unable to unmount the btrfs filesystem, and I can only reboot with sysrq-trigger. It always happens when I write many files with rsync over the network. When I used 3.2rc6 it happened randomly on both machines after 50-500GB of writes. With rc7 it happens after much fewer writes, probably 10GB or so, but only on machine 1 for the time being. Machine 2 has not crashed yet after 200GB of writes and I am still testing it. machine 1: btrfs on a 6TB sparse file, mounted as loop, on an xfs filesystem that lies on a 10TB md raid5. mount options: compress=zlib,compress-force machine 2: btrfs over md raid5 (4x2TB) = 5.5TB filesystem. mount options: compress=zlib,compress-force pastebins: machine1: 3.2rc7 http://pastebin.com/u583G7jK 3.2rc6 http://pastebin.com/L12TDaXa machine2: 3.2rc6 http://pastebin.com/khD0wGXx 3.2rc7 (not crashed yet)
Re: Btrfs: blocked for more than 120 seconds, made worse by 3.2 rc7
On Wednesday, 28 December 2011 11:48:32 PM, Dave Chinner wrote:
On Wed, Dec 28, 2011 at 09:26:07PM +0200, Konstantinos Skarlatos wrote: Hello all, I have two machines with btrfs that give me the "blocked for more than 120 seconds" message. After that I cannot write anything to disk, I am unable to unmount the btrfs filesystem, and I can only reboot with sysrq-trigger. It always happens when I write many files with rsync over the network. When I used 3.2rc6 it happened randomly on both machines after 50-500GB of writes. With rc7 it happens after much fewer writes, probably 10GB or so, but only on machine 1 for the time being. Machine 2 has not crashed yet after 200GB of writes and I am still testing it. machine 1: btrfs on a 6TB sparse file, mounted as loop, on an xfs filesystem that lies on a 10TB md raid5. mount options: compress=zlib,compress-force machine 2: btrfs over md raid5 (4x2TB) = 5.5TB filesystem. mount options: compress=zlib,compress-force pastebins: machine1: 3.2rc7 http://pastebin.com/u583G7jK 3.2rc6 http://pastebin.com/L12TDaXa
These two are caused by it taking longer than 120s for XFS to fsync the loop file. Writing a significant chunk of a sparse 6TB file on a software RAID5 volume is going to take some time. However, if IO is not occurring, then somewhere below XFS an IO has gone missing (MD or hardware problem), because the fsync on the XFS file is blocked waiting for an IO completion.
machine2: 3.2rc6 http://pastebin.com/khD0wGXx 3.2rc7 (not crashed yet)
Crashed a few hours ago; here is the rc7 pastebin: http://pastebin.com/gvfUm0az
These don't have XFS in the picture, but they also appear to be hung waiting on IO completion, with MD stuck in make_request()->get_active_stripe(). That, to me, indicates an MD problem. Added the linux-raid mailing list.
Please reply to me too, because I am not subscribed.
Cheers, Dave.
Re: Blocked for more than 120 seconds
Even more kernel messages from btrfs crashing when rsyncing large amounts of data on 3.2rc4:
Dec 3 15:12:14 mail kernel: [15481.100564] loop0 D 00010044b6c5 0 1729 2 0x
Dec 3 15:12:14 mail kernel: [15481.101550] 8801f9b31b30 0046
Dec 3 15:12:14 mail kernel: [15481.102548] 880200950e40 8801f9b31fd8 8801f9b31fd8 8801f9b31fd8
Dec 3 15:12:14 mail kernel: [15481.103539] 880202cb7200 880200950e40 0002 8801f9b31b78
Dec 3 15:12:14 mail kernel: [15481.104533] Call Trace:
Dec 3 15:12:14 mail kernel: [15481.105531] [81101a55] ? find_get_pages_tag+0x125/0x150
Dec 3 15:12:14 mail kernel: [15481.106541] [8110e205] ? pagevec_lookup_tag+0x25/0x40
Dec 3 15:12:14 mail kernel: [15481.107552] [8101d639] ? read_tsc+0x9/0x20
Dec 3 15:12:14 mail kernel: [15481.108576] [8108f14d] ? ktime_get_ts+0xad/0xe0
Dec 3 15:12:14 mail kernel: [15481.109592] [81101d60] ? __lock_page+0x70/0x70
Dec 3 15:12:14 mail kernel: [15481.110607] [814140bf] schedule+0x3f/0x60
Dec 3 15:12:14 mail kernel: [15481.111619] [8141416f] io_schedule+0x8f/0xd0
Dec 3 15:12:14 mail kernel: [15481.112641] [81101d6e] sleep_on_page+0xe/0x20
Dec 3 15:12:14 mail kernel: [15481.113639] [8141491f] __wait_on_bit+0x5f/0x90
Dec 3 15:12:14 mail kernel: [15481.114629] [81101f58] wait_on_page_bit+0x78/0x80
Dec 3 15:12:14 mail kernel: [15481.115628] [81085790] ? autoremove_wake_function+0x40/0x40
Dec 3 15:12:14 mail kernel: [15481.116614] [811020cc] filemap_fdatawait_range+0x10c/0x1a0
Dec 3 15:12:14 mail kernel: [15481.117613] [811030c8] filemap_write_and_wait_range+0x68/0x80
Dec 3 15:12:14 mail kernel: [15481.118630] [a03a7234] xfs_file_fsync+0x54/0x340 [xfs]
Dec 3 15:12:14 mail kernel: [15481.119629] [8119148b] vfs_fsync+0x2b/0x40
Dec 3 15:12:14 mail kernel: [15481.120627] [a04dacf2] do_bio_filebacked+0x1b2/0x320 [loop]
Dec 3 15:12:14 mail kernel: [15481.121645] [a050efac] ? end_workqueue_bio+0x9c/0xa0 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.122668] [a04daf1b] loop_thread+0xbb/0x260 [loop]
Dec 3 15:12:14 mail kernel: [15481.123674] [81085750] ? abort_exclusive_wait+0xb0/0xb0
Dec 3 15:12:14 mail kernel: [15481.124676] [a04dae60] ? do_bio_filebacked+0x320/0x320 [loop]
Dec 3 15:12:14 mail kernel: [15481.125698] [81084e0c] kthread+0x8c/0xa0
Dec 3 15:12:14 mail kernel: [15481.126710] [81419a34] kernel_thread_helper+0x4/0x10
Dec 3 15:12:14 mail kernel: [15481.127721] [81084d80] ? kthread_worker_fn+0x190/0x190
Dec 3 15:12:14 mail kernel: [15481.128742] [81419a30] ? gs_change+0x13/0x13
Dec 3 15:12:14 mail kernel: [15481.131702] btrfs-transacti D 8801f9ab7200 0 1756 2 0x
Dec 3 15:12:14 mail kernel: [15481.132723] 8801e7533bc0 0046 88020fc93400 0002
Dec 3 15:12:14 mail kernel: [15481.133744] 8801f9ab7200 8801e7533fd8 8801e7533fd8 8801e7533fd8
Dec 3 15:12:14 mail kernel: [15481.134771] 880200950e40 8801f9ab7200 8801e7533b10 81051ae2
Dec 3 15:12:14 mail kernel: [15481.135813] Call Trace:
Dec 3 15:12:14 mail kernel: [15481.136828] [8105ad36] ? ttwu_do_activate.constprop.172+0x66/0x70
Dec 3 15:12:14 mail kernel: [15481.137863] [8105bd6e] ? try_to_wake_up+0x1de/0x290
Dec 3 15:12:14 mail kernel: [15481.138914] [814140bf] schedule+0x3f/0x60
Dec 3 15:12:14 mail kernel: [15481.139956] [814147d5] schedule_timeout+0x305/0x390
Dec 3 15:12:14 mail kernel: [15481.141007] [8104d003] ? __wake_up+0x53/0x70
Dec 3 15:12:14 mail kernel: [15481.142074] [81413348] wait_for_common+0xc8/0x160
Dec 3 15:12:14 mail kernel: [15481.143124] [8105be20] ? try_to_wake_up+0x290/0x290
Dec 3 15:12:14 mail kernel: [15481.144170] [814133fd] wait_for_completion+0x1d/0x20
Dec 3 15:12:14 mail kernel: [15481.145229] [a050f0bb] write_dev_flush+0x4b/0x140 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.146275] [a0511086] write_all_supers+0x6f6/0x800 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.147317] [a05111a3] write_ctree_super+0x13/0x20 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.148354] [a05164dd] btrfs_commit_transaction+0x63d/0x880 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.149397] [81085750] ? abort_exclusive_wait+0xb0/0xb0
Dec 3 15:12:14 mail kernel: [15481.150416] [a0516b74] ? start_transaction+0x94/0x2b0 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.151444] [a050ed4d] transaction_kthread+0x26d/0x290 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.152492] [a050eae0] ? btrfs_congested_fn+0xd0/0xd0 [btrfs]
Dec 3 15:12:14 mail kernel: [15481.153519]
Re: Blocked for more than 120 seconds
] schedule+0x3f/0x60
[15601.348711] [8141416f] io_schedule+0x8f/0xd0
[15601.348714] [81101d6e] sleep_on_page+0xe/0x20
[15601.348716] [8141491f] __wait_on_bit+0x5f/0x90
[15601.348719] [81101f58] wait_on_page_bit+0x78/0x80
[15601.348722] [81085790] ? autoremove_wake_function+0x40/0x40
[15601.348725] [81102845] grab_cache_page_write_begin+0x95/0xe0
[15601.348732] [a03a1150] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
[15601.348736] [811967b8] block_write_begin+0x38/0xa0
[15601.348743] [a03a1213] xfs_vm_write_begin+0x43/0x70 [xfs]
[15601.348746] [8110233c] generic_file_buffered_write+0x10c/0x270
[15601.348754] [a03aad66] ? xfs_iunlock+0x116/0x180 [xfs]
[15601.348761] [a03a7fef] xfs_file_buffered_aio_write+0x10f/0x200 [xfs]
[15601.348768] [a03a8252] xfs_file_aio_write+0x172/0x2a0 [xfs]
[15601.348772] [81162d62] do_sync_write+0xd2/0x110
[15601.348775] [811f0fcc] ? security_file_permission+0x2c/0xb0
[15601.348778] [81163311] ? rw_verify_area+0x61/0xf0
[15601.348781] [8116366f] vfs_write+0xaf/0x180
[15601.348784] [81163b12] sys_pwrite64+0x82/0xb0
[15601.348787] [814178c2] system_call_fastpath+0x16/0x1b
On Saturday, 3 December 2011 2:35:50 AM, Konstantinos Skarlatos wrote: After about 1TB of rsyncs from multiple servers at the same time, plus some heavy filesystem loading, I believe that 3.2rc4 solves the problem for me. Now if only we had deduplication and an fsck tool :)
On Friday, 2 December 2011 9:53:10 PM, Konstantinos Skarlatos wrote: I see they got into 3.2rc4, so I am now compiling it. I will report back in a few hours.
On Friday, 2 December 2011 5:48:31 PM, Tobias wrote: On 02.12.2011 16:22, Konstantinos Skarlatos wrote:
So, the transaction close is in btrfs_evict_inode, which sounds like a deadlock recently fixed by this commit: http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=commit;h=aa38a711a893accf5b5192f3d705a120deaa81e0 If you pull the for-linus branch from today, hopefully the problem will be gone.
This looks very good.
With this kernel I still have some hangs, but only in rsync, only under high load, and they don't lock up the system - so I guess it's ok now.
I still have hangs and lock-ups in the same situation (rsync of many files) under 3.2rc3. rc3 made the hang appear after 200GB of files, while in rc2 I had hangs after only 11GB.
Yes, I had them too in 3.2rc3! The problems were solved with patches from the btrfs-for-linus branch (see link above). Tobias
Re: Blocked for more than 120 seconds
Hi all,
On 2/12/2011 3:46 PM, Tobias wrote: Hi Chris! On 01.12.2011 19:41, Chris Mason wrote:
So, the transaction close is in btrfs_evict_inode, which sounds like a deadlock recently fixed by this commit: http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=commit;h=aa38a711a893accf5b5192f3d705a120deaa81e0 If you pull the for-linus branch from today, hopefully the problem will be gone.
This looks very good. With this kernel I still have some hangs, but only in rsync, only under high load, and they don't lock up the system - so I guess it's ok now.
I still have hangs and lock-ups in the same situation (rsync of many files) under 3.2rc3. rc3 made the hang appear after 200GB of files, while in rc2 I had hangs after only 11GB.
Thank you very much for your help! When will these patches go into the mainline kernel? Tobias
Re: Blocked for more than 120 seconds
I see they got into 3.2-rc4, so I am now compiling it. I will report back in a few hours. On Friday, 2 December 2011 5:48:31 PM, Tobias wrote: On 02.12.2011 16:22, Konstantinos Skarlatos wrote: So, the transaction close is in btrfs_evict_inode, which sounds like a deadlock recently fixed by this commit: http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=commit;h=aa38a711a893accf5b5192f3d705a120deaa81e0 If you pull the for-linus branch from today, hopefully the problem will be gone. This looks very good. With this kernel I still have some hangs, but only in rsync, only under high load, and they don't lock up the system - so I guess it's OK now. I still have hangs and lock-ups in the same situation (rsync of many files) under 3.2-rc3. rc3 made the hang appear after 200 GB of files, while in rc2 I had hangs after only 11 GB. Yes, I had them too in 3.2-rc3! The problems were solved with patches from the btrfs-for-linus branch (see link above). Tobias
Re: Blocked for more than 120 seconds
After about 1 TB of rsyncs from multiple servers at the same time, plus some heavy filesystem loading, I believe that 3.2-rc4 solves the problem for me. Now if only we had deduplication and an fsck tool :) On Friday, 2 December 2011 9:53:10 PM, Konstantinos Skarlatos wrote: I see they got into 3.2-rc4, so I am now compiling it. I will report back in a few hours. On Friday, 2 December 2011 5:48:31 PM, Tobias wrote: On 02.12.2011 16:22, Konstantinos Skarlatos wrote: So, the transaction close is in btrfs_evict_inode, which sounds like a deadlock recently fixed by this commit: http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=commit;h=aa38a711a893accf5b5192f3d705a120deaa81e0 If you pull the for-linus branch from today, hopefully the problem will be gone. This looks very good. With this kernel I still have some hangs, but only in rsync, only under high load, and they don't lock up the system - so I guess it's OK now. I still have hangs and lock-ups in the same situation (rsync of many files) under 3.2-rc3. rc3 made the hang appear after 200 GB of files, while in rc2 I had hangs after only 11 GB. Yes, I had them too in 3.2-rc3! The problems were solved with patches from the btrfs-for-linus branch (see link above). Tobias
Having parent transid verify failed
Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now if i run some file operations like find, i get these messages. kernel is 2.6.38.5-1 on arch linux May 5 14:15:12 mail kernel: [13559.089713] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 14:15:12 mail kernel: [13559.089834] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 14:15:14 mail kernel: [13560.752074] btrfs-transacti D 88007211ac78 0 5339 2 0x May 5 14:15:14 mail kernel: [13560.752078] 880023167d30 0046 8800 8800195b6000 May 5 14:15:14 mail kernel: [13560.752082] 880023167c10 02c8f27b4000 880023167fd8 88007211a9a0 May 5 14:15:14 mail kernel: [13560.752085] 880023167fd8 880023167fd8 88007211ac80 880023167fd8 May 5 14:15:14 mail kernel: [13560.752087] Call Trace: May 5 14:15:14 mail kernel: [13560.752101] [a0850d02] ? run_clustered_refs+0x132/0x830 [btrfs] May 5 14:15:14 mail kernel: [13560.752105] [813aff3d] schedule_timeout+0x2fd/0x380 May 5 14:15:14 mail kernel: [13560.752108] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 14:15:14 mail kernel: [13560.752115] [a087e9f4] ? btrfs_run_ordered_operations+0x1f4/0x210 [btrfs] May 5 14:15:14 mail kernel: [13560.752122] [a0860fa3] btrfs_commit_transaction+0x263/0x750 [btrfs] May 5 14:15:14 mail kernel: [13560.752126] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 14:15:14 mail kernel: [13560.752131] [a085a9bd] transaction_kthread+0x26d/0x290 [btrfs] May 5 14:15:14 mail kernel: [13560.752137] [a085a750] ? transaction_kthread+0x0/0x290 [btrfs] May 5 14:15:14 mail kernel: [13560.752139] [81079717] kthread+0x87/0x90 May 5 14:15:14 mail kernel: [13560.752142] [8100bc24] kernel_thread_helper+0x4/0x10 May 5 14:15:14 mail kernel: [13560.752145] [81079690] ? kthread+0x0/0x90 May 5 14:15:14 mail kernel: [13560.752147] [8100bc20] ? 
kernel_thread_helper+0x0/0x10 May 5 14:15:17 mail kernel: [13564.092081] verify_parent_transid: 40736 callbacks suppressed May 5 14:15:17 mail kernel: [13564.092084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 --snip-- May 5 14:17:13 mail kernel: [13679.169772] parent transid verify failed on 3062073683968 wanted 5181 found 5188 --snip-- May 5 14:17:14 mail kernel: [13680.751996] btrfs-transacti D 88007211ac78 0 5339 2 0x May 5 14:17:14 mail kernel: [13680.752000] 880023167d30 0046 8800 8800195b6000 May 5 14:17:14 mail kernel: [13680.752004] 880023167c10 02c8f27b4000 880023167fd8 88007211a9a0 May 5 14:17:14 mail kernel: [13680.752006] 880023167fd8 880023167fd8 88007211ac80 880023167fd8 May 5 14:17:14 mail kernel: [13680.752009] Call Trace: May 5 14:17:14 mail kernel: [13680.752024] [a0850d02] ? run_clustered_refs+0x132/0x830 [btrfs] May 5 14:17:14 mail kernel: [13680.752030] [813aff3d] schedule_timeout+0x2fd/0x380 May 5 14:17:14 mail kernel: [13680.752032] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 14:17:14 mail kernel: [13680.752040] [a087e9f4] ? btrfs_run_ordered_operations+0x1f4/0x210 [btrfs] May 5 14:17:14 mail kernel: [13680.752046] [a0860fa3] btrfs_commit_transaction+0x263/0x750 [btrfs] May 5 14:17:14 mail kernel: [13680.752051] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 14:17:14 mail kernel: [13680.752057] [a085a9bd] transaction_kthread+0x26d/0x290 [btrfs] May 5 14:17:14 mail kernel: [13680.752062] [a085a750] ? transaction_kthread+0x0/0x290 [btrfs] May 5 14:17:14 mail kernel: [13680.752065] [81079717] kthread+0x87/0x90 May 5 14:17:14 mail kernel: [13680.752068] [8100bc24] kernel_thread_helper+0x4/0x10 May 5 14:17:14 mail kernel: [13680.752070] [81079690] ? kthread+0x0/0x90 May 5 14:17:14 mail kernel: [13680.752072] [8100bc20] ? 
kernel_thread_helper+0x0/0x10 May 5 14:17:14 mail kernel: [13680.752079] dd D 8800714c4838 0 5792 5740 0x0004 May 5 14:17:14 mail kernel: [13680.752082] 88006a205b38 0082 88006a205af8 0246 May 5 14:17:14 mail kernel: [13680.752085] ea00017f57e8 88006a205fd8 88006a205fd8 8800714c4560 May 5 14:17:14 mail kernel: [13680.752088] 88006a205fd8 88006a205fd8 8800714c4840 88006a205fd8 May 5 14:17:14 mail kernel: [13680.752090] Call Trace: May 5 14:17:14 mail kernel: [13680.752095] [810ff145] ? zone_statistics+0x75/0x90 May 5 14:17:14 mail kernel: [13680.752098] [810ea8b7] ? get_page_from_freelist+0x3c7/0x820 May 5 14:17:14 mail kernel: [13680.752101] [810e3588] ? find_get_page+0x68/0xb0 May 5 14:17:14 mail kernel: [13680.752108] [a08603f9]
Re: Having parent transid verify failed
On 5/5/2011 2:42 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400: Hello, I have a 5.5 TB Btrfs filesystem on top of an md-raid 5 device. Now if I run some file operations like find, I get these messages. The kernel is 2.6.38.5-1 on Arch Linux. Are all of the messages for this one block? parent transid verify failed on 3062073683968 wanted 5181 found 5188 Yes, only this block. -chris
Re: Having parent transid verify failed
On 5/5/2011 6:06 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 10:27:30 -0400: Attached you can find the whole dmesg log. I can trigger the error again if more logs are needed. Yes, I'll send you a patch to get rid of the printk for the transid failed message. That way we can get a clean view of the other errors. Will you be able to compile/test it? Yes, I think I will be able to manage it, but because I have only done this once, and in a quite hackish way, I may need some help in order to do it right. -chris
Re: Having parent transid verify failed
I think i made some progress. When i tried to remove the directory that i suspect contains the problematic file, i got this on the console rm -rf serverloft/ 2011 May 5 23:32:53 mail [ 200.580195] Oops: [#1] PREEMPT SMP 2011 May 5 23:32:53 mail [ 200.580220] last sysfs file: /sys/module/vt/parameters/default_utf8 2011 May 5 23:32:53 mail [ 200.581145] Stack: 2011 May 5 23:32:53 mail [ 200.581276] Call Trace: 2011 May 5 23:32:53 mail [ 200.581732] Code: cc 00 00 48 8d 91 28 e0 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 76 30 83 42 1c 01 48 b8 00 00 00 00 00 16 00 00 48 01 f0 2011 May 5 23:32:53 mail [ 200.583376] CR2: 0030 here is the part of dmesg that does not contain the thousands of parent transid verify failed messages May 5 23:32:51 mail kernel: [ 198.371084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:51 mail kernel: [ 198.371204] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:53 mail kernel: [ 200.572774] Modules linked in: ipv6 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod May 5 23:32:53 mail kernel: [ 200.572808] Pid: 1037, comm: btrfs-transacti Not tainted 2.6.38-ARCH #1 May 5 23:32:53 mail kernel: [ 200.572810] Call Trace: May 5 23:32:53 mail kernel: [ 200.572817] [813a932b] ? 
__schedule_bug+0x59/0x5d May 5 23:32:53 mail kernel: [ 200.572820] [813af827] ? schedule+0x9f7/0xad0 May 5 23:32:53 mail kernel: [ 200.572823] [811e5827] ? generic_unplug_device+0x37/0x40 May 5 23:32:53 mail kernel: [ 200.572827] [a07ac164] ? md_raid5_unplug_device+0x64/0x110 [raid456] May 5 23:32:53 mail kernel: [ 200.572830] [a07ac223] ? raid5_unplug_queue+0x13/0x20 [raid456] May 5 23:32:53 mail kernel: [ 200.572833] [81012d79] ? read_tsc+0x9/0x20 May 5 23:32:53 mail kernel: [ 200.572837] [8108418c] ? ktime_get_ts+0xac/0xe0 May 5 23:32:53 mail kernel: [ 200.572840] [810e36c0] ? sync_page+0x0/0x50 May 5 23:32:53 mail kernel: [ 200.572842] [813af96e] ? io_schedule+0x6e/0xb0 May 5 23:32:53 mail kernel: [ 200.572844] [810e36fb] ? sync_page+0x3b/0x50 May 5 23:32:53 mail kernel: [ 200.572846] [813b0077] ? __wait_on_bit+0x57/0x80 May 5 23:32:53 mail kernel: [ 200.572848] [810e38c0] ? wait_on_page_bit+0x70/0x80 May 5 23:32:53 mail kernel: [ 200.572851] [8107a030] ? wake_bit_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572861] [a08348d2] ? read_extent_buffer_pages+0x412/0x480 [btrfs] May 5 23:32:53 mail kernel: [ 200.572867] [a0809e00] ? btree_get_extent+0x0/0x1b0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572873] [a080ac7e] ? btree_read_extent_buffer_pages.isra.60+0x5e/0xb0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572880] [a080c0bc] ? read_tree_block+0x3c/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572884] [a07f272b] ? read_block_for_search.isra.34+0x1fb/0x410 [btrfs] May 5 23:32:53 mail kernel: [ 200.572890] [a08417d1] ? btrfs_tree_unlock+0x51/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572895] [a07f5ca0] ? btrfs_search_slot+0x430/0xa30 [btrfs] May 5 23:32:53 mail kernel: [ 200.572900] [a07fb3a6] ? lookup_inline_extent_backref+0x96/0x460 [btrfs] May 5 23:32:53 mail kernel: [ 200.572904] [8112b8d3] ? kmem_cache_alloc+0x133/0x150 May 5 23:32:53 mail kernel: [ 200.572908] [a07fd452] ? 
__btrfs_free_extent+0xc2/0x6d0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572914] [a0800f59] ? run_clustered_refs+0x389/0x830 [btrfs] May 5 23:32:53 mail kernel: [ 200.572920] [a084d900] ? btrfs_find_ref_cluster+0x10/0x190 [btrfs] May 5 23:32:53 mail kernel: [ 200.572925] [a08014c0] ? btrfs_run_delayed_refs+0xc0/0x210 [btrfs] May 5 23:32:53 mail kernel: [ 200.572927] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 23:32:53 mail kernel: [ 200.572933] [a0810db8] ? btrfs_commit_transaction+0x78/0x750 [btrfs] May 5 23:32:53 mail kernel: [ 200.572936] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572941] [a080a9bd] ? transaction_kthread+0x26d/0x290 [btrfs] May 5 23:32:53 mail kernel:
Re: Having parent transid verify failed
On 5/5/2011 11:32 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400: I think I made some progress. When I tried to remove the directory that I suspect contains the problematic file, I got this on the console: rm -rf serverloft/ OK, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail? I created this btrfs filesystem on an Arch Linux system (amd64, quad core) with kernel 2.6.38.1. It is on top of an md RAID 5. [root@linuxserver ~]# cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4] 5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [] The RAID was grown from 3 devices to 4, and then btrfs was grown to max size. Mount options were clear_cache,compress-force. I was investigating a performance issue I had, because over the network I could only write to the filesystem at about 32 MB/s; when writing, btrfs-delalloc CPU usage was at 100%. While investigating I disabled compression, enabled space_cache, and tried zlib compression, in various combinations, while copying large files back and forth using Samba. BTW, I tried to change some mount options using mount -o remount, but although the new options were printed in dmesg, I think they were not enabled. I got the first error when I was copying some files and at the same time created a directory over Samba. After a while I upgraded to 2.6.38.5, but nothing seems to have changed.
I really dont think there is a hardware error here, but to be safe I am now running a check on the raid -chris 2011 May 5 23:32:53 mail [ 200.580195] Oops: [#1] PREEMPT SMP 2011 May 5 23:32:53 mail [ 200.580220] last sysfs file: /sys/module/vt/parameters/default_utf8 2011 May 5 23:32:53 mail [ 200.581145] Stack: 2011 May 5 23:32:53 mail [ 200.581276] Call Trace: 2011 May 5 23:32:53 mail [ 200.581732] Code: cc 00 00 48 8d 91 28 e0 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 89 6d e8 4c 89 75 f0 4c 89 7d f848 8b 76 30 83 42 1c 01 48 b8 00 00 00 00 00 16 00 00 48 01 f0 2011 May 5 23:32:53 mail [ 200.583376] CR2: 0030 here is the part of dmesg that does not contain the thousands of parent transid verify failed messages May 5 23:32:51 mail kernel: [ 198.371084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:51 mail kernel: [ 198.371204] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:53 mail kernel: [ 200.572774] Modules linked in: ipv6 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod May 5 23:32:53 mail kernel: [ 200.572808] Pid: 1037, comm: btrfs-transacti Not tainted 2.6.38-ARCH #1 May 5 23:32:53 mail kernel: [ 200.572810] Call Trace: May 5 23:32:53 mail kernel: [ 200.572817] [813a932b] ? __schedule_bug+0x59/0x5d May 5 23:32:53 mail kernel: [ 200.572820] [813af827] ? 
schedule+0x9f7/0xad0 May 5 23:32:53 mail kernel: [ 200.572823] [811e5827] ? generic_unplug_device+0x37/0x40 May 5 23:32:53 mail kernel: [ 200.572827] [a07ac164] ? md_raid5_unplug_device+0x64/0x110 [raid456] May 5 23:32:53 mail kernel: [ 200.572830] [a07ac223] ? raid5_unplug_queue+0x13/0x20 [raid456] May 5 23:32:53 mail kernel: [ 200.572833] [81012d79] ? read_tsc+0x9/0x20 May 5 23:32:53 mail kernel: [ 200.572837] [8108418c] ? ktime_get_ts+0xac/0xe0 May 5 23:32:53 mail kernel: [ 200.572840] [810e36c0] ? sync_page+0x0/0x50 May 5 23:32:53 mail kernel: [ 200.572842] [813af96e] ? io_schedule+0x6e/0xb0 May 5 23:32:53 mail kernel: [ 200.572844] [810e36fb] ? sync_page+0x3b/0x50 May 5 23:32:53 mail kernel: [ 200.572846] [813b0077] ? __wait_on_bit+0x57/0x80 May 5 23:32:53 mail kernel: [ 200.572848] [810e38c0] ? wait_on_page_bit+0x70/0x80 May 5 23:32:53 mail kernel: [ 200.572851] [8107a030] ? wake_bit_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572861] [a08348d2] ? read_extent_buffer_pages+0x412/0x480 [btrfs] May 5 23:32:53
Re: Having parent transid verify failed
On 6/5/2011 2:50 πμ, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 17:04:00 -0400: On 5/5/2011 11:32 μμ, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400: I think i made some progress. When i tried to remove the directory that i suspect contains the problematic file, i got this on the console rm -rf serverloft/ Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail? I created this btrfs filesystem on an arch linux system (amd64, quad core) with kernel 2.3.38.1. it is on top of a md raid 5. [root@linuxserver ~]# cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4] 5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [] the raid was grown from 3 devices to 4, and then btrfs was grown to max size. mount options were clear_cache,compress-force. I was investigating a performance issue that i had, because over the network i could only write to the filesystem at about 32mb/sec. when writing btrfs-delalloc- cpu usage was at 100%. While investigating i disabled compression, enabled space_cache and tried zlib compression, and various combinations, while copying large files back and forth using samba. BTW I tried to change some mount options using mount -o remount but although the new options were printed on dmesg i think that they were not enabled. I got the first error when i was copying some files and at the same time created a directory over samba. After a while i upgraded to 2.6.38.5 but nothing seems to have changed. 
I really don't think there is a hardware error here, but to be safe I am now running a check on the RAID. This error basically means we didn't write the block. It could be because the write went to the wrong spot, or the hardware stack messed it up, or because of a btrfs bug. But 2.6.38 is relatively recent. It doesn't look like memory corruption, because the transids are fairly close. When you grew the RAID device, did you grow a partition as well? We've had trouble in the past with block-dev flushing code kicking in as devices are resized. No, I did not grow any partitions; I just added one disk to the RAID 5 md0 device and then grew the btrfs filesystem to max size (there are no partitions on md0). I remember that as a test (to see if shrink works) I shrank the fs by 1 GB and then grew it again to max size. Samba isn't doing anything exotic, and 2.6.38 has my recent fixes for rare metadata corruption bugs in btrfs. -chris
Re: [PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression
Hello, I would like to ask about the status of this feature/patch, is it accepted into btrfs code, and how can I use it? I am interested in enabling compression in a specific folder(force-compress would be ideal) of a large btrfs volume, and disabling it for the rest. On 21/3/2011 10:57 πμ, liubo wrote: Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to chris's comment, there should be just one true compression method(probably LZO) stored in the super. However, before this, we would wait for that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in ram today. After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to control file and directory's datacow and compression attribute. NOTE: - The compression type is selected by such rules: If we mount btrfs with compress options, ie, zlib/lzo, the type is it. Otherwise, we'll use the default compress type (zlib today). v1-v2: Rebase the patch with the latest btrfs. Signed-off-by: Liu Boliubo2...@cn.fujitsu.com --- fs/btrfs/ctree.h |1 + fs/btrfs/disk-io.c |6 ++ fs/btrfs/inode.c | 32 fs/btrfs/ioctl.c | 41 + 4 files changed, 72 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8b4b9d1..b77d1a5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1283,6 +1283,7 @@ struct btrfs_root { #define BTRFS_INODE_NODUMP(1 8) #define BTRFS_INODE_NOATIME (1 9) #define BTRFS_INODE_DIRSYNC (1 10) +#define BTRFS_INODE_COMPRESS (1 11) /* some macros to generate set/get funcs for the struct fields. 
This * assumes there is a lefoo_to_cpu for every type, so lets make a simple diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3e1ea3e..a894c12 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb, btrfs_check_super_valid(fs_info, sb-s_flags MS_RDONLY); + /* +* In the long term, we'll store the compression type in the super +* block, and it'll be used for per file compression control. +*/ + fs_info-compress_type = BTRFS_COMPRESS_ZLIB; + ret = btrfs_parse_options(tree_root, options); if (ret) { err = ret; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index db67821..e687bb9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -381,7 +381,8 @@ again: */ if (!(BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS) (btrfs_test_opt(root, COMPRESS) || -(BTRFS_I(inode)-force_compress))) { +(BTRFS_I(inode)-force_compress) || +(BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS))) { WARN_ON(pages); pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS); @@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); else if (!btrfs_test_opt(root, COMPRESS) -!(BTRFS_I(inode)-force_compress)) +!(BTRFS_I(inode)-force_compress) +!(BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS)) ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); else @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, location-offset = 0; btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY); - btrfs_inherit_iflags(inode, dir); - if ((mode S_IFREG)) { if (btrfs_test_opt(root, NODATASUM)) BTRFS_I(inode)-flags |= BTRFS_INODE_NODATASUM; @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)-flags |= BTRFS_INODE_NODATACOW; } + btrfs_inherit_iflags(inode, dir); + insert_inode_hash(inode); 
inode_tree_add(inode); return inode; @@ -6803,6 +6805,26 @@ static int btrfs_getattr(struct vfsmount *mnt, return 0; } +/* + * If a file is moved, it will inherit the cow and compression flags of the new + * directory. + */ +static void fixup_inode_flags(struct inode *dir, struct inode *inode) +{ + struct btrfs_inode *b_dir = BTRFS_I(dir); + struct btrfs_inode *b_inode = BTRFS_I(inode); + + if (b_dir-flags BTRFS_INODE_NODATACOW) + b_inode-flags |= BTRFS_INODE_NODATACOW; +
Re: btrfs balancing start - and stop?
On 1/4/2011 3:12 PM, Helmut Hullen wrote: Hello Struan, you wrote on 01.04.11: 1) Is the balancing operation expected to take many hours (or days?) on a filesystem such as this? Or are there known issues with the algorithm that are yet to be addressed? Maybe. Balancing about 15 GByte needed about 2 hours (or less); balancing about 2 TByte needed about 20 hours. dmesg counts down the number of remaining jobs. Are you sure? Here is a snippet of dmesg from a balance I did yesterday (2.6.38.1):
btrfs: relocating block group 15338569728 flags 9
btrfs: found 17296 extents
btrfs: found 17296 extents
btrfs: relocating block group 13191086080 flags 9
btrfs: found 21029 extents
btrfs: found 21029 extents
btrfs: relocating block group 11043602432 flags 9
btrfs: found 4728 extents
btrfs: found 4728 extents
Best regards! Helmut
Re: Do not use free space caching!
On 1/4/2011 1:59 AM, Josef Bacik wrote: On Thu, Mar 31, 2011 at 05:06:42PM -0400, Calvin Walton wrote: On Wed, 2011-03-30 at 17:19 -0400, Josef Bacik wrote: Hello, Just found a big bug in the free space caching stuff that will result in early ENOSPC. I'm working on fixing this bug, but it won't be until tomorrow that I'll have it completely working, so for now make sure to mount -o clear_cache so that it just clears the cache and doesn't use it. NOTE: It doesn't cause problems other than early ENOSPC; you won't get corruption or anything like that, though you could possibly panic. Sorry for the inconvenience. Thanks, Any chance you could provide a little more information about which kernels are affected? Is it any kernel with free space cache support (is 2.6.38.x included?) - and if so, do you plan on submitting the fix to the stable kernel series? Yeah, it affects any kernel that has the free space cache feature, which I think started in .37. Of course you have to have specifically enabled it, so it's not a huge problem. I've submitted a patch, but since it's currently an optional feature I don't think it needs to go to stable. Thanks, So it will have to wait for 2.6.39? If possible please push it for inclusion in the next stable release of 2.6.38, as 2.6.39 is a few months away and I won't risk an early RC for my system. Thanks Josef
Re: btrfs balancing start - and stop?
On 1/4/2011 4:37 PM, Hugo Mills wrote: On Fri, Apr 01, 2011 at 04:22:39PM +0300, Konstantinos Skarlatos wrote: On 1/4/2011 3:12 PM, Helmut Hullen wrote: You wrote on 01.04.11: dmesg counts down the number of remaining jobs. Are you sure? Here is a snippet of dmesg from a balance I did yesterday (2.6.38.1):
btrfs: relocating block group 15338569728 flags 9
btrfs: found 17296 extents
btrfs: found 17296 extents
btrfs: relocating block group 13191086080 flags 9
btrfs: found 21029 extents
btrfs: found 21029 extents
btrfs: relocating block group 11043602432 flags 9
btrfs: found 4728 extents
btrfs: found 4728 extents
Count the number of block groups in the system (1 GiB for data, 256 MiB for metadata on a typical filesystem), and subtract the number of relocating block group messages... Not ideal, but it's possible. The balance-cancel patch I mentioned earlier also comes with an additional patch for monitoring progress, which does show up in the dmesg output (as well as user-space support for prettier output). Great, I think it is very important to have a human-readable progress monitor for operations like that. Hugo.
Btrfs troubles
Hello, I get these messages from two filesystems, neither of which is full (111 GB of 2 TB and 452 GB of 2 TB free). Eventually the filesystem mounts, but I am unable to create new files, even when I delete data. Most files are 1.45 GB.
[r...@linuxserver ~]# btrfs filesystem df /storage/WD20_1
Data: total=1.81TB, used=1.71TB
Metadata: total=2.63GB, used=2.43GB
System: total=12.00MB, used=204.00KB
[r...@linuxserver ~]# btrfsck /dev/sdb1
found 1878986375168 bytes used err is 0
total csum bytes: 1832396384
total tree bytes: 2612477952
total fs tree bytes: 22331392
btree space waste bytes: 595037490
file data blocks allocated: 1922007609344 referenced 1876364451840
Btrfs Btrfs v0.19
[r...@linuxserver ~]# btrfs filesystem df /storage/WD20_2
Data: total=1.81TB, used=1.37TB
Metadata: total=2.51GB, used=2.47GB
System: total=12.00MB, used=204.00KB
[r...@linuxserver ~]# btrfsck /dev/sda1
found 1512592834560 bytes used err is 0
total csum bytes: 1474551008
total tree bytes: 2652602368
total fs tree bytes: 301985792
btree space waste bytes: 599008365
file data blocks allocated: 1510591008768 referenced 1607206682624
Btrfs Btrfs v0.19
[ cut here ] WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x15e/0x190 [btrfs]() Hardware name: GA-MA785G-UD3H Modules linked in: btrfs zlib_deflate crc32c libcrc32c ipv6 ext2 usbhid hid usb_storage snd_hda_codec_atihdmi radeon snd_hda_intel snd_hda_codec ttm ohci_hcd drm_kms_helper ehci_hcd drm i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hwdep snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc usbcore shpchp pcspkr serio_raw r8169 evdev i2c_piix4 pci_hotplug processor thermal edac_core k10temp button edac_mce_amd i2c_core mii sg wmi rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod Pid: 12735, comm: ls Not tainted 2.6.35-ARCH #1 Call Trace: [8105288a] warn_slowpath_common+0x7a/0xb0 [810528d5] warn_slowpath_null+0x15/0x20
[a07f8c9e] btrfs_block_rsv_check+0x15e/0x190 [btrfs] [a080976a] __btrfs_end_transaction+0x19a/0x220 [btrfs] [a080980b] btrfs_end_transaction+0xb/0x10 [btrfs] [a081342b] btrfs_dirty_inode+0x8b/0x120 [btrfs] [81145a86] __mark_inode_dirty+0x36/0x170 [81139c0d] touch_atime+0x12d/0x170 [81134330] ? filldir+0x0/0xd0 [81134586] vfs_readdir+0xc6/0xd0 [81134670] sys_getdents+0x80/0xe0 [81373765] ? page_fault+0x25/0x30 [81009e82] system_call_fastpath+0x16/0x1b ---[ end trace a296d77e7bd54918 ]--- block_rsv size 872415232 reserved 206303232 freed 0 0 INFO: task ls:12735 blocked for more than 120 seconds. echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. lsD 0 12735 11939 0x 88004aa5fd58 0082 880001894fa8 00014f40 00014f40 88004aa5ffd8 88004aa5ffd8 88004aa5ffd8 880057c5ef00 88004aa5ffd8 00014f40 Call Trace: [a0808654] wait_current_trans.clone.19+0x84/0xe0 [btrfs] [810718d0] ? autoremove_wake_function+0x0/0x40 [a08098ef] start_transaction+0xdf/0x250 [btrfs] [a0809aae] btrfs_start_transaction+0xe/0x10 [btrfs] [a0813438] btrfs_dirty_inode+0x98/0x120 [btrfs] [81145a86] __mark_inode_dirty+0x36/0x170 [81139c0d] touch_atime+0x12d/0x170 [81134330] ? filldir+0x0/0xd0 [81134586] vfs_readdir+0xc6/0xd0 [81134670] sys_getdents+0x80/0xe0 [81373765] ? 
page_fault+0x25/0x30 [81009e82] system_call_fastpath+0x16/0x1b [ cut here ] WARNING: at fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x15e/0x190 [btrfs]() Hardware name: GA-MA785G-UD3H Modules linked in: btrfs zlib_deflate crc32c libcrc32c ipv6 ext2 usbhid hid usb_storage snd_hda_codec_atihdmi radeon snd_hda_intel snd_hda_codec ttm ohci_hcd drm_kms_helper ehci_hcd drm i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_hwdep snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc usbcore shpchp pcspkr serio_raw r8169 evdev i2c_piix4 pci_hotplug processor thermal edac_core k10temp button edac_mce_amd i2c_core mii sg wmi rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod Pid: 12726, comm: btrfs-transacti Tainted: GW 2.6.35-ARCH #1 Call Trace: [8105288a] warn_slowpath_common+0x7a/0xb0 [810528d5] warn_slowpath_null+0x15/0x20 [a07f8c9e] btrfs_block_rsv_check+0x15e/0x190 [btrfs] [a080976a] __btrfs_end_transaction+0x19a/0x220 [btrfs] [a080980b] btrfs_end_transaction+0xb/0x10 [btrfs] [a080948e] btrfs_commit_transaction+0x62e/0x770 [btrfs] [81371739] ? mutex_unlock+0x9/0x10 [a08099d3] ?