Re: btrfs subvolume mount with different options
Thanks, chattr +C is what I am currently using. And you have already answered my next question, why it is not possible to set the +C attribute on an existing file :)

Yours sincerely,
Konstantin V. Gavrilenko

----- Original Message -----
From: "Roman Mamedov" <r...@romanrm.net>
To: "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com>
Cc: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Friday, 12 January, 2018 9:37:49 PM
Subject: Re: btrfs subvolume mount with different options

On Fri, 12 Jan 2018 17:49:38 +0000 (GMT) "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com> wrote:

> Hi list,
>
> just wondering whether it is possible to mount two subvolumes with different
> mount options, i.e.
>
> |
> |- /a defaults,compress-force=lzo

You can use different compression algorithms across the filesystem (including none), via "btrfs property" on directories or subvolumes. They are inherited down the tree.

$ mkdir test
$ sudo btrfs prop set test compression zstd
$ echo abc > test/def
$ sudo btrfs prop get test/def compression
compression=zstd

But it appears this doesn't provide a way to apply compress-force.

> |- /b defaults,nodatacow

Nodatacow can be applied to any dir/subvolume recursively, or to any file (as long as it's created but not yet written) via chattr +C.

--
With respect,
Roman
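[Archive note] A minimal sketch of the +C workflow described above, since it trips people up: the attribute only takes effect on a file that has no data yet, which is why it is usually set on a directory so that new files inherit it. Paths are illustrative:

mkdir /mnt/btrfs/nocow
chattr +C /mnt/btrfs/nocow            # new files created inside inherit +C
touch /mnt/btrfs/nocow/vm.img         # created empty: nodatacow from the start
lsattr /mnt/btrfs/nocow/vm.img        # the C attribute should be listed

# for an existing file, copy its contents into a fresh +C file instead:
touch /mnt/btrfs/nocow/db.img
chattr +C /mnt/btrfs/nocow/db.img     # works only while the file is still empty
cp --reflink=never /mnt/btrfs/old/db.img /mnt/btrfs/nocow/db.img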
btrfs subvolume mount with different options
Hi list,

just wondering whether it is possible to mount two subvolumes with different mount options, i.e.

|
|- /a defaults,compress-force=lzo
|
|- /b defaults,nodatacow

since, when both subvolumes are mounted and I change the option for one, it is changed for all of them.

thanks in advance.
Re: super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744
The mention of the device scan code, and the fact that total_bytes is exactly double, made me try commenting out ("hashing") the raid in the fstab. So I booted and ran "inspect-internal dump-super", which confirmed that it is in order.

# grep -i total_bytes hashed-inspect-internal
total_bytes             32004083023872
dev_item.total_bytes    32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
total_bytes             32004083023872
dev_item.total_bytes    32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
total_bytes             32004083023872
dev_item.total_bytes    32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872
backup_total_bytes:     32004083023872

Then I uncommented the device in the fstab, mounted it manually, and it mounted successfully.

# time mount /mnt/arh-backup1/
real    2m49.021s
user    0m0.000s
sys     0m1.244s

With the device uncommented in the fstab I rebooted, and upon reboot I ran mount:

# time mount /mnt/arh-backup1/
mount: wrong fs type, bad option, bad superblock on /dev/sda,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.
real    1m20.499s
user    0m0.000s
sys     0m0.045s

That failed. I waited a further couple of minutes, ran the mount again, and it mounted successfully.

So it seems that because of the time it takes to mount the device, nearly 3 minutes, there is some sort of race condition where two device scans run at the same time, or something similar. I can say one thing for sure: it wasn't happening on 4.10, and I have only observed such behaviour on 4.12 and 4.13.

p.s. the disk does not mount automatically upon boot, but can be mounted manually later:

# uptime
19:54:45 up 4 min, 1 user, load average: 0.30, 0.74, 0.39
# time mount /mnt/arh-backup1/
real    2m52.247s
user    0m0.000s
sys     0m1.246s

Here is the dmesg extract. For some reason the system returned "open_ctree failed" at second 204; at second 329 I started the mount manually.

[  204.389231] BTRFS error (device sda): open_ctree failed
[  329.234613] BTRFS info (device sda): force zlib compression
[  329.234618] BTRFS info (device sda): using free space tree
[  329.234620] BTRFS info (device sda): has skinny extents

hope that helps, and thanks for your help

Yours sincerely,
Konstantin V. Gavrilenko

----- Original Message -----
From: "Qu Wenruo" <quwenruo.bt...@gmx.com>
To: "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com>
Cc: "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
Sent: Tuesday, 24 October, 2017 3:44:21 PM
Subject: Re: super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744

On 2017年10月24日 19:44, Konstantin V. Gavrilenko wrote:
> answers inline marked with KVG:
>
> Yours sincerely,
> Konstantin V. Gavrilenko
>
> ----- Original Message -----
> From: "Qu Wenruo" <quwenruo.bt...@gmx.com>
> To: "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com>, "Linux fs Btrfs" <linux-btrfs@vger.kernel.org>
> Sent: Tuesday, 24 October, 2017 11:37:56 AM
> Subject: Re: super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744
>
> On 2017年10月24日 17:20, Konstantin V. Gavrilenko wrote:
>> Hi list,
>>
>> having installed the recent kernel version I am no longer able to mount the btrfs partition with compression on the first attempt. Previously on 4.10.0-37-generic everything was working fine; once I switched to 4.13.9-041309-generic I started getting the following error while trying to mount it with the same options "compress-force=zlib,space_cache=v2"
>>
>> [  204.596381] BTRFS error (device sda): open_ctree failed
>> [  204.631895] BTRFS info (device sda): force zlib compression
>> [  204.631901] BTRFS info (device sda): using free space tree
>> [  204.631903] BTRFS info (device sda): has skinny extents
>> [  204.890145] BTRFS error (device sda): super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744
>> [  204.891276] BTRFS error (device sda): failed to read chunk tree: -22
>> [  204.944333] BTRFS error (device sda): open_ctree failed
>
> Such problem c
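[Archive note] One way to sidestep this kind of boot-time ordering problem is to keep the slow filesystem out of the critical boot path. A sketch, assuming a systemd-based distro; the mount point and device are taken from the report, and the option names are standard systemd fstab extensions, not btrfs-specific:

# /etc/fstab -- don't fail the boot on this mount, and allow it plenty of time
/dev/sda  /mnt/arh-backup1  btrfs  compress-force=zlib,space_cache=v2,nofail,x-systemd.mount-timeout=10min  0  0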
super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744
Hi list,

having installed a recent kernel version I am no longer able to mount the btrfs partition with compression on the first attempt. Previously on 4.10.0-37-generic everything was working fine; once I switched to 4.13.9-041309-generic I started getting the following error while trying to mount it with the same options "compress-force=zlib,space_cache=v2":

[  204.596381] BTRFS error (device sda): open_ctree failed
[  204.631895] BTRFS info (device sda): force zlib compression
[  204.631901] BTRFS info (device sda): using free space tree
[  204.631903] BTRFS info (device sda): has skinny extents
[  204.890145] BTRFS error (device sda): super_total_bytes 32004083023872 mismatch with fs_devices total_rw_bytes 64008166047744
[  204.891276] BTRFS error (device sda): failed to read chunk tree: -22
[  204.944333] BTRFS error (device sda): open_ctree failed

For some reason, super_total_bytes is exactly half of total_rw_bytes. However, if after the unsuccessful first mount attempt I mount it with a minimal set of options, "space_cache=v2", the partition mounts. Then I umount it and mount normally, with the full set of options "compress-force=zlib,space_cache=v2", and it mounts without an error. I also observed the same error on 4.12.14-041214-generic.

Any ideas why this might be happening?

System information:
distribution: Ubuntu 16.04
btrfs-progs v4.8.1, later upgraded to v4.13.3

# btrfs fi usage /mnt/backup
Overall:
    Device size:         29.11TiB
    Device allocated:    18.04TiB
    Device unallocated:  11.07TiB
    Device missing:      0.00B
    Used:                17.99TiB
    Free (estimated):    11.12TiB (min: 5.58TiB)
    Data ratio:          1.00
    Metadata ratio:      2.00
    Global reserve:      512.00MiB (used: 0.00B)

Data,single: Size:17.93TiB, Used:17.88TiB
   /dev/sda   17.93TiB
Metadata,DUP: Size:53.50GiB, Used:51.78GiB
   /dev/sda  107.00GiB
System,DUP: Size:8.00MiB, Used:2.30MiB
   /dev/sda   16.00MiB
Unallocated:
   /dev/sda   11.07TiB

Yours sincerely,
Konstantin V. Gavrilenko
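[Archive note] For readers hitting this mismatch on modern systems: later btrfs-progs releases (v4.14 and newer, to the best of my knowledge) grew a repair command for exactly this super_total_bytes / total_rw_bytes inconsistency. A hedged sketch, to be run on the unmounted filesystem; verify the command exists in your btrfs-progs version first:

# with /mnt/backup unmounted (device name from the report above)
btrfs rescue fix-device-size /dev/sda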
Re: Btrfs + compression = slow performance and high cpu usage
Hello again list. I thought I would clear things up and describe what is happening with my troubled RAID setup.

Having received the help from the list, I initially ran a full defragmentation of all the data and recompressed everything with zlib. That didn't help. Then I ran a full rebalance of the data, and that didn't help either. So I had to take a disk out of the raid, copy all the data onto it, recreate the RAID drive with 32kb chunk size and 96kb stripe, and copy the data back. Then I added the disk back and resynced the raid.

So currently the RAID device is:

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name:
RAID Level           : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                 : 21.830 TB
Sector Size          : 512
Is VD emulated       : Yes
Parity Size          : 7.276 TB
State                : Optimal
Strip Size           : 32 KB
Number Of Drives     : 4
Span Depth           : 1
Default Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy    : Disk's Default
Encryption Type      : None
Bad Blocks Exist     : No
Is VD Cached         : No

It is about 40% full with compressed data:

# btrfs fi usage /mnt/arh-backup1/
Overall:
    Device size:         21.83TiB
    Device allocated:     8.98TiB
    Device unallocated:  12.85TiB
    Device missing:      0.00B
    Used:                 8.98TiB
    Free (estimated):    12.85TiB (min: 6.43TiB)
    Data ratio:          1.00
    Metadata ratio:      2.00
    Global reserve:      512.00MiB (used: 0.00B)

I've decided to run a set of tests where a 5 gb file was created using different blocksizes and different flags. One file was generated with urandom data and another one was filled with zeroes. The data was written with compression and without compression, and it seems that without compression it is possible to gain 30-40% speed, while the cpu was running at 50% idle during the highest loads.

dd write speeds (mb/s)

flags: conv=fsync
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs1024k   387      407           584      577
bs512k    389      414           532      547
bs256k    412      409           558      585
bs128k    412      403           572      583
bs64k     409      419           563      574
bs32k     407      404           569      572

flags: oflag=sync
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs1024k   86.1     97.0          203      210
bs512k    50.6     64.4          85.0     170
bs256k    25.0     29.8          67.6     67.5
bs128k    13.2     16.4          48.4     49.8
bs64k     7.4      8.3           24.5     27.9
bs32k     3.8      4.1           14.0     13.7

flags: no flags
          compress-force=zlib    compress-force=none
          RAND     ZERO          RAND     ZERO
bs1024k   480      419           681      595
bs512k    422      412           633      585
bs256k    413      384           707      712
bs128k    414      387           695      704
bs64k     482      467           622      587
bs32k     416      412           610      598

I have also run a test where I filled the array to about 97% capacity, and the write speed went down by about 50% compared with the empty RAID.

thanks for the help.

----- Original Message -----
From: "Peter Grandi"
To: "Linux fs Btrfs"
Sent: Tuesday, 1 August, 2017 10:09:03 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

>> [ ... ] a "RAID5 with 128KiB writes and a 768KiB stripe
>> size". [ ... ] several back-to-back 128KiB writes [ ... ] get
>> merged by the 3ware firmware only if it has a persistent
>> cache, and maybe your 3ware does not have one,

> KOS: No I don't have persistent cache. Only the 512 Mb cache
> on board of a controller, that is BBU.

If it is a persistent cache, that can be battery-backed (as I wrote, but it seems that you don't have too much time to read replies), then the size of the write, 128KiB or not, should not matter much; the write will be reported complete when it hits the persistent cache (whichever technology it uses), and then the HA firmware will spill write-cached data to the disks using the optimal operation width. Unless the 3ware firmware is really terrible (and depending on model and vintage it can be amazingly terrible) or the battery is no longer recharging, and then the host adapter switches to write-through.

That you see very different rates between uncompressed and compressed writes, where the main difference is the limitation on the segment size, seems to indicate that compressed writes involve a lot of RMW, that is sub-stripe updates. As I mentioned already, it would be interesting to retry 'dd' with different 'bs' values without compression and with 'sync' (or 'direct', which only makes sense without compression).

> If I had additional SSD caching on the controller I would have
> mentioned it.

So far you had not mentioned
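[Archive note] A minimal sketch of the sweep behind the tables above, for anyone wanting to reproduce it. The mount point is taken from the thread; everything else is illustrative. The original test wrote pre-generated 5 GB files; reading /dev/urandom inline, as below, can itself become the bottleneck, so treat the RAND numbers from such a loop with care:

#!/bin/sh
# dd write-speed sweep over block sizes, conv=fsync variant
SRC=/dev/urandom                 # or /dev/zero for the ZERO columns
MNT=/mnt/arh-backup1
for bs in 1024 512 256 128 64 32; do
    count=$(( 5 * 1024 * 1024 / bs ))           # keep the total at 5 GiB
    sync; echo 3 > /proc/sys/vm/drop_caches     # start from a cold cache
    dd if=$SRC of=$MNT/test-${bs}k bs=${bs}k count=$count conv=fsync
    rm -f $MNT/test-${bs}k
done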
Re: slow btrfs with a single kworker process using 100% CPU
Roman, initially I had a single process occupying 100% CPU; when I sysrq'd it, it was showing "btrfs_find_space_for_alloc", but that's when I used the autodefrag, compress, forcecompress and commit=10 mount flags and space_cache was v1 by default.

When I switched to "relatime,compress-force=zlib,space_cache=v2" the 100% cpu disappeared, but the poor performance remained.

As to the chunk size, there is no information in the article about the type of data that was used, while in our case we are pretty certain about the compressed block size (32-128kb). I am currently inclining towards 32k, as it might be ideal in a situation where we have a 5 disk raid5 array. In theory:

1. The minimum compressed write (32k) would fill the chunk on a single disk, thus the IO cost of the operation would be 2 reads (original chunk + original parity) and 2 writes (new chunk + new parity).

2. The maximum compressed write (128k) would require the update of 1 chunk on each of the 4 data disks + 1 parity write.

Stefan, what mount flags do you use?

kos

----- Original Message -----
From: "Roman Mamedov" <r...@romanrm.net>
To: "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com>
Cc: "Stefan Priebe - Profihost AG" <s.pri...@profihost.ag>, "Marat Khalili" <m...@rqc.ru>, linux-btrfs@vger.kernel.org, "Peter Grandi" <p...@btrfs.list.sabi.co.uk>
Sent: Wednesday, 16 August, 2017 2:00:03 PM
Subject: Re: slow btrfs with a single kworker process using 100% CPU

On Wed, 16 Aug 2017 12:48:42 +0100 (BST) "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com> wrote:

> I believe the chunk size of 512kb is even worse for performance than the
> default settings on my HW RAID of 256kb.

It might be, but that does not explain the original problem reported at all. If mdraid performance were the bottleneck, you would see high iowait, possibly some CPU load from the mdX_raidY threads. But not a single Btrfs thread pegging at 100% CPU.

> So now I am moving the data from the array and will be rebuilding it with 64
> or 32 chunk size and checking the performance.

64K is the sweet spot for RAID5/6:
http://louwrentius.com/linux-raid-level-and-chunk-size-the-benchmarks.html

--
With respect,
Roman
Re: slow btrfs with a single kworker process using 100% CPU
I believe the chunk size of 512kb is even worse for performance than the default settings on my HW RAID of 256kb. Peter Grandi explained it earlier in one of his posts.

QTE
++
That runs counter to this simple story: suppose a program is doing 64KiB IO:

* For *reads*, there are 4 data drives and the strip size is 16KiB: the 64KiB will be read in parallel on 4 drives. If the strip size is 256KiB then the 64KiB will be read sequentially from just one disk, and 4 successive reads will be read sequentially from the same drive.

* For *writes* on a parity RAID like RAID5 things are much, much more extreme: the 64KiB will be written with 16KiB strips on a 5-wide RAID5 set in parallel to 5 drives, with 4 stripes being updated with RMW. But with 256KiB strips it will partially update 5 drives, because the stripe is 1024+256KiB, and it needs to do RMW, and four successive 64KiB writes will need to do that too, even if only one drive is updated. Usually for RAID5 there is an optimization that means that only the specific target drive and the parity drive(s) need RMW, but it is still very expensive.

This is the "storage for beginners" version; what happens in practice however depends a lot on the specific workload profile (typical read/write size and latencies and rates), caching and queueing algorithms in both Linux and the HA firmware.
++
UNQTE

I've also found another explanation of the same problem with the right chunk size and how it works here:
http://holyhandgrenade.org/blog/2011/08/disk-performance-part-2-raid-layouts-and-stripe-sizing/#more-1212

So in my understanding, when working with compressed data, your compressed data will vary between 128kb (urandom) and 32kb (zeroes), and that is what will be passed to the FS to take care of. In our setup of large chunk sizes, if we need to write 32kb-128kb of compressed data, the RAID5 would need to perform 3 read operations and 2 write operations, as updating a parity chunk requires either
- the original chunk, the new chunk, and the old parity block
- or, all chunks (except for the parity chunk) in the stripe

disk         disk1   disk2   disk3   disk4
chunk size   512kb   512kb   512kb   512kb   P

So in the worst case scenario, in order to write 32kb, RAID5 would need to read (480 + 512 + P512) and then write (32 + P512).

That's my current understanding of the situation. I was planning to write an update to my story later on, once I hopefully solve the problem. But an intermediate update is that I have performed a full defrag with full compression (2 days), then a balance of all the data (10 days), and it didn't help the performance. So now I am moving the data from the array and will be rebuilding it with 64 or 32 chunk size and checking the performance.

VG, kos

----- Original Message -----
From: "Stefan Priebe - Profihost AG" <s.pri...@profihost.ag>
To: "Konstantin V. Gavrilenko" <k.gavrile...@arhont.com>
Cc: "Marat Khalili" <m...@rqc.ru>, linux-btrfs@vger.kernel.org
Sent: Wednesday, 16 August, 2017 11:26:38 AM
Subject: Re: slow btrfs with a single kworker process using 100% CPU

Am 16.08.2017 um 11:02 schrieb Konstantin V. Gavrilenko:
> Could be a similar issue as what I had recently, with the RAID5 and 256kb chunk size.
> please provide more information about your RAID setup.

Hope this helps:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md0 : active raid5 sdd1[1] sdf1[4] sdc1[0] sde1[2]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 6/30 pages [24KB], 65536KB chunk

md2 : active raid5 sdm1[2] sdl1[1] sdk1[0] sdn1[4]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 7/30 pages [28KB], 65536KB chunk

md1 : active raid5 sdi1[2] sdg1[0] sdj1[4] sdh1[1]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 7/30 pages [28KB], 65536KB chunk

md3 : active raid5 sdp1[1] sdo1[0] sdq1[2] sdr1[4]
      11717406720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 6/30 pages [24KB], 65536KB chunk

# btrfs fi usage /vmbackup/
Overall:
    Device size:         43.65TiB
    Device allocated:    31.98TiB
    Device unallocated:  11.67TiB
    Device missing:      0.00B
    Used:                30.80TiB
    Free (estimated):    12.84TiB (min: 12.84TiB)
    Data ratio:          1.00
    Metadata ratio:      1.00
    Global reserve:      512.00MiB (used: 0.00B)

Data,RAID0: Size:31.83TiB, Used:30.66TiB
   /dev/md0    7.96TiB
   /dev/md1    7.96TiB
   /dev/md2    7.96TiB
   /dev/md3    7.96TiB

Metadata,RAID0: Size:153.00GiB, Used:141.34GiB
   /dev/md0   38.25GiB
   /dev/md1   38.25GiB
   /dev/md2   38.25GiB
   /dev/md3
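[Archive note] For reference, recreating one of these md arrays with a smaller chunk, as the thread discusses. A sketch only: device names are taken from the mdstat output above, the array must be backed up first since --create destroys its contents, and 64K is the value suggested by the benchmark link, not a universal recommendation:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 \
      /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1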
Re: slow btrfs with a single kworker process using 100% CPU
Could be a similar issue as what I had recently, with the RAID5 and 256kb chunk size. Please provide more information about your RAID setup.

p.s. you can also check the thread "Btrfs + compression = slow performance and high cpu usage"

----- Original Message -----
From: "Stefan Priebe - Profihost AG"
To: "Marat Khalili", linux-btrfs@vger.kernel.org
Sent: Wednesday, 16 August, 2017 10:37:43 AM
Subject: Re: slow btrfs with a single kworker process using 100% CPU

Am 16.08.2017 um 08:53 schrieb Marat Khalili:
>> I've one system where a single kworker process is using 100% CPU,
>> sometimes a second process comes up with 100% CPU [btrfs-transacti]. Is
>> there anything i can do to get the old speed again or find the culprit?
>
> 1. Do you use quotas (qgroups)?

No qgroups and no quota.

> 2. Do you have a lot of snapshots? Have you deleted some recently?

1413 snapshots. I'm deleting 50 of them every night. But the btrfs-cleaner process isn't running / consuming CPU currently.

> More info about your system would help too.

Kernel is OpenSuSE Leap 42.3. btrfs is mounted with compress-force=zlib. btrfs is running as a raid0 on top of 4 md raid 5 devices.

Greets,
Stefan
Re: Btrfs + compression = slow performance and high cpu usage
----- Original Message -----
From: "Peter Grandi"
To: "Linux fs Btrfs"
Sent: Tuesday, 1 August, 2017 3:14:07 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

> Peter, I don't think the filefrag is showing the correct
> fragmentation status of the file when the compression is used.

As I wrote, "their size is just limited by the compression code" which results in "128KiB writes". On a "fresh empty Btrfs volume" the compressed extents limited to 128KiB also happen to be pretty physically contiguous, but on a more fragmented free space list they can be more scattered.

KOS: Ok, thanks for pointing it out. I have compared the filefrag -v on another btrfs that is not fragmented and see the difference with what is happening on the sluggish one.

5824: 186368.. 186399: 2430093383..2430093414: 32: 2430093414: encoded
5825: 186400.. 186431: 2430093384..2430093415: 32: 2430093415: encoded
5826: 186432.. 186463: 2430093385..2430093416: 32: 2430093416: encoded
5827: 186464.. 186495: 2430093386..2430093417: 32: 2430093417: encoded
5828: 186496.. 186527: 2430093387..2430093418: 32: 2430093418: encoded
5829: 186528.. 186559: 2430093388..2430093419: 32: 2430093419: encoded
5830: 186560.. 186591: 2430093389..2430093420: 32: 2430093420: encoded

As I already wrote, the main issue here seems to be that we are talking about a "RAID5 with 128KiB writes and a 768KiB stripe size". On MD RAID5 the slowdown because of RMW seems only to be around 30-40%, but it looks like several back-to-back 128KiB writes get merged by the Linux IO subsystem (not sure whether that's thoroughly legal), and perhaps they get merged by the 3ware firmware only if it has a persistent cache, and maybe your 3ware does not have one, but you have kept your counsel as to that.

KOS: No, I don't have a persistent cache. Only the 512 Mb cache on board of the controller, which is battery-backed (BBU). If I had additional SSD caching on the controller I would have mentioned it.

I was also under the impression that in a situation where mostly extra-large files will be stored on the array, a bigger strip size would indeed increase the speed, thus I went with the 256 Kb strip size. Would I be correct in assuming that a RAID strip size of 128 Kb would be a better choice if one plans to use BTRFS with compression?

thanks,
kos
Re: Btrfs + compression = slow performance and high cpu usage
Peter, I don't think filefrag is showing the correct fragmentation status of the file when compression is used. At least not the one that is installed by default in Ubuntu 16.04 - e2fsprogs | 1.42.13-1ubuntu1

So for example, the fragmentation of a compressed file is 320 times more than that of an uncompressed one:

root@homenas:/mnt/storage/NEW# filefrag test5g-zeroes
test5g-zeroes: 40903 extents found
root@homenas:/mnt/storage/NEW# filefrag test5g-data
test5g-data: 129 extents found

I am currently defragmenting that mountpoint, ensuring that everything is compressed with zlib:

# btrfs fi defragment -rv -czlib /mnt/arh-backup

My guess is that it will take another 24-36 hours to complete, and then I will redo the test to see if that has helped. Will keep the list posted.

p.s. any other suggestions that might help with the fragmentation and data allocation? Should I try to rebalance the data on the drive?

kos

----- Original Message -----
From: "Peter Grandi"
To: "Linux fs Btrfs"
Sent: Monday, 31 July, 2017 1:41:07 PM
Subject: Re: Btrfs + compression = slow performance and high cpu usage

[ ... ]

> grep 'model name' /proc/cpuinfo | sort -u
> model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

Good, contemporary CPU with all accelerations.

> The sda device is a hardware RAID5 consisting of 4x8TB drives. [ ... ]
> Strip Size : 256 KB

So the full RMW data stripe length is 768KiB.

> [ ... ] don't see the previously reported behaviour of one of
> the kworker consuming 100% of the cputime, but the write speed
> difference between the compression ON vs OFF is pretty large.

That's weird; of course 'lzo' is a lot cheaper than 'zlib', but in my test the much higher CPU time of the latter was spread across many CPUs, while in your case it wasn't, even if the E5645 has 6 CPUs and can do 12 threads. That seemed to point to some high cost of finding free blocks, that is a very fragmented free list, or something else.

> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s

The results with 'oflag=direct' are not relevant, because Btrfs behaves "differently" with that.

> mountflags:
> (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s

> mountflags:
> (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s

That's pretty good for a RAID5 with 128KiB writes and a 768KiB stripe size, on a 3ware, and it looks like the hw host adapter does not have a persistent cache (usually battery backed). My guess is that watching transfer rates and latencies with 'iostat -dk -zyx 1' did not happen.

> mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/) [ ... ]
> dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

I had mentioned in my previous reply the output of 'filefrag'. That to me seems relevant here, because of RAID5 RMW and maximum extent size with Btrfs compression and strip/stripe size. Perhaps redoing the tests with a 128KiB 'bs' *without* compression would be interesting, perhaps even with 'oflag=sync' instead of 'conv=fsync'.

It is hard for me to see a speed issue here with Btrfs: for comparison I have done a simple test with both a 3+1 MD RAID5 set with a 256KiB chunk size and a single block device, on "contemporary" 1TB/2TB drives capable of sequential transfer rates of 150-190MB/s:

soft# grep -A2 sdb3 /proc/mdstat
md127 : active raid5 sde3[4] sdd3[2] sdc3[1] sdb3[0]
      729808128 blocks super 1.0 level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

with compression:

soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/md/test5 /mnt/test5
soft# mount -t btrfs -o commit=10,compress-force=zlib /dev/sdg3 /mnt/sdg3
soft# rm -f /mnt/test5/testfile /mnt/sdg3/testfile
soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/test5/testfile bs=1M count=10000 conv=fsync
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 94.3605 s, 111 MB/s
0.01user 12.59system 1:34.36elapsed 13%CPU (0avgtext+0avgdata 2932maxresident)k
13042144inputs+20482144outputs (3major+345minor)pagefaults 0swaps
soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sdg3/testfile bs=1M count=10000 conv=fsync
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 93.5885 s, 112 MB/s
0.03user 12.35system 1:33.59elapsed 13%CPU (0avgtext+0avgdata 2940maxresident)k
13042144inputs+20482400outputs
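[Archive note] On the filefrag point above: btrfs limits compressed extents to 128KiB, so filefrag necessarily reports many small "encoded" extents on compressed files, and a high extent count alone does not prove harmful fragmentation. A tool such as compsize (which appeared around this period; verify it is available) reports per-file on-disk vs. uncompressed sizes and extent counts more usefully:

# paths from the earlier message, purely illustrative
compsize /mnt/storage/NEW/test5g-zeroes /mnt/storage/NEW/test5g-data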
Re: Btrfs + compression = slow performance and high cpu usage
Thanks for the comments. Initially the system performed well. I don't have the benchmark details written down, but the compressed vs non-compressed speeds were more or less similar. However, after several weeks of usage the system started experiencing the described slowdowns, thus I started investigating the problem.

This indeed is a backup drive, but it predominantly contains large files.

# ls -lahR | awk '/^-/ {print $5}' | sort | uniq -c | sort -n | tail -n 15
      5 322
      5 396
      5 400
      6 1000G
      6 11
      6 200G
      8 24G
      8 48G
     13 500G
     20 8.0G
     25 165G
     32 20G
     57 100G
    103 50G
    201 10G

# grep 'model name' /proc/cpuinfo | sort -u
model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

# lsscsi | grep 'sd[ae]'
[4:2:0:0]  disk  LSI MR9260-8i  2.13  /dev/sda

The sda device is a hardware RAID5 consisting of 4x8TB drives.

Virtual Drive: 0 (Target Id: 0)
Name:
RAID Level           : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                 : 21.830 TB
Sector Size          : 512
Is VD emulated       : Yes
Parity Size          : 7.276 TB
State                : Optimal
Strip Size           : 256 KB
Number Of Drives     : 4
Span Depth           : 1
Default Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy : WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy    : Disk's Default
Encryption Type      : None
Bad Blocks Exist     : No
Is VD Cached         : No
Number of Spans      : 1
Span: 0 - Number of PDs: 4

I have changed the mount flags as suggested, and I don't see the previously reported behaviour of one of the kworkers consuming 100% of the cputime, but the write speed difference between compression ON vs OFF is pretty large. I have run several tests with zlib, lzo and no compression, and the results are rather strange.

mountflags: (rw,relatime,compress-force=zlib,space_cache=v2,subvolid=5,subvol=/)
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 93.3418 s, 57.5 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 26.0685 s, 206 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 77.4845 s, 69.3 MB/s

mountflags: (rw,relatime,compress-force=lzo,space_cache=v2,subvolid=5,subvol=/)
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 116.246 s, 46.2 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 14.704 s, 365 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 122.321 s, 43.9 MB/s

mountflags: (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 32.2551 s, 166 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress oflag=direct
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 19.9464 s, 269 MB/s
dd if=/dev/sdb of=./testing count=5120 bs=1M status=progress conv=fsync
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 10.1033 s, 531 MB/s

The CPU usage is pretty low as well. For example, when force-compress=zlib is in effect:

Linux 4.10.0-28-generic (ais-backup1)  30/07/17  _x86_64_  (12 CPU)

14:31:27  CPU  %user  %nice  %system  %iowait  %steal  %idle
14:31:28  all   0.00   0.00     1.50     0.00    0.00  98.50
14:31:29  all   0.00   0.00     4.78     3.52    0.00  91.69
14:31:30  all   0.08   0.00     4.92     3.75    0.00  91.25
14:31:31  all   0.00   0.00     4.76     3.76    0.00  91.49
14:31:32  all   0.00   0.00     4.76     3.76    0.00  91.48
14:31:33  all   0.08   0.00     4.67     3.76    0.00  91.49
14:31:34  all   0.00   0.00     4.76     3.68    0.00  91.56
14:31:35  all   0.08   0.00     4.76     3.76    0.00  91.40
14:31:36  all   0.00   0.00     4.60     3.77    0.00  91.63
14:31:37  all   0.00   0.00     4.68     3.68    0.00  91.64
14:31:38  all   0.08   0.00     4.52     3.76    0.00  91.64
14:31:39  all   0.08   0.00     4.68     3.76    0.00  91.48
14:31:40  all   0.08   0.00     4.52     3.76    0.00  91.64
14:31:41  all   0.00   0.00     4.61     3.77    0.00  91.62
14:31:42  all   0.08   0.00     5.07     3.74    0.00  91.10
14:31:43  all   0.00   0.00     4.68     3.68    0.00  91.64
14:31:44
Btrfs + compression = slow performance and high cpu usage
Hello list,

I am stuck with a problem of btrfs slow performance when using compression. When the compress-force=lzo mount flag is enabled, the performance drops to 30-40 mb/s and one of the btrfs processes utilises 100% cpu time.

mount options: btrfs relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10

The command I am testing the write throughput with is:

# pv -tpreb /dev/sdb | dd of=./testfile bs=1M oflag=direct

# top -d 1
top - 15:49:13 up 1:52, 2 users, load average: 5.28, 2.32, 1.39
Tasks: 320 total, 2 running, 318 sleeping, 0 stopped, 0 zombie
%Cpu0  :  0.0 us,   2.0 sy,  0.0 ni, 77.0 id, 21.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,   1.0 sy,  0.0 ni, 90.0 id,  9.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,   1.0 sy,  0.0 ni, 72.0 id, 27.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us, 100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,   1.0 sy,  0.0 ni, 57.0 id, 42.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,   0.0 sy,  0.0 ni, 96.0 id,  4.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,   0.0 sy,  0.0 ni, 94.0 id,  6.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,   1.0 sy,  0.0 ni, 95.1 id,  3.9 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  1.0 us,   2.0 sy,  0.0 ni, 24.0 id, 73.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,   0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  1.0 us,   0.0 sy,  0.0 ni, 98.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,   2.0 sy,  0.0 ni, 83.3 id, 14.7 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32934136 total, 10137496 free, 602244 used, 22194396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 30525664 avail Mem

  PID USER  PR  NI  VIRT  RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
37017 root  20   0     0     0     0 R 100.0   0.0  0:32.42  kworker/u49:8
36732 root  20   0     0     0     0 D   4.0   0.0  0:02.40  btrfs-transacti
40105 root  20   0  8388  3040  2000 D   4.0   0.0  0:02.88  dd

The kworker process that causes the high cpu usage is most likely searching for free space:

# echo l > /proc/sysrq-trigger
# dmesg -T
[Fri Jul 28 15:57:51 2017] CPU: 1 PID: 36430 Comm: kworker/u49:2 Not tainted 4.10.0-28-generic #32~16.04.2-Ubuntu
[Fri Jul 28 15:57:51 2017] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1b 11/16/2012
[Fri Jul 28 15:57:51 2017] Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
[Fri Jul 28 15:57:51 2017] task: 9ddce6206a40 task.stack: aa9121f6c000
[Fri Jul 28 15:57:51 2017] RIP: 0010:rb_next+0x1e/0x40
[Fri Jul 28 15:57:51 2017] RSP: 0018:aa9121f6fb40 EFLAGS: 0282
[Fri Jul 28 15:57:51 2017] RAX: 9dddc34df1b0 RBX: 0001 RCX: 1000
[Fri Jul 28 15:57:51 2017] RDX: 9dddc34df708 RSI: 9ddccaf470a4 RDI: 9dddc34df2d0
[Fri Jul 28 15:57:51 2017] RBP: aa9121f6fb40 R08: 0001 R09: 3000
[Fri Jul 28 15:57:51 2017] R10:  R11: 0002 R12: 9ddccaf47080
[Fri Jul 28 15:57:51 2017] R13: 1000 R14: aa9121f6fc50 R15: 9dddc34df2d0
[Fri Jul 28 15:57:51 2017] FS:  () GS:9ddcefa4() knlGS:
[Fri Jul 28 15:57:51 2017] CS: 0010 DS: ES: CR0: 80050033
[Fri Jul 28 15:57:51 2017] Call Trace:
[Fri Jul 28 15:57:51 2017]  btrfs_find_space_for_alloc+0xde/0x270 [btrfs]
[Fri Jul 28 15:57:51 2017]  find_free_extent.isra.68+0x3c6/0x1040 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_reserve_extent+0xab/0x210 [btrfs]
[Fri Jul 28 15:57:51 2017]  submit_compressed_extents+0x154/0x580 [btrfs]
[Fri Jul 28 15:57:51 2017]  ? submit_compressed_extents+0x580/0x580 [btrfs]
[Fri Jul 28 15:57:51 2017]  async_cow_submit+0x82/0x90 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_scrubparity_helper+0x1fe/0x300 [btrfs]
[Fri Jul 28 15:57:51 2017]  btrfs_delalloc_helper+0xe/0x10 [btrfs]
[Fri Jul 28 15:57:51 2017]  process_one_work+0x16b/0x4a0
[Fri Jul 28 15:57:51 2017]  worker_thread+0x4b/0x500
[Fri Jul 28 15:57:51 2017]  kthread+0x109/0x140

When the compression is turned off, I am able to get the maximum 500-600 mb/s write speed on this disk (raid array) with minimal cpu usage.

mount options: relatime,discard,autodefrag,space_cache=v2,commit=10

# iostat -m 1
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.08   0.00     7.74    10.77    0.00  81.40

Device:  tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda      2376.00  0.00
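[Archive note] When a single kworker pegs a CPU like this, sysrq-l (as used above) gives one snapshot; a hedged alternative, assuming perf is installed, is to sample kernel stacks for a few seconds and look at the hottest symbols:

perf record -a -g -- sleep 10     # sample all CPUs with call graphs
perf report --sort=symbol         # btrfs_find_space_for_alloc should dominate here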
Re: btrfs-progs confusing message
On 04/21/2016 04:02 AM, Austin S. Hemmelgarn wrote:
> On 2016-04-20 16:23, Konstantin Svist wrote:
>> Pretty much all commands print out the usage message when no device is
>> specified:
>>
>> [root@host ~]# btrfs scrub start
>> btrfs scrub start: too few arguments
>> usage: btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] |
>> ...
>>
>> However, balance doesn't:
>>
>> [root@host ~]# btrfs balance start
>> ERROR: can't access 'start': No such file or directory
>
> And this is an example of why backwards compatibility can be a pain.
> The original balance command was 'btrfs filesystem balance', and had
> no start, stop, or similar sub-commands. This got changed to the
> current incarnation when the support for filters was added. For
> backwards compatibility reasons, we decided to still accept balance
> with no arguments other than the path as being the same as running
> 'btrfs balance start' on that path, and then made the old name an
> alias to the new one, with the restriction that you can't pass in
> filters through that interface. What is happening here is that
> balance is trying to interpret start as a path, not a command, hence
> the message about not being able to access 'start'.

So since this is still detected as an error, why not print the usage info at this point?
btrfs-progs confusing message
Pretty much all commands print out the usage message when no device is specified:

[root@host ~]# btrfs scrub start
btrfs scrub start: too few arguments
usage: btrfs scrub start [-BdqrRf] [-c ioprio_class -n ioprio_classdata] |
...

However, balance doesn't:

[root@host ~]# btrfs balance start
ERROR: can't access 'start': No such file or directory
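[Archive note] For anyone landing here from a search: the message means balance tried to interpret "start" as a path because no path followed it (see the explanation in the reply above). The working invocations are:

btrfs balance start /mountpoint        # current syntax, accepts filters
btrfs filesystem balance /mountpoint   # legacy alias, no filters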
Re: bedup --defrag freezing
On 08/06/2015 04:10 AM, Austin S Hemmelgarn wrote:
> On 2015-08-05 17:45, Konstantin Svist wrote:
>> Hi, I've been running btrfs on Fedora for a while now, with bedup --defrag
>> running in a night-time cronjob. Last few runs seem to have gotten stuck,
>> without the possibility of even killing the process (kill -9 doesn't work) --
>> all I could do is hard power cycle.
>> Did something change recently? Is bedup simply too out of date? What should
>> I use to de-duplicate across snapshots instead? Etc.?
>
> AFAIK, bedup hasn't been actively developed for quite a while (I'm actually
> kind of surprised it runs with the newest btrfs-progs). Personally, I'd
> suggest using duperemove (https://github.com/markfasheh/duperemove).

Thanks, good to know.

Tried duperemove -- it looks like it builds a database of its own checksums every time it runs... why won't it use the BTRFS internal checksums for fast rejection? It would run a LOT faster...
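[Archive note] A minimal duperemove invocation with a persistent hashfile, which addresses the re-hashing complaint above: with --hashfile, only changed files are re-checksummed on subsequent runs. Paths are illustrative:

# -r recurse, -d actually submit the dedupe (otherwise it only reports),
# --hashfile persists checksums between runs
duperemove -rd --hashfile=/var/tmp/dupehash.db /mnt/storage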
bedup --defrag freezing
Hi,

I've been running btrfs on Fedora for a while now, with bedup --defrag running in a night-time cronjob. The last few runs seem to have gotten stuck, without the possibility of even killing the process (kill -9 doesn't work) -- all I could do was hard power cycle.

Did something change recently? Is bedup simply too out of date? What should I use to de-duplicate across snapshots instead? Etc.?

Thanks,
Konstantin

# uname -a
Linux mireille.svist.net 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10 21:09:54 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.1
# btrfs fi show
Label: none  uuid: 5ac56e7d-3d04-4ffa-8160-5a47f46c2939
        Total devices 1 FS bytes used 243.43GiB
        devid    1 size 465.76GiB used 318.05GiB path /dev/sda2
btrfs-progs v4.1
# btrfs fi df /
Data, single: total=309.01GiB, used=238.24GiB
System, single: total=32.00MiB, used=64.00KiB
Metadata, single: total=9.01GiB, used=5.19GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

[dmesg attached: a standard boot log (CPU microcode, e820 memory map, MTRR setup), heavily garbled and truncated in this archive; nothing btrfs-related survives in it]
corrupt 1, but no other indicators
I'm seeing the following message on every bootup in dmesg / /var/log/messages:

BTRFS: bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

I've tried running scrub and it doesn't indicate that any errors occurred.

Is this normal? Is something actually corrupted? Can I fix it?

Details:

[root@mireille ~]# uname -a
Linux mireille.svist.net 3.19.1-201.fc21.x86_64 #1 SMP Wed Mar 18 04:29:24 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@mireille ~]# btrfs --version
Btrfs v3.18.1
[root@mireille ~]# btrfs fi show
Label: none  uuid: 5ac56e7d-3d04-4ffa-8160-5a47f46c2939
        Total devices 1 FS bytes used 237.28GiB
        devid    1 size 465.76GiB used 465.76GiB path /dev/sda2
Btrfs v3.18.1
[root@mireille ~]# btrfs fi df /
Data, single: total=457.75GiB, used=232.64GiB
System, single: total=4.00MiB, used=80.00KiB
Metadata, single: total=8.01GiB, used=4.64GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg: http://pastebin.com/9B0h4SuA
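[Archive note] The counters in that dmesg line are btrfs' persistent per-device error statistics; they survive reboots until explicitly cleared, which is why a clean scrub can still leave "corrupt 1" showing. They can be inspected and, once the filesystem is known healthy, reset:

btrfs device stats /          # show the counters for every device of the fs
btrfs device stats -z /       # print and then zero them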
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Phillip Susi schrieb am 08.12.2014 um 15:59:
> On 12/7/2014 7:32 PM, Konstantin wrote:
> I'm guessing you are using metadata format 0.9 or 1.0, which put the
> metadata at the end of the drive and the filesystem still starts in
> sector zero. 1.2 is now the default and would not have this problem as
> its metadata is at the start of the disk (well, 4k from the start) and
> the fs starts further down.

I know this and I'm using 0.9 on purpose. I need to boot from these disks, so I can't use the 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.

> The bios does not know or care about partitions. All you need is a

That's only true for older BIOSs. With current EFI boards they not only care, but some also mess around with GPT partition tables.

> partition table in the MBR and you can install grub there and have it
> boot the system from a mdadm 1.1 or 1.2 format array housed in a
> partition on the rest of the disk.

I was thinking of this solution as well, but as I'm not aware of any partitioning tool caring about mdadm metadata, I rejected it. It requires a non-standard layout leaving reserved empty spaces for mdadm metadata. It's possible, but it isn't documented as far as I know, and before losing hours of trying I chose the obvious one.

> The only time you really *have* to
> use 0.9 or 1.0 (and you really should be using 1.0 instead since it
> handles larger arrays and can't be confused vis. whole disk vs.
> partition components) is if you are running a raid1 on the raw disk,
> with no partition table, and then partition inside the array instead,
> and really, you just shouldn't be doing that.

That's exactly what I want to do - running RAID1 on the whole disk, as most hardware based RAID systems do. Before that I was running RAID on disk partitions for some years, but this was quite a pain in comparison. Hot(un)plugging a drive brings you a lot of issues with failing mdadm commands, as they don't like concurrent execution when the same physical device is affected. And the rebuild of RAID partitions is done sequentially with no deterministic order. We could talk for hours about that, but if interested maybe better in private, as it is not BTRFS related.

Anyway, to avoid a futile discussion: mdraid and its format is not the problem, it is just an example of the problem. Using dm-raid would cause the same trouble; LVM apparently, too. I could think of a bunch of other cases, including the use of hardware based RAID controllers. OK, it's not the majority's problem, but that's not an argument to keep a bug/flaw capable of crashing your system.

> dmraid solves the problem by removing the partitions from the
> underlying physical device (/dev/sda), and only exposing them on the
> array (/dev/mapper/whatever). LVM only has the problem when you take a
> snapshot. User space tools face the same issue and they resolve it by
> ignoring or deprioritizing the snapshot.

I don't agree. dmraid and mdraid both remove the partitions. This is not a solution - BTRFS will still crash the PC when using /dev/mapper/whatever, or whatever device appears in the system providing the BTRFS volume.

As nice a feature as it is that the kernel apparently scans for drives and automatically identifies BTRFS ones, it seems to me that this feature is useless. When in a live system a BTRFS RAID disk fails, it is not sufficient to hot-replace it; the kernel will not automatically rebalance. Commands are still needed for the task, as they are with mdraid. So the only point I can see at the moment where this auto-detect feature makes sense is when mounting the device for the first time. If I remember the documentation correctly, you mount one of the RAID devices and the others are automagically attached as well. But outside of the mount process, what is this auto-detect used for?

So here are a couple of rather simple solutions which, as far as I can see, could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when devices are appearing.

2. When a BTRFS device is detected and its metadata is identical to one already mounted, just ignore it.

> That doesn't really solve the problem since you can still pick the
> wrong one to mount in the first place.

Oh, it does solve the problem; you are speaking of another problem which is always there when having several disks in a system. Mounting the wrong device can happen in the case I'm describing if you use UUID, label or some other metadata related information to mount it. You won't try to do that when you insert a disk you know has the same metadata. It will not happen (unless user tools outsmart you ;-)) when using the device name(s). I think it could be expected from a user mounting things manually to know or learn which device node is which drive. On the other hand in my case one of the drives is already mounted so getting
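[Archive note] One mitigation sketch for the duplicate-identity problem discussed here: give a block-level snapshot/clone a new filesystem UUID before it becomes visible, so device scan cannot confuse it with the mounted original. Hedged - btrfstune gained a UUID-rewrite option only in later btrfs-progs (v4.1 or so), it may prompt for confirmation, and it must never be run against the mounted copy:

# on the *snapshot* device only, while it is unmounted
btrfstune -u /dev/mapper/vg-z      # device name borrowed from the LVM example elsewhere in the thread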
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Robert White schrieb am 08.12.2014 um 18:20:
> On 12/07/2014 04:32 PM, Konstantin wrote:
>> I know this and I'm using 0.9 on purpose. I need to boot from these
>> disks so I can't use the 1.2 format as the BIOS wouldn't recognize the
>> partitions. Having an additional non-RAID disk for booting introduces
>> a single point of failure, which is contrary to the idea of RAID1.
>
> GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module.
> LVM is also supported. I don't know if a stack of both is supported.
>
> There is, BTW, no such thing as a (commodity) computer without a single
> point of failure in it somewhere. I've watched government contracts
> chase this demon for decades. Be it disk, controller, network card, bus
> chip, cpu or stick-of-ram, you've got a single point of failure
> somewhere. Actually you likely have several such points of potential
> failure.
>
> For instance, are you _sure_ your BIOS is going to check the second
> drive if it gets a read failure after starting in on your first drive?
> Chances are it won't, because that four-hundred-bytes-or-so boot loader
> on that first disk has no way to branch back into the bios. You can
> waste a lot of your life chasing that ghost and you'll still discover
> you've missed it and have to whip out your backup boot media.
>
> It may well be worth having a second copy of /boot around, but make
> sure you stay out of bandersnatch territory when designing your system.
> The more you over-think the plumbing, the easier it is to stop up the
> pipes.

You are right, there is almost always a single point of failure somewhere, even if it is the power plant providing your electricity ;-). I should have written "introduces an additional single point of failure" to be 100% correct, but I thought this was obvious. As I have replaced dozens of damaged hard disks but only a few CPUs, RAMs etc., it is more important for me to reduce the most frequent and easiest-to-solve points of failure. For more important systems there are high availability solutions which alleviate many of the problems you mention, but that's not the point here when speaking about the major bug in BTRFS which can make your system crash.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Anand Jain wrote on 02.12.2014 at 12:54: On 02/12/2014 19:14, Goffredo Baroncelli wrote: I further investigate this issue. MegaBrutal, reported the following issue: doing a lvm snapshot of the device of a mounted btrfs fs, the new snapshot device name replaces the name of the original device in the output of /proc/mounts. This confused tools like grub-probe which report a wrong root device. very good test case indeed thanks. Actual IO would still go to the original device, until FS is remounted. This seems to be correct at least at the beginning but I wouldn't be so sure - why else the system is crashing in my case after a while when the second drive is present?! So if the kernel was not using it in some way, except the wrong /proc/mounts nothing else should happen. It has to be pointed out that instead the link under /sys/fs/btrfs/fsid/devices is correct. In this context the above sysfs path will be out of sync with the reality, its just stale sysfs entry. What happens is that *even if the filesystem is mounted*, doing a btrfs dev scan of a snapshot (of the real volume), the device name of the filesystem is replaced with the snapshot one. we have some fundamentally wrong stuff. My original patch tried to fix it. But later discovered that some external entities like systmed and boot process is using that bug as a feature and we had to revert the patch. Fundamentally scsi inquiry serial number is only number which is unique to the device (including the virtual device, but there could be some legacy virtual device which didn't follow that strictly, Anyway those I deem to be device side issue.) Btrfs depends on the combination of fsid, uuid and devid (and generation number) to identify the unique device volume, which is weak and easy to go wrong. Anand, with b96de000b, tried to fix it; however further regression appeared and Chris reverted this commit (see below). BR G.Baroncelli commit b96de000bc8bc9688b3a2abea4332bd57648a49f Author: Anand Jain anand.j...@oracle.com Date: Thu Jul 3 18:22:05 2014 +0800 Btrfs: device_list_add() should not update list when mounted [...] commit 0f23ae74f589304bf33233f85737f4fd368549eb Author: Chris Mason c...@fb.com Date: Thu Sep 18 07:49:05 2014 -0700 Revert Btrfs: device_list_add() should not update list when mounted This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f. This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs. [...] On 12/02/2014 09:28 AM, MegaBrutal wrote: 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it: On 12/02/2014 01:15 AM, MegaBrutal wrote: 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com: On 12/01/2014 02:10 PM, MegaBrutal wrote: Since having duplicate UUIDs on devices is not a problem for me since I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to read it along, it is not about the actual problem at hand. Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem. Once you decide to duplicate the UUIDs with LVM snapshots you take up the burden of disambiguating your storage. Which is part of why re-reading was suggested as this was covered in some depth and _is_ _exactly_ about the problem at hand. Nope. 
root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe, the device= mount option is added.

> device= option is needed only in a btrfs multi-volume scenario. If you
> have only one disk, this is not needed

I know. I only did this as a demonstration for Robert. He insisted it
would certainly solve the problem. Well, it doesn't.

root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   -wi-ao---   1.00g
  swap0  vg   -wi-ao--- 256.00m
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0
lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume "z" created
lvs
  LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   owi-aos--   1.00g
  swap0  vg   -wi-ao--- 256.00m
  z      vg   swi-a-s-- 128.00m      rootlv   0.11
ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
grub-probe --target=device /
/dev/mapper/vg-z
grep /
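For anyone trying to tell which name is the stale one, the mismatch Goffredo describes (sysfs stays correct while /proc/mounts goes stale) is easy to see from a shell. A quick sketch - take the fsid from "btrfs fi show"; the exact sysfs path is an assumption based on his description:

$ grep btrfs /proc/mounts
$ sudo btrfs fi show /
$ ls -l /sys/fs/btrfs/<fsid>/devices/

After a scan of the snapshot, /proc/mounts names the snapshot device, while the sysfs directory keeps pointing at the device actually doing the IO.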
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Phillip Susi wrote on 02.12.2014 at 20:19:
> On 12/1/2014 4:45 PM, Konstantin wrote:
>> The bug appears also when using mdadm RAID1 - when one of the drives
>> is detached from the array then the OS discovers it and after a while
>> (not directly, it takes several minutes) it appears under
>> /proc/mounts: instead of /dev/md0p1 I see there /dev/sdb1. And usually
>> after some hour or so (depending on system workload) the PC completely
>> freezes. So discussion about the uniqueness of UUIDs or not, a
>> crashing kernel is telling me that there is a serious bug.
>
> I'm guessing you are using metadata format 0.9 or 1.0, which put the
> metadata at the end of the drive and the filesystem still starts in
> sector zero. 1.2 is now the default and would not have this problem as
> its metadata is at the start of the disk ( well, 4k from the start )
> and the fs starts further down.

I know this and I'm using 0.9 on purpose. I need to boot from these
disks, so I can't use the 1.2 format as the BIOS wouldn't recognize the
partitions. Having an additional non-RAID disk for booting introduces a
single point of failure, which is contrary to the idea of RAID.

Anyway, to avoid a futile discussion: mdraid and its format is not the
problem, it is just an example of the problem. Using dm-raid would cause
the same trouble, and apparently LVM does too. I could think of a bunch
of other cases, including the use of hardware-based RAID controllers.
OK, it's not the majority's problem, but that's no argument for keeping
a bug/flaw capable of crashing your system.

As nice a feature as it is that the kernel apparently scans for drives
and automatically identifies BTRFS ones, it seems to me that this
feature is of little use. When a BTRFS RAID disk fails in a live system,
it is not sufficient to hot-replace it; the kernel will not
automatically rebalance. Commands are still needed for the task, as they
are with mdraid. So the only point I can see at the moment where this
auto-detect feature makes sense is when mounting the device for the
first time. If I remember the documentation correctly, you mount one of
the RAID devices and the others are automagically attached as well. But
outside of the mount process, what is this auto-detect used for?

So here are a couple of rather simple solutions which, as far as I can
see, could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when
   devices are appearing.
2. When a BTRFS device is detected and its metadata is identical to one
   already mounted, just ignore it.
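For what it's worth, checking which superblock format an existing array uses is quick (a sketch; the array and member names are assumptions):

# mdadm --detail /dev/md0 | grep Version
# mdadm --examine /dev/sdb1 | grep Version

With 0.90 and 1.0 the superblock sits at the end of the member, so the btrfs signature stays at sector zero of the member device and the kernel's auto-scan can pick up the bare member, as described above.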
raid5 filesystem only mountable ro and not currently fixable after a drive produced read errors
Hello,

I have a raid5 btrfs that refuses to mount rw (ro works) and I think I'm
out of options to get it fixed. First, this is roughly what got my
filesystem corrupted:

1. I created the raid5 fs in March 2014 using the latest code available
   (Btrfs 3.12) on four 4TB devices (each encrypted using dm-crypt). I
   also created 3 subvolumes. The command used was:

   mkfs.btrfs -O skinny-metadata -d raid5 -m raid5 /dev/mapper/wdred4tb[2345]

2. Around October I noticed one of the drives (wdred4tb3) produced read
   errors. Running a long smartctl self-test would fail as well, and the
   reported Raw_Read_Error_Rate increased steadily.

3. Since I had a spare drive around, but replacing a device wasn't
   implemented back then for raid5, I decided to use the add-then-delete
   approach outlined here:
   http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Raid5-Status
   I did *not* remove the failing drive for that.

4. The rebalance triggered by the "btrfs device delete
   /dev/mapper/wdred4tb3" command crashed a few times (and read errors
   kept increasing), but each time I started it, a few hundred GiB were
   moved over to the newly added device. But when 414GiB were left on
   the failing drive, it didn't get any further. It now still looks like
   this:

   # btrfs fi show /mnt/box
   Label: none  uuid: 9f3a48b7-1b88-44f0-a387-f3712fc2c0b6
           Total devices 5 FS bytes used 4.43TiB
           devid    1 size 3.64TiB used 1.50TiB path /dev/mapper/wdred4tb2
           devid    2 size 3.64TiB used 414.00GiB path /dev/mapper/wdred4tb3
           devid    3 size 3.64TiB used 1.50TiB path /dev/mapper/wdred4tb4
           devid    4 size 3.64TiB used 1.50TiB path /dev/mapper/wdred4tb5
           devid    5 size 3.64TiB used 1.10TiB path /dev/mapper/wdred4tb1

   Btrfs v3.17.2-50-gcc0723c

5. I tried several things (probably a new kernel around 3.17, which was
   probably affected by the snapshot bug - but I don't use snapshots,
   only subvolumes) and ended up doing a btrfsck --repair (v3.17-rc3) on
   the filesystem. I still have the complete output of that, let me know
   if you need it. Here are some lines that seem interesting to me:

   # btrfsck --repair /dev/mapper/wdred4tb2
   enabling repair mode
   Checking filesystem on /dev/mapper/wdred4tb2
   UUID: 9f3a48b7-1b88-44f0-a387-f3712fc2c0b6
   checking extents
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   read block failed check_tree_block
   [...]
   owner ref check failed [500170752 16384]
   repair deleting extent record: key 500170752 169 0
   adding new tree backref on start 500170752 len 16384 parent 7 root 7
   [...]
   repaired damaged extent references
   checking free space cache
   cache and super generation don't match, space cache will be invalidated
   checking fs roots
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   read block failed check_tree_block
   [...]
   Check tree block failed, want=668598272, have=668794880
   Csum didn't match
   [...]
   checking csums
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   Check tree block failed, want=500170752, have=5421517155842471019
   read block failed check_tree_block
   Error going to next leaf -5
   checking root refs
   found 1469190132145 bytes used err is 0
   total csum bytes: 4750630700
   total tree bytes: 6141100032
   total fs tree bytes: 345964544
   total extent tree bytes: 194052096
   btree space waste bytes: 867842012
   file data blocks allocated: 4865657503744
    referenced 4895640494080
   Btrfs v3.17-rc3
   extent buffer leak: start 842235904 len 16384
   extent buffer leak: start 842235904 len 16384
   [...]

6. As far as I can remember, that was the point when mounting rw stopped
   working. Mounting ro seems to work quite fine though (no idea if data
   was lost/corrupted).

I removed the failing drive today and updated to the latest integration
branch of cmason's git repository (including Miao Xie's patches for
raid56 replacement) and David's integration-20141125 branch for
btrfs-progs. With those, I tried a mount with -o ro,degraded,recovery
(works, but didn't recover). I also tried a btrfsck again, but it just
prints some errors and then exits. Mounting rw with -o degraded gives
the following output in dmesg:

[ 7358.907119] BTRFS: open /dev/dm-4 failed
[ 7358.907860] BTRFS info (device dm-6): allowing degraded mounts
[ 7358.907866] BTRFS info (device dm-6): enabling auto recovery
[ 7358.907870] BTRFS info (device dm-6): disk space caching is enabled
[ 7358.907872] BTRFS: has skinny extents
[ 7360.549993] BTRFS: bdev /dev/dm-4 errs: wr 0, rd 22288, flush 0, corrupt 0, gen 0
[ 7377.923939] BTRFS info (device dm-6): The free space cache file (7065489637376) is invalid. skip it
[ 7383.443486] BTRFS (device dm-6): parent transid verify failed on
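For context, the add-then-delete workaround from step 3 amounts to just two commands (a sketch using the mountpoint and device names from the fi show output above; at the time, "btrfs replace" didn't work on raid5):

# btrfs device add /dev/mapper/wdred4tb1 /mnt/box
# btrfs device delete /dev/mapper/wdred4tb3 /mnt/box

The delete is what triggers the relocation of all chunks off the failing device, which is why it kept crashing on the read errors in step 4.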
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
MegaBrutal wrote on 01.12.2014 at 13:56:
> Hi all,
>
> I've reported the bug I've previously posted about in "BTRFS messes up
> snapshot LV with origin" in the Kernel Bug Tracker.
> https://bugzilla.kernel.org/show_bug.cgi?id=89121

Hi MegaBrutal.

If I understand your report correctly, I can give you another example
where this bug appears. It is so bad that it leads to freezing the
system, and I'm quite sure it's the same thing. I was thinking about
filing a bug but didn't have the time for that yet. Maybe you could add
this case to your bug report as well.

The bug appears also when using mdadm RAID1 - when one of the drives is
detached from the array, the OS discovers it and after a while (not
directly, it takes several minutes) it appears under /proc/mounts:
instead of /dev/md0p1 I see there /dev/sdb1. And usually after some hour
or so (depending on system workload) the PC completely freezes. So,
discussion about the uniqueness of UUIDs or not, a crashing kernel is
telling me that there is a serious bug.

While in my case the detaching was intentional, there are several real
possibilities for a RAID1 disk to get detached, and currently this leads
to crashing the server when using BTRFS. That's not what is intended
when using RAID ;-). In my case I wanted to do something which has
worked perfectly all the years before with all other file systems -
checking the file system of the root disk while the server is running.
The procedure is simple (see the sketch after this list):

1. detach one of the disks
2. do fsck on the disk device
3. mdadm --zero-superblock on the device so it gets completely rewritten
4. mdadm --add it to the array

There were some surprises with BTRFS - if 2. is not done directly after
1., btrfsck refuses to check the disk as it is reported to be mounted by
/proc/mounts. And while doing 2., or even after finishing it, the system
was freezing. If I managed to get to 4. fast enough, everything was OK.
But again, that's not what I expect from a good operating system. Any
objections?

Konstantin
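Spelled out as commands, the procedure would look roughly like this (a sketch; the array and member names are assumptions):

# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1    (step 1)
# btrfsck /dev/sdb1                                     (step 2)
# mdadm --zero-superblock /dev/sdb1                     (step 3)
# mdadm /dev/md0 --add /dev/sdb1                        (step 4)

Step 2 is where btrfsck refuses to run once the kernel has re-scanned the detached member and put it into /proc/mounts.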
Re: Two persistent problems
Josef Bacik wrote on 14.11.2014 at 23:00:
> On 11/14/2014 04:51 PM, Hugo Mills wrote:
>> Chris, Josef, anyone else who's interested,
>>
>> On IRC, I've been seeing reports of two persistent unsolved problems.
>> Neither is showing up very often, but both have turned up often
>> enough to indicate that there's something specific going on worthy of
>> investigation. One of them is definitely a btrfs problem. The other
>> may be btrfs, or something in the block layer, or just broken
>> hardware; it's hard to tell from where I sit.
>>
>> Problem 1: ENOSPC on balance
>>
>> This has been going on since about March this year. I can reasonably
>> certainly recall 8-10 cases, possibly a number more. When running a
>> balance, the operation fails with ENOSPC when there's plenty of space
>> remaining unallocated. This happens on full balance, filtered
>> balance, and device delete. Other than the ENOSPC on balance, the FS
>> seems to work OK. It seems to be more prevalent on filesystems
>> converted from ext*. The first few or more reports of this didn't
>> make it to bugzilla, but a few of them since then have gone in.
>>
>> Problem 2: Unexplained zeroes
>>
>> Failure to mount. Transid failure, expected xyz, have 0. Chris looked
>> at an early one of these (for Ke, on IRC) back in September (the 27th
>> -- sadly, the public IRC logs aren't there for it, but I can supply a
>> copy of the private log). He rapidly came to the conclusion that it
>> was something bad going on with TRIM, replacing some blocks with
>> zeroes. Since then, I've seen a bunch of these coming past on IRC. It
>> seems to be a 3.17 thing. I can successfully predict the presence of
>> an SSD and -odiscard from the "have 0". I've successfully persuaded
>> several people to put this into bugzilla and capture btrfs-images.
>> btrfs recover doesn't generally seem to be helpful in recovering
>> data.
>>
>> I think Josef had problem 1 in his sights, but I don't know if
>> additional images or reports are helpful at this point. For problem
>> 2, there's obviously something bad going on, but there's not much
>> else to go on -- and the inability to recover data isn't good. For
>> each of these, what more information should I be trying to collect
>> from any future reporters?
>
> So for #2, I've been looking at that the last two weeks. I'm always
> paranoid we're screwing up one of our data integrity sort of things,
> either not waiting on IO to complete properly or something like that.
> I've built a dm target to be as evil as possible and have been running
> it trying to make bad things happen. I got slightly side-tracked since
> my stress test exposed a bug in the tree log stuff and csums, which I
> just fixed. Now that I've fixed that I'm going back to trying to make
> the "expected blah, have 0" type errors happen.
>
> As for the ENOSPC, I keep meaning to look into it and I keep getting
> distracted with other more horrible things. Ideally I'd like to
> reproduce it myself, so more info on that front would be good - like,
> do all reports use RAID/compression/some other odd set of features?
>
> Thanks for taking care of this stuff Hugo. #2 is the worst one and I'd
> like to be absolutely sure it's not our bug; once I'm happy it isn't,
> I'll look at the balance thing.
>
> Josef

For #2, I had a strangely damaged BTRFS I reported a week or so ago
which may have a similar background. Dmesg gives:

parent transid verify failed on 586239082496 wanted 13329746340512024838 found 588
BTRFS: open_ctree failed

The thing is that btrfsck crashes when trying to check this. As nobody
seemed to be interested, I reformatted this disk today.
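For future reporters: capturing the metadata image Hugo mentions is a one-liner (a sketch; the device and output path are assumptions):

# btrfs-image -c9 -s /dev/sdX /tmp/broken-fs-metadata.img

-c9 compresses the image and -s sanitizes file names, so the result is usually small enough, and private enough, to attach to bugzilla.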
btrfsck crash
Hello!

I got a strangely corrupted btrfs where btrfsck seems to crash. A first
try with v3.14 spat out a large amount of messages
(http://pastebin.com/J1jCzhzx), then a run with v3.17 gives an
"Assertion failed" error with other messages
(http://pastebin.com/TE6dSjgR).

Anyone interested in looking into this, or should I reformat this disk?

Konstantin
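If anyone wants to chase the crash itself rather than the corruption, a backtrace should be easy to capture (a sketch; the device name is an assumption):

$ gdb --args btrfsck /dev/sdX
(gdb) run
(gdb) bt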
Re: btrfs partition remounted read-only
On 07/13/2014 10:13 AM, Chris Murphy wrote:
> On Jul 4, 2014, at 11:00 AM, Konstantin Svist <fry@gmail.com> wrote:
>> I have an overnight cron job with
>>
>> /sbin/fstrim -v /
>> /bin/bedup dedup --defrag
>
> Probably not related, but these look backwards, why not reverse them?
>
> Chris Murphy

Thanks, will do that. Anything else useful I could add to the cron job,
btw? I was thinking maybe a scrub operation to check for errors..
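With Chris's reordering plus the scrub idea, the job could look like this - a sketch only; the scrub invocation is an assumption, not something from this thread:

#!/bin/sh
# dedup/defrag first, trim the freed space afterwards
/bin/bedup dedup --defrag
/sbin/fstrim -v /
# -B keeps scrub in the foreground so cron can catch its exit status
/sbin/btrfs scrub start -B /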
btrfs partition remounted read-only
I have an overnight cron job with

/sbin/fstrim -v /
/bin/bedup dedup --defrag

Every once in a while, it causes the FS to be remounted read-only. The
problem is pretty intermittent so far (aside from a few kernel revisions
a while ago). Please advise.

Corresponding bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=71311
https://bugzilla.redhat.com/show_bug.cgi?id=1071408

Additional info:

# uname -a
Linux mireille.svist.net 3.14.9-200.fc20.x86_64 #1 SMP Thu Jun 26 21:40:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
Btrfs v3.14.2

# btrfs fi show
Label: none  uuid: 5ac56e7d-3d04-4ffa-8160-5a47f46c2939
        Total devices 1 FS bytes used 151.77GiB
        devid    1 size 465.76GiB used 282.02GiB path /dev/sda2

Btrfs v3.14.2

# btrfs fi df /
Data, single: total=277.01GiB, used=148.86GiB
System, single: total=4.00MiB, used=48.00KiB
Metadata, single: total=5.01GiB, used=2.91GiB
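When the remount-ro happens, the abort reason lands in the kernel log, so it's worth grabbing right away (a sketch):

# dmesg | grep -i btrfs | tail -n 20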
Re: severe hardlink bug
Jan Schmidt <list.btrfs at jan-o-sch.net> writes:
> Please give the patch set "btrfs: extended inode refs" by Mark Fasheh
> a try (http://lwn.net/Articles/498226/). It eliminates the hard links
> per directory limit (introducing a rather random, artificial limit of
> 64k instead).

Hi, Jan!

I'm happy to see that something is being done about fixing that issue.
Unfortunately I cannot afford to run an unstable patched kernel on the
production server. Probably I'll give btrfs another try in 2013. ^__^

K.
Re: severe hardlink bug
Dipl.-Ing. Michael Niederle <mniederle at gmx.at> writes:
> I reinstalled over 700 packages - plt-scheme being the only one
> failing due to the btrfs link restriction.

I have hit the same issue - I tried to run BackupPC with a pool on a
btrfs filesystem. After some time the "too many links (31)" error
appeared. Now I'm forced to migrate to some other filesystem...
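The per-directory ceiling is easy to demonstrate on an affected filesystem (a sketch; the exact count reached depends on the length of the link names):

$ touch f
$ i=0; while ln f link-$i 2>/dev/null; do i=$((i+1)); done
$ echo stopped after $i links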
Re: severe hardlink bug
C Anthony Risinger <anthony at xtfx.me> writes:
> btrfs only fails when you have hundreds of hardlinks to the same file
> in the *same* directory ... certainly not a standard use case.
>
> use snapshots to your advantage:
>
> - snap source
> - rsync --inplace source to target (with some other opts that have
>   been discussed on list)
> - snap target
> - {rinse-and-repeat-in-24-hrs}

I understand that the case is only for the *same* directory. You can
claim that it's not a standard use case, but first Michael hit it, and
now me. There's at least one more case:
https://lists.samba.org/archive/rsync/2011-December/027117.html

The count of such cases will keep increasing, and the sooner this is
fixed, the less pain it will bring to the users. I know fixing it is a
big structural change, but it will only get worse with time. If it's
not going to be fixed - I don't care. Right now I'm forced to migrate
to an old mdadm raid-1 or ZFS. The sad thing is that I really LOVED
btrfs. Only that. ^__^

K.
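Anthony's rotation, spelled out as commands (a sketch; every path and snapshot name here is an assumption, and /backup/current must itself be a subvolume for the second snapshot to work):

# btrfs subvolume snapshot -r /data /data/.snap-$(date +%F)
# rsync -a --inplace /data/.snap-$(date +%F)/ /backup/current/
# btrfs subvolume snapshot -r /backup/current /backup/.snap-$(date +%F)

The snapshots replace the hardlink pool, so no directory ever accumulates enough links to hit the limit.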
[PATCH] btrfs: fix warning in iput for bad-inode
iput() shouldn't be called for inodes in I_NEW state, so let's call
__destroy_inode() and btrfs_destroy_inode() instead.

[1.871723] WARNING: at fs/inode.c:1309 iput+0x1d9/0x200()
[1.873722] Modules linked in:
[1.873722] Pid: 1, comm: swapper Tainted: G W 3.1.0-rc2-zurg #58
[1.875722] Call Trace:
[1.875722]  [8113cb99] ? iput+0x1d9/0x200
[1.876722]  [81044c3a] warn_slowpath_common+0x7a/0xb0
[1.877722]  [81044c85] warn_slowpath_null+0x15/0x20
[1.879722]  [8113cb99] iput+0x1d9/0x200
[1.879722]  [81295cf4] btrfs_iget+0x1c4/0x450
[1.881721]  [812b7e6b] ? btrfs_tree_read_unlock_blocking+0x3b/0x60
[1.882721]  [8111769a] ? kmem_cache_free+0x2a/0x160
[1.883721]  [812966f3] btrfs_lookup_dentry+0x413/0x490
[1.885721]  [8103b1e1] ? get_parent_ip+0x11/0x50
[1.886720]  [81296781] btrfs_lookup+0x11/0x30
[1.887720]  [8112de50] d_alloc_and_lookup+0x40/0x80
[1.888720]  [8113ac10] ? d_lookup+0x30/0x50
[1.889720]  [811301a8] do_lookup+0x288/0x370
[1.890720]  [8103b1e1] ? get_parent_ip+0x11/0x50
[1.891720]  [81132210] do_last+0xe0/0x910
[1.892720]  [81132b4d] path_openat+0xcd/0x3a0
[1.893719]  [813bab4b] ? wait_for_xmitr+0x3b/0xa0
[1.895719]  [8131c50a] ? put_dec_full+0x5a/0xb0
[1.896719]  [813babdb] ? serial8250_console_putchar+0x2b/0x40
[1.897719]  [81132e7d] do_filp_open+0x3d/0xa0
[1.898719]  [8103b1e1] ? get_parent_ip+0x11/0x50
[1.899718]  [8103b1e1] ? get_parent_ip+0x11/0x50
[1.900718]  [816e80fd] ? sub_preempt_count+0x9d/0xd0
[1.902718]  [8112a09d] open_exec+0x2d/0xf0
[1.903718]  [8112aaaf] do_execve_common.isra.32+0x12f/0x340
[1.906717]  [8112acd6] do_execve+0x16/0x20
[1.907717]  [8100af02] sys_execve+0x42/0x70
[1.908717]  [816ed968] kernel_execve+0x68/0xd0
[1.909717]  [816d828e] ? run_init_process+0x1e/0x20
[1.911717]  [816d831e] init_post+0x8e/0xc0
[1.912716]  [81cb8c79] kernel_init+0x13d/0x13d
[1.913716]  [816ed8f4] kernel_thread_helper+0x4/0x10
[1.914716]  [81cb8b3c] ? start_kernel+0x33f/0x33f
[1.915716]  [816ed8f0] ? gs_change+0xb/0xb

Signed-off-by: Konstantin Khlebnikov <khlebni...@openvz.org>
---
 fs/btrfs/inode.c |   10 +++---
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15fceef..3e949bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3952,7 +3952,6 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 			 struct btrfs_root *root, int *new)
 {
 	struct inode *inode;
-	int bad_inode = 0;
 
 	inode = btrfs_iget_locked(s, location->objectid, root);
 	if (!inode)
@@ -3968,15 +3967,12 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 			if (new)
 				*new = 1;
 		} else {
-			bad_inode = 1;
+			__destroy_inode(inode);
+			btrfs_destroy_inode(inode);
+			inode = ERR_PTR(-ESTALE);
 		}
 	}
 
-	if (bad_inode) {
-		iput(inode);
-		inode = ERR_PTR(-ESTALE);
-	}
-
 	return inode;
 }