Re: Help with space
Absolutely. I'd like to know the answer to this, as 13 TB will take a considerable amount of time to back up anywhere, assuming I can find a place for it. I'm considering rebuilding a smaller raid with newer drives: it was originally built from sixteen 250 GB Western Digital drives about eleven years ago, has been in use the entire time without a failure, and I'm considering replacing each 250 GB drive with a 3 TB alternative. Unfortunately, between upgrading the host and building a new raid, the expense isn't something I'm anticipating with pleasure...

On Fri, Feb 28, 2014 at 1:27 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:
>
>> But then as others mentioned it may be risky to use this FS on 32-bit at
>> all, so I'd suggest trying anything else only after you reboot into a
>> 64-bit kernel.
>
> Based on what I've read on-list, btrfs is not arch-agnostic, with certain
> on-disk sizes set to native kernel page size, etc, so a filesystem
> created on one arch may well not work on another.
>
> Question: Does this apply to x86/amd64? Will a filesystem created/used
> on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading
> to 64-bit imply backing up (in this case) double-digit TiB of data to
> something other than btrfs and testing it, doing a mkfs on the original
> filesystem once in 64-bit mode, and restoring all that data from backup?
>
> If the existing 32-bit x86 btrfs can't be used on 64-bit amd64,
> transferring all that data (assuming there's something big enough
> available to transfer it to!) to backup and then restoring it is going to
> hurt!
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
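For a rough sense of the time involved, here is a back-of-the-envelope estimate only; it assumes roughly 13 TiB of data and a single gigabit Ethernet link sustaining about 110 MiB/s, neither of which is a measurement from this system:

    # rough transfer time: 13 TiB at ~110 MiB/s (both figures are assumptions)
    $ echo $(( 13 * 1024 * 1024 / 110 / 3600 ))
    34

So on the order of a day and a half of continuous transfer in each direction, before any verification, which matches the "considerable amount of time" above.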
Re: Help with space
Apologies for the late reply; I'd assumed the issue was closed even given the unusual behavior. My mount options are:

/dev/sdb1 on /var/lib/nobody/fs/ubfterra type btrfs (rw,noatime,nodatasum,nodatacow,noacl,space_cache,skip_balance)

I only recently added nodatacow and skip_balance in an attempt to figure out where the missing space had gone; I don't know what impact, if any, they might have. I've got a full balance running at the moment which, after about a day or so, has managed to process about 5% of the chunks it's considering (988 out of about 18396 chunks balanced (989 considered), 95% left). The amount of free space has vacillated slightly, growing by about a gig only to shrink back again.

As for objects missing from the file system, I've not seen any. I've a lot of files of various data types; the majority is encoded Japanese animation. Since I actually play these files via Samba from an HTPC, particularly the more recent additions, I'd hazard a guess that if something were breaking I'd have tripped across it by now, the unusual used-versus-free space delta being the exception. My brother also uses this raid for data storage; he's something of a closet meteorologist and is fascinated by tornadoes. He hasn't noticed any unusual behavior either.

I'm in the process of sourcing a 64-bit capable system in the hope that will resolve the issue. Neither of us is currently writing anything to the file system for fear of things breaking, but both have been reading from it without issue, other than the noticeable impact on performance the balance seems to be having.

Thanks for the help.

-Justin

On Fri, Feb 28, 2014 at 12:26 AM, Chris Murphy wrote:
>
> On Feb 27, 2014, at 11:13 PM, Chris Murphy wrote:
>
>>
>> On Feb 27, 2014, at 11:19 AM, Justin Brown wrote:
>>
>>> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
>>> Data, single: total=17.58TiB, used=17.57TiB
>>> System, DUP: total=8.00MiB, used=1.93MiB
>>> System, single: total=4.00MiB, used=0.00
>>> Metadata, DUP: total=392.00GiB, used=33.50GiB
>>> Metadata, single: total=8.00MiB, used=0.00
>>
>> After glancing at this again, what I thought might be going on might not be
>> going on. The fact it has 17+TB already used, not merely allocated, doesn't
>> seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.
>>
>> But then I don't know why du -h is reporting only 13T total used. And I'm
>> unconvinced this is a balance issue either. Is anything obviously missing
>> from the file system?
>
> What are your mount options? Maybe compression?
>
> Clearly du is calculating things differently. I'm getting:
>
> du -sch = 4.2G
> df -h = 5.4G
> btrfs df = 4.7G data and 620MB metadata (total).
>
> I am using compress=lzo.
>
> Chris Murphy
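For anyone following along, the progress of a running balance and the allocation numbers can be watched without interrupting it. A minimal sketch, assuming the filesystem is mounted at /var/lib/nobody/fs/ubfterra and a reasonably recent btrfs-progs (the balance status subcommand is not present in very old versions):

    # how far the running balance has gotten
    btrfs balance status /var/lib/nobody/fs/ubfterra

    # watch chunk allocation vs. actual usage while it runs
    btrfs filesystem df /var/lib/nobody/fs/ubfterra
    btrfs filesystem show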
Re: Help with space
On Fri, 28 Feb 2014 07:27:06 +0000 (UTC) Duncan <1i5t5.dun...@cox.net> wrote:

> Based on what I've read on-list, btrfs is not arch-agnostic, with certain
> on-disk sizes set to native kernel page size, etc, so a filesystem
> created on one arch may well not work on another.
>
> Question: Does this apply to x86/amd64? Will a filesystem created/used
> on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading
> to 64-bit imply backing up (in this case) double-digit TiB of data to
> something other than btrfs and testing it, doing a mkfs on the original
> filesystem once in 64-bit mode, and restoring all that data from backup?

The page size (4K) is the same on both i386 and amd64. It's also the same on ARM. The problem arises only on architectures like MIPS and PowerPC, some variants of which use 16K or 64K page sizes.

Other than this page size issue, it has no arch-specific dependencies, e.g. no on-disk structures with "CPU-native integer" sized fields etc. -- that'd be too crazy to be true.

--
With respect,
Roman
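For reference, checking which page size a given kernel actually uses is straightforward; nothing below is specific to this thread's hardware, and the sample output is just what an i686 box would typically show:

    $ uname -m          # architecture the running kernel was built for
    i686
    $ getconf PAGESIZE  # page size in bytes: 4096 on i386/amd64/ARM, 16K/64K on some MIPS/PowerPC variants
    4096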
Re: Help with space
Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:

> But then as others mentioned it may be risky to use this FS on 32-bit at
> all, so I'd suggest trying anything else only after you reboot into a
> 64-bit kernel.

Based on what I've read on-list, btrfs is not arch-agnostic, with certain on-disk sizes set to native kernel page size, etc, so a filesystem created on one arch may well not work on another.

Question: Does this apply to x86/amd64? Will a filesystem created/used on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading to 64-bit imply backing up (in this case) double-digit TiB of data to something other than btrfs and testing it, doing a mkfs on the original filesystem once in 64-bit mode, and restoring all that data from backup?

If the existing 32-bit x86 btrfs can't be used on 64-bit amd64, transferring all that data (assuming there's something big enough available to transfer it to!) to backup and then restoring it is going to hurt!

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Incremental backup over writable snapshot
GEO posted on Thu, 27 Feb 2014 14:10:25 +0100 as excerpted:

> Does anyone have technical info regarding the reliability of the
> incremental backup process using the said method?

Stepping back from your specific method for a moment...

You're using btrfs send/receive, which I wouldn't exactly call entirely reliable ATM -- just look at all the patches going by on the list to fix it up ATM. In theory it should /get/ there, but it's very much in flux at this moment; certainly nothing I'd personally rely on here. Btrfs itself is still only semi-stable, and that's one of the more advanced and currently least likely to work without errors features. (Tho raid5/6 mode is worse, since from all I've read send/receive should at least fail up-front if it's going to fail, while raid5/6 will currently look like it's working... until you actually need the raid5/6 redundancy and btrfs data integrity mode aspects!)

From what I've read, *IF* the send/receive process completes without errors it should make a reasonably reliable backup. The problem is that there are a lot of error-triggering corner-cases ATM, and given your definitely non-standard use-case, I expect your chances of running into such errors are higher than normal. But if send/receive /does/ complete without errors, AFAIK it should be a reliable replication.

Meanwhile, over time those corner-cases should be worked out, and I've seen nothing in your use-case that says it /shouldn't/ work, once send/receive itself is working reliably. Your use-case may be an odd corner-case, but it should either work or not, and once btrfs send/receive is working reliably, based on all I've read both from you and on the list in general, your case too should work reliably. =:^)

But for the moment, unless your aim is to be a guinea pig working closely with the devs to test an interesting corner-case and report problems so they can be traced and fixed, I'd suggest using some other method. Give btrfs send/receive, and the filesystem as a whole, another six months or a year to mature and stabilize; your suggested method might not be the most efficient or recommended way to do things for the reasons others have given, but AFAIK it should nonetheless work.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
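For concreteness, the usual shape of an incremental send/receive cycle looks roughly like the sketch below. The paths and snapshot names are made up for illustration; the -r (read-only) snapshots are needed because send refuses writable ones:

    # initial full copy: read-only snapshot, then send it whole
    btrfs subvolume snapshot -r /data /data/snap-2014-02-27
    btrfs send /data/snap-2014-02-27 | btrfs receive /backup

    # later, an incremental: new read-only snapshot, send only the delta against the parent
    btrfs subvolume snapshot -r /data /data/snap-2014-02-28
    btrfs send -p /data/snap-2014-02-27 /data/snap-2014-02-28 | btrfs receive /backup

If either the send or the receive exits non-zero, the snapshot on the receiving side should be treated as incomplete; that is the "fails up-front" behavior described above.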
Re: Help with space
On Feb 27, 2014, at 11:13 PM, Chris Murphy wrote:

>
> On Feb 27, 2014, at 11:19 AM, Justin Brown wrote:
>
>> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
>> Data, single: total=17.58TiB, used=17.57TiB
>> System, DUP: total=8.00MiB, used=1.93MiB
>> System, single: total=4.00MiB, used=0.00
>> Metadata, DUP: total=392.00GiB, used=33.50GiB
>> Metadata, single: total=8.00MiB, used=0.00
>
> After glancing at this again, what I thought might be going on might not be
> going on. The fact it has 17+TB already used, not merely allocated, doesn't
> seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.
>
> But then I don't know why du -h is reporting only 13T total used. And I'm
> unconvinced this is a balance issue either. Is anything obviously missing
> from the file system?

What are your mount options? Maybe compression?

Clearly du is calculating things differently. I'm getting:

du -sch  = 4.2G
df -h    = 5.4G
btrfs df = 4.7G data and 620MB metadata (total)

I am using compress=lzo.

Chris Murphy
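The three numbers above come from different accounting views, which is why they rarely agree on a btrfs with compression. A small sketch, run against the same mount point (the path is illustrative):

    # per-file usage, summed from stat(2) block counts
    du -sch /mnt/data

    # whole-filesystem used/available as reported through statfs(2)
    df -h /mnt/data

    # btrfs' own accounting: chunks allocated (total=) vs. bytes used within them (used=)
    btrfs filesystem df /mnt/data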
Re: Help with space
On Feb 27, 2014, at 11:19 AM, Justin Brown wrote:

> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
> Data, single: total=17.58TiB, used=17.57TiB
> System, DUP: total=8.00MiB, used=1.93MiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=392.00GiB, used=33.50GiB
> Metadata, single: total=8.00MiB, used=0.00

After glancing at this again, what I thought might be going on might not be going on. The fact it has 17+TB already used, not merely allocated, doesn't seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.

But then I don't know why du -h is reporting only 13T total used. And I'm unconvinced this is a balance issue either. Is anything obviously missing from the file system?

Chris Murphy
Re: Help with space
On Feb 27, 2014, at 9:21 PM, Dave Chinner wrote:

>>
>> http://lists.centos.org/pipermail/centos/2011-April/109142.html
>
> No, he didn't fill it with 16TB of data and then have it fail. He
> made a new filesystem *larger* than 16TB and tried to mount it:
>
> | On a CentOS 32-bit backup server with a 17TB LVM logical volume on
> | EMC storage. Worked great, until it rolled 16TB. Then it quit
> | working. Altogether. /var/log/messages told me that the
> | filesystem was too large to be mounted. Had to re-image the VM as
> | a 64-bit CentOS, and then re-attached the RDM's to the LUNs
> | holding the PV's for the LV, and it mounted instantly, and we
> | kept on trucking.
>
> This just backs up what I told you originally - that XFS has always
> refused to mount >16TB filesystems on 32 bit systems.

That isn't how I read that at all. It was a 17TB LV, working great (i.e. mounted) until it was filled with 16TB; then it quit working and could not subsequently be mounted until put on a 64-bit kernel. I don't see how it's "working great" if it's not mountable.

>
>>> I said that it was limited on XFS, not that the limit was a
>>> result of a user making a filesystem too large and then finding
>>> out it didn't work. Indeed, you can't do that on XFS - mkfs will
>>> refuse to run on a block device it can't access the last block
>>> on, and the kernel has the same "can I access the last block of
>>> the filesystem" sanity checks that are run at mount and growfs
>>> time.
>>
>> Nope. What I reported on the XFS list, I had used mkfs.xfs while
>> running 32bit kernel on a 20TB virtual disk. It did not fail to
>> make the file system, it failed only to mount it.
>
> You said no such thing. All you said was you couldn't mount a
> filesystem > 16TB - you made no mention of how you made the fs, what
> the block device was or any other details.

All correct. It wasn't intended as a bug report; it seemed normal. What I reported was the mount failure. A VBox 25TB VDI as a single block device, as well as 5x 5TB VDIs in a 20TB linear LV, as well as a 100TB virtual-size LV using LVM thinp - all can be formatted with default mkfs.xfs with no complaints.

3.13.4-200.fc20.i686+PAE
xfsprogs-3.1.11-2.fc20.i686

>
>> It was the same
>> booted virtual machine, I created the file system and immediately
>> mounted it. If you want the specifics, I'll post on the XFS list
>> with versions and reproduce steps.
>
> Did you check to see whether the block device silently wrapped at
> 16TB? There's a real good chance it did - but you might have got
> lucky because mkfs.xfs uses direct IO and *maybe* that works
> correctly on block devices on 32 bit systems. I wouldn't bet on it,
> though, given it's something we don't support and therefore never
> test....

I did not check to see if any of the block devices silently wrapped; I don't know how to do that, although I have a strace of the mkfs on the 100TB virtual LV here:

https://dl.dropboxusercontent.com/u/3253801/mkfsxfs32bit100TBvLV.txt

Chris Murphy
Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels
On Thu, Feb 27, 2014 at 04:07:06PM -0500, Josef Bacik wrote:

> On 02/27/2014 04:05 PM, Chris Murphy wrote:
> > User reports successfully formatting and using an ~18TB Btrfs
> > volume on hardware raid5 using i686 kernel for over a year, and
> > then suddenly the file system starts behaving weirdly:
> >
> > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31856.html
> >
> > I think this is due to the kernel page cache address space being
> > 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in
> > this thread:
> >
> > http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
> >
> > So it sounds like it shouldn't be possible to mount a Btrfs volume
> > larger than 16TB on 32-bit kernels. This is consistent with ext4
> > and XFS which refuse to mount large file systems.
>
> Well that's not good, I'll fix this up. Thanks,

Well, don't go assuming there's a problem just because I made an off-hand comment. I.e., my comment was simply "maybe it hasn't been tested", and not an assertion that there is a bug or a problem.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
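The 16TB figure referenced above falls straight out of how the page cache is indexed: the page index is an unsigned long, which is 32 bits on a 32-bit kernel, so a single file or block device can be addressed through the page cache only up to 2^32 pages of 4KiB each. A quick check of the arithmetic:

    # 2^32 pages * 4096 bytes per page, expressed in TiB
    $ echo $(( (2**32 * 4096) / 2**40 ))
    16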
Re: Help with space
On Thu, 27 Feb 2014 12:19:05 -0600 Justin Brown wrote:

> I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in

Do you sleep well at night knowing that if one disk fails, you end up with basically a RAID0 of 7x3TB disks? And that if a 2nd one encounters an unreadable sector during the rebuild, you lose your data?

RAID5 actually stopped working 5 years ago; apparently you didn't get the memo. :)
http://hardware.slashdot.org/story/08/10/21/2126252/why-raid-5-stops-working-in-2009

> need of help. Disk usage (du) shows 13 tera allocated yet strangely
> enough df shows approx. 780 gigs are free. It seems, somehow, btrfs
> has eaten roughly 4 tera internally. I've run a scrub and a balance
> usage=5 with no success, in fact I lost about 20 gigs after the

Did you run balance with "-dusage=5" or "-musage=5"? Or both? What is the output of the balance command?

> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
> Data, single: total=17.58TiB, used=17.57TiB
> System, DUP: total=8.00MiB, used=1.93MiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=392.00GiB, used=33.50GiB
                 ^
If you'd used "-musage=5", I think this metadata reserve would have been shrunk, and you'd gain a lot more free space.

But then as others mentioned it may be risky to use this FS on 32-bit at all, so I'd suggest trying anything else only after you reboot into a 64-bit kernel.

--
With respect,
Roman
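For reference, the filters Roman is asking about are passed to balance as shown below. The mount point is illustrative, and the metadata filter needs a kernel/btrfs-progs recent enough to support balance filters at all:

    # rewrite only data chunks that are less than 5% used
    btrfs balance start -dusage=5 /var/lib/nobody/fs/ubfterra

    # rewrite only metadata chunks that are less than 5% used -- this is
    # what Roman suggests would shrink the 392GiB metadata allocation above
    btrfs balance start -musage=5 /var/lib/nobody/fs/ubfterra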
Re: Help with space
On Thu, Feb 27, 2014 at 05:27:48PM -0700, Chris Murphy wrote: > > On Feb 27, 2014, at 5:12 PM, Dave Chinner > wrote: > > > On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote: > >> > >> On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote: > >> > >>> Yes it's an ancient 32 bit machine. There must be a complex > >>> bug involved as the system, when originally mounted, claimed > >>> the correct free space and only as used over time did the > >>> discrepancy between used and free grow. I'm afraid I chose > >>> btrfs because it appeared capable of breaking the 16 tera > >>> limit on a 32 bit system. If this isn't the case then it's > >>> incredible that I've been using this file system for about a > >>> year without difficulty until now. > >> > >> Yep, it's not a good bug. This happened some years ago on XFS > >> too, where people would use the file system for a long time and > >> then at 16TB+1byte written to the volume, kablewy! And then it > >> wasn't usable at all, until put on a 64-bit kernel. > >> > >> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html > > > > Well, no, that's not what I said. > > What are you thinking I said you said? I wasn't quoting or > paraphrasing anything you've said above. I had done a google > search on this early and found some rather old threads where some > people had this experience of making a large file system on a > 32-bit kernel, and only after filling it beyond 16TB did they run > into the problem. Here is one of them: > > http://lists.centos.org/pipermail/centos/2011-April/109142.html No, he didn't fill it with 16TB of data and then have it fail. He made a new filesystem *larger* than 16TB and tried to mount it: | On a CentOS 32-bit backup server with a 17TB LVM logical volume on | EMC storage. Worked great, until it rolled 16TB. Then it quit | working. Altogether. /var/log/messages told me that the | filesystem was too large to be mounted. Had to re-image the VM as | a 64-bit CentOS, and then re-attached the RDM's to the LUNs | holding the PV's for the LV, and it mounted instantly, and we | kept on trucking. This just backs up what I told you originally - that XFS has always refused to mount >16TB filesystems on 32 bit systems. > > I said that it was limited on XFS, not that the limit was a > > result of a user making a filesystem too large and then finding > > out it didn't work. Indeed, you can't do that on XFS - mkfs will > > refuse to run on a block device it can't access the last block > > on, and the kernel has the same "can I access the last block of > > the filesystem" sanity checks that are run at mount and growfs > > time. > > Nope. What I reported on the XFS list, I had used mkfs.xfs while > running 32bit kernel on a 20TB virtual disk. It did not fail to > make the file system, it failed only to mount it. You said no such thing. All you said was you couldn't mount a filesystem > 16TB - you made no mention of how you made the fs, what the block device was or any other details. > It was the same > booted virtual machine, I created the file system and immediately > mounted it. If you want the specifics, I'll post on the XFS list > with versions and reproduce steps. Did you check to see whether the block device silently wrapped at 16TB? There's a real good chance it did - but you might have got lucky because mkfs.xfs uses direct IO and *maybe* that works correctly on block devices on 32 bit systems. I wouldn't bet on it, though, given it's something we don't support and therefore never test Cheers, Dave. 
--
Dave Chinner
da...@fromorbit.com
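As an aside, one rough way to probe for the silent wrap Dave describes is to write a recognizable marker just past the 16TiB boundary and then see whether it reads back at the wrapped-around offset. This is only a sketch of the idea, it is destructive, the device name is hypothetical, and whether such buffered writes even succeed on a 32-bit kernel is exactly the open question:

    # DANGER: overwrites data; only for a disposable scratch device such as a test VDI
    # write one 4KiB block of a known pattern at exactly 16TiB (offset 2^44)
    dd if=/dev/zero bs=4096 count=1 | tr '\0' 'W' | \
        dd of=/dev/sdX bs=4096 seek=$(( (16 * 2**40) / 4096 )) conv=notrunc

    # if offsets wrap at 16TiB, the same 'W' pattern now shows up at offset 0
    dd if=/dev/sdX bs=4096 count=1 2>/dev/null | od -c | head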
[PATCH v5 11/18] btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.
Replace the fs_info->cache_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 10 +- fs/btrfs/extent-tree.c | 6 +++--- fs/btrfs/super.c | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a7b0bdd..06a64fb 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1221,7 +1221,7 @@ struct btrfs_caching_control { struct list_head list; struct mutex mutex; wait_queue_head_t wait; - struct btrfs_work work; + struct btrfs_work_struct work; struct btrfs_block_group_cache *block_group; u64 progress; atomic_t count; @@ -1516,7 +1516,7 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *endio_write_workers; struct btrfs_workqueue_struct *endio_freespace_worker; struct btrfs_workqueue_struct *submit_workers; - struct btrfs_workers caching_workers; + struct btrfs_workqueue_struct *caching_workers; struct btrfs_workers readahead_workers; /* diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 12586b1..391cadf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2003,7 +2003,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->endio_freespace_worker); btrfs_destroy_workqueue(fs_info->submit_workers); btrfs_stop_workers(&fs_info->delayed_workers); - btrfs_stop_workers(&fs_info->caching_workers); + btrfs_destroy_workqueue(fs_info->caching_workers); btrfs_stop_workers(&fs_info->readahead_workers); btrfs_destroy_workqueue(fs_info->flush_workers); btrfs_stop_workers(&fs_info->qgroup_rescan_workers); @@ -2481,8 +2481,8 @@ int open_ctree(struct super_block *sb, fs_info->flush_workers = btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0); - btrfs_init_workers(&fs_info->caching_workers, "cache", - fs_info->thread_pool_size, NULL); + fs_info->caching_workers = + btrfs_alloc_workqueue("cache", flags, max_active, 0); /* * a higher idle thresh on the submit workers makes it much more @@ -2533,7 +2533,6 @@ int open_ctree(struct super_block *sb, ret = btrfs_start_workers(&fs_info->generic_worker); ret |= btrfs_start_workers(&fs_info->fixup_workers); ret |= btrfs_start_workers(&fs_info->delayed_workers); - ret |= btrfs_start_workers(&fs_info->caching_workers); ret |= btrfs_start_workers(&fs_info->readahead_workers); ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { @@ -2545,7 +2544,8 @@ int open_ctree(struct super_block *sb, fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->endio_meta_write_workers && fs_info->endio_write_workers && fs_info->endio_raid56_workers && - fs_info->endio_freespace_worker && fs_info->rmw_workers)) { + fs_info->endio_freespace_worker && fs_info->rmw_workers && + fs_info->caching_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 32312e0..bb58082 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -378,7 +378,7 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group, return total_added; } -static noinline void caching_thread(struct btrfs_work *work) +static noinline void caching_thread(struct btrfs_work_struct *work) { struct btrfs_block_group_cache *block_group; struct btrfs_fs_info *fs_info; @@ -549,7 +549,7 @@ static int cache_block_group(struct 
btrfs_block_group_cache *cache, caching_ctl->block_group = cache; caching_ctl->progress = cache->key.objectid; atomic_set(&caching_ctl->count, 1); - caching_ctl->work.func = caching_thread; + btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL); spin_lock(&cache->lock); /* @@ -640,7 +640,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, btrfs_get_block_group(cache); - btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work); + btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work); return ret; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 919eb36..cd52e20 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1320,7 +1320,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info->workers, new_pool_size); btrfs_work
[PATCH v5 08/18] btrfs: Replace fs_info->flush_workers with btrfs_workqueue.
Replace the fs_info->submit_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h| 4 ++-- fs/btrfs/disk-io.c | 10 -- fs/btrfs/inode.c| 8 fs/btrfs/ordered-data.c | 13 +++-- fs/btrfs/ordered-data.h | 2 +- 5 files changed, 18 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9af6804..f1377c9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1507,7 +1507,7 @@ struct btrfs_fs_info { struct btrfs_workers generic_worker; struct btrfs_workqueue_struct *workers; struct btrfs_workqueue_struct *delalloc_workers; - struct btrfs_workers flush_workers; + struct btrfs_workqueue_struct *flush_workers; struct btrfs_workers endio_workers; struct btrfs_workers endio_meta_workers; struct btrfs_workers endio_raid56_workers; @@ -3677,7 +3677,7 @@ struct btrfs_delalloc_work { int delay_iput; struct completion completion; struct list_head list; - struct btrfs_work work; + struct btrfs_work_struct work; }; struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8b118ed..772fa39 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2006,7 +2006,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(&fs_info->delayed_workers); btrfs_stop_workers(&fs_info->caching_workers); btrfs_stop_workers(&fs_info->readahead_workers); - btrfs_stop_workers(&fs_info->flush_workers); + btrfs_destroy_workqueue(fs_info->flush_workers); btrfs_stop_workers(&fs_info->qgroup_rescan_workers); } @@ -2479,9 +2479,8 @@ int open_ctree(struct super_block *sb, fs_info->delalloc_workers = btrfs_alloc_workqueue("delalloc", flags, max_active, 2); - btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc", - fs_info->thread_pool_size, NULL); - + fs_info->flush_workers = + btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0); btrfs_init_workers(&fs_info->caching_workers, "cache", fs_info->thread_pool_size, NULL); @@ -2556,14 +2555,13 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(&fs_info->delayed_workers); ret |= btrfs_start_workers(&fs_info->caching_workers); ret |= btrfs_start_workers(&fs_info->readahead_workers); - ret |= btrfs_start_workers(&fs_info->flush_workers); ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; } if (!(fs_info->workers && fs_info->delalloc_workers && - fs_info->submit_workers)) { + fs_info->submit_workers && fs_info->flush_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 01cfe99..7627b60 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8372,7 +8372,7 @@ out_notrans: return ret; } -static void btrfs_run_delalloc_work(struct btrfs_work *work) +static void btrfs_run_delalloc_work(struct btrfs_work_struct *work) { struct btrfs_delalloc_work *delalloc_work; struct inode *inode; @@ -8410,7 +8410,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, work->inode = inode; work->wait = wait; work->delay_iput = delay_iput; - work->work.func = btrfs_run_delalloc_work; + btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL); return work; } @@ -8462,8 +8462,8 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput) goto out; } 
list_add_tail(&work->list, &works); - btrfs_queue_worker(&root->fs_info->flush_workers, - &work->work); + btrfs_queue_work(root->fs_info->flush_workers, +&work->work); cond_resched(); spin_lock(&root->delalloc_lock); diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 138a7d7..6fa8219 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -576,7 +576,7 @@ void btrfs_remove_ordered_extent(struct inode *inode, wake_up(&entry->wait); } -static void btrfs_run_ordered_extent_work(struct btrfs_work *work) +static void btrfs_run_ordered_extent_work(struct btrfs_work_struct *work) { struct btrfs_ordered_extent *ordered; @@ -609,10 +609,11 @@ int btrfs_wait_o
[PATCH v5 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue.
Use the newly created btrfs_workqueue_struct to replace the original fs_info->workers Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: None v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 41 + fs/btrfs/super.c | 2 +- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index dac6653..448df5e 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1505,7 +1505,7 @@ struct btrfs_fs_info { * two */ struct btrfs_workers generic_worker; - struct btrfs_workers workers; + struct btrfs_workqueue_struct *workers; struct btrfs_workers delalloc_workers; struct btrfs_workers flush_workers; struct btrfs_workers endio_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cc1b423..4040a43 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -108,7 +108,7 @@ struct async_submit_bio { * can't tell us where in the file the bio should go */ u64 bio_offset; - struct btrfs_work work; + struct btrfs_work_struct work; int error; }; @@ -738,12 +738,12 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio, unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info) { unsigned long limit = min_t(unsigned long, - info->workers.max_workers, + info->thread_pool_size, info->fs_devices->open_devices); return 256 * limit; } -static void run_one_async_start(struct btrfs_work *work) +static void run_one_async_start(struct btrfs_work_struct *work) { struct async_submit_bio *async; int ret; @@ -756,7 +756,7 @@ static void run_one_async_start(struct btrfs_work *work) async->error = ret; } -static void run_one_async_done(struct btrfs_work *work) +static void run_one_async_done(struct btrfs_work_struct *work) { struct btrfs_fs_info *fs_info; struct async_submit_bio *async; @@ -783,7 +783,7 @@ static void run_one_async_done(struct btrfs_work *work) async->bio_offset); } -static void run_one_async_free(struct btrfs_work *work) +static void run_one_async_free(struct btrfs_work_struct *work) { struct async_submit_bio *async; @@ -811,11 +811,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode, async->submit_bio_start = submit_bio_start; async->submit_bio_done = submit_bio_done; - async->work.func = run_one_async_start; - async->work.ordered_func = run_one_async_done; - async->work.ordered_free = run_one_async_free; + btrfs_init_work(&async->work, run_one_async_start, + run_one_async_done, run_one_async_free); - async->work.flags = 0; async->bio_flags = bio_flags; async->bio_offset = bio_offset; @@ -824,9 +822,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode, atomic_inc(&fs_info->nr_async_submits); if (rw & REQ_SYNC) - btrfs_set_work_high_prio(&async->work); + btrfs_set_work_high_priority(&async->work); - btrfs_queue_worker(&fs_info->workers, &async->work); + btrfs_queue_work(fs_info->workers, &async->work); while (atomic_read(&fs_info->async_submit_draining) && atomic_read(&fs_info->nr_async_submits)) { @@ -1996,7 +1994,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(&fs_info->generic_worker); btrfs_stop_workers(&fs_info->fixup_workers); btrfs_stop_workers(&fs_info->delalloc_workers); - btrfs_stop_workers(&fs_info->workers); + btrfs_destroy_workqueue(fs_info->workers); btrfs_stop_workers(&fs_info->endio_workers); btrfs_stop_workers(&fs_info->endio_meta_workers); btrfs_stop_workers(&fs_info->endio_raid56_workers); @@ -2100,6 +2098,8 @@ int 
open_ctree(struct super_block *sb, int err = -EINVAL; int num_backups_tried = 0; int backup_index = 0; + int max_active; + int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND; bool create_uuid_tree; bool check_uuid_tree; @@ -2468,12 +2468,13 @@ int open_ctree(struct super_block *sb, goto fail_alloc; } + max_active = fs_info->thread_pool_size; btrfs_init_workers(&fs_info->generic_worker, "genwork", 1, NULL); - btrfs_init_workers(&fs_info->workers, "worker", - fs_info->thread_pool_size, - &fs_info->generic_worker); + fs_info->workers = + btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI, +
[PATCH v5 09/18] btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.
Replace the fs_info->endio_* workqueues with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h| 12 +++--- fs/btrfs/disk-io.c | 104 +--- fs/btrfs/inode.c| 20 +- fs/btrfs/ordered-data.h | 2 +- fs/btrfs/super.c| 11 ++--- 5 files changed, 68 insertions(+), 81 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index f1377c9..3db87da 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1508,13 +1508,13 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *workers; struct btrfs_workqueue_struct *delalloc_workers; struct btrfs_workqueue_struct *flush_workers; - struct btrfs_workers endio_workers; - struct btrfs_workers endio_meta_workers; - struct btrfs_workers endio_raid56_workers; + struct btrfs_workqueue_struct *endio_workers; + struct btrfs_workqueue_struct *endio_meta_workers; + struct btrfs_workqueue_struct *endio_raid56_workers; struct btrfs_workers rmw_workers; - struct btrfs_workers endio_meta_write_workers; - struct btrfs_workers endio_write_workers; - struct btrfs_workers endio_freespace_worker; + struct btrfs_workqueue_struct *endio_meta_write_workers; + struct btrfs_workqueue_struct *endio_write_workers; + struct btrfs_workqueue_struct *endio_freespace_worker; struct btrfs_workqueue_struct *submit_workers; struct btrfs_workers caching_workers; struct btrfs_workers readahead_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 772fa39..28b303c 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -55,7 +55,7 @@ #endif static struct extent_io_ops btree_extent_io_ops; -static void end_workqueue_fn(struct btrfs_work *work); +static void end_workqueue_fn(struct btrfs_work_struct *work); static void free_fs_root(struct btrfs_root *root); static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, int read_only); @@ -86,7 +86,7 @@ struct end_io_wq { int error; int metadata; struct list_head list; - struct btrfs_work work; + struct btrfs_work_struct work; }; /* @@ -678,32 +678,31 @@ static void end_workqueue_bio(struct bio *bio, int err) fs_info = end_io_wq->info; end_io_wq->error = err; - end_io_wq->work.func = end_workqueue_fn; - end_io_wq->work.flags = 0; + btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL); if (bio->bi_rw & REQ_WRITE) { if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) - btrfs_queue_worker(&fs_info->endio_meta_write_workers, - &end_io_wq->work); + btrfs_queue_work(fs_info->endio_meta_write_workers, +&end_io_wq->work); else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) - btrfs_queue_worker(&fs_info->endio_freespace_worker, - &end_io_wq->work); + btrfs_queue_work(fs_info->endio_freespace_worker, +&end_io_wq->work); else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) - btrfs_queue_worker(&fs_info->endio_raid56_workers, - &end_io_wq->work); + btrfs_queue_work(fs_info->endio_raid56_workers, +&end_io_wq->work); else - btrfs_queue_worker(&fs_info->endio_write_workers, - &end_io_wq->work); + btrfs_queue_work(fs_info->endio_write_workers, +&end_io_wq->work); } else { if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) - btrfs_queue_worker(&fs_info->endio_raid56_workers, - &end_io_wq->work); + btrfs_queue_work(fs_info->endio_raid56_workers, +&end_io_wq->work); else if (end_io_wq->metadata) - btrfs_queue_worker(&fs_info->endio_meta_workers, - &end_io_wq->work); + 
btrfs_queue_work(fs_info->endio_meta_workers, +&end_io_wq->work); else - btrfs_queue_worker(&fs_info->endio_workers, - &end_io_wq->work); + btrfs_queue_work(fs_info->
[PATCH v5 04/18] btrfs: Add threshold workqueue based on kernel workqueue
The original btrfs_workers has thresholding functions to dynamically create or destroy kthreads. Though there is no such function in kernel workqueue because the worker is not created manually, we can still use the workqueue_set_max_active to simulated the behavior, mainly to achieve a better HDD performance by setting a high threshold on submit_workers. (Sadly, no resource can be saved) So in this patch, extra workqueue pending counters are introduced to dynamically change the max active of each btrfs_workqueue_struct, hoping to restore the behavior of the original thresholding function. Also, workqueue_set_max_active use a mutex to protect workqueue_struct, which is not meant to be called too frequently, so a new interval mechanism is applied, that will only call workqueue_set_max_active after a count of work is queued. Hoping to balance both the random and sequence performance on HDD. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v2->v3: - Add thresholding mechanism to simulate the old thresholding mechanism. - Will not enable thresholding when thresh is set to small value. v3->v4: None v4->v5: None --- fs/btrfs/async-thread.c | 107 fs/btrfs/async-thread.h | 3 +- 2 files changed, 101 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 193c849..977bce2 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -30,6 +30,9 @@ #define WORK_ORDER_DONE_BIT 2 #define WORK_HIGH_PRIO_BIT 3 +#define NO_THRESHOLD (-1) +#define DFT_THRESHOLD (32) + /* * container for the kthread task pointer and the list of pending work * One of these is allocated per thread. @@ -737,6 +740,14 @@ struct __btrfs_workqueue_struct { /* Spinlock for ordered_list */ spinlock_t list_lock; + + /* Thresholding related variants */ + atomic_t pending; + int max_active; + int current_max; + int thresh; + unsigned int count; + spinlock_t thres_lock; }; struct btrfs_workqueue_struct { @@ -745,19 +756,34 @@ struct btrfs_workqueue_struct { }; static inline struct __btrfs_workqueue_struct -*__btrfs_alloc_workqueue(char *name, int flags, int max_active) +*__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh) { struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); if (unlikely(!ret)) return NULL; + ret->max_active = max_active; + atomic_set(&ret->pending, 0); + if (thresh == 0) + thresh = DFT_THRESHOLD; + /* For low threshold, disabling threshold is a better choice */ + if (thresh < DFT_THRESHOLD) { + ret->current_max = max_active; + ret->thresh = NO_THRESHOLD; + } else { + ret->current_max = 1; + ret->thresh = thresh; + } + if (flags & WQ_HIGHPRI) ret->normal_wq = alloc_workqueue("%s-%s-high", flags, -max_active, "btrfs", name); +ret->max_active, +"btrfs", name); else ret->normal_wq = alloc_workqueue("%s-%s", flags, -max_active, "btrfs", name); +ret->max_active, "btrfs", +name); if (unlikely(!ret->normal_wq)) { kfree(ret); return NULL; @@ -765,6 +791,7 @@ static inline struct __btrfs_workqueue_struct INIT_LIST_HEAD(&ret->ordered_list); spin_lock_init(&ret->list_lock); + spin_lock_init(&ret->thres_lock); return ret; } @@ -773,7 +800,8 @@ __btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq); struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, int flags, -int max_active) +int max_active, +int thresh) { struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); @@ -781,14 +809,15 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, return NULL; ret->normal = 
__btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI, - max_active); + max_active, thresh); if (unlikely(!ret->normal)) { kfree(ret); return NULL; } if (flags & WQ_HIGHPRI) { - ret->high = __btrfs_alloc_workqueue(name, flags, max_active); + ret->high = __btrfs_alloc_workqueue(name, flags, max_active, +
[PATCH v5 13/18] btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.
Replace the fs_info->fixup_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 10 +- fs/btrfs/inode.c | 8 fs/btrfs/super.c | 1 - 4 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3d6f490..95a1e66 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1524,7 +1524,7 @@ struct btrfs_fs_info { * the cow mechanism and make them safe to write. It happens * for the sys_munmap function call path */ - struct btrfs_workers fixup_workers; + struct btrfs_workqueue_struct *fixup_workers; struct btrfs_workers delayed_workers; struct task_struct *transaction_kthread; struct task_struct *cleaner_kthread; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ca6d0cf..4da34df 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1991,7 +1991,7 @@ static noinline int next_root_backup(struct btrfs_fs_info *info, static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) { btrfs_stop_workers(&fs_info->generic_worker); - btrfs_stop_workers(&fs_info->fixup_workers); + btrfs_destroy_workqueue(fs_info->fixup_workers); btrfs_destroy_workqueue(fs_info->delalloc_workers); btrfs_destroy_workqueue(fs_info->workers); btrfs_destroy_workqueue(fs_info->endio_workers); @@ -2494,8 +2494,8 @@ int open_ctree(struct super_block *sb, min_t(u64, fs_devices->num_devices, max_active), 64); - btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1, - &fs_info->generic_worker); + fs_info->fixup_workers = + btrfs_alloc_workqueue("fixup", flags, 1, 0); /* * endios are largely parallel and should have a very @@ -2528,7 +2528,6 @@ int open_ctree(struct super_block *sb, * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->fixup_workers); ret |= btrfs_start_workers(&fs_info->delayed_workers); ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { @@ -2541,7 +2540,8 @@ int open_ctree(struct super_block *sb, fs_info->endio_meta_write_workers && fs_info->endio_write_workers && fs_info->endio_raid56_workers && fs_info->endio_freespace_worker && fs_info->rmw_workers && - fs_info->caching_workers && fs_info->readahead_workers)) { + fs_info->caching_workers && fs_info->readahead_workers && + fs_info->fixup_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 4023c90..81395d6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1748,10 +1748,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, /* see btrfs_writepage_start_hook for details on why this is required */ struct btrfs_writepage_fixup { struct page *page; - struct btrfs_work work; + struct btrfs_work_struct work; }; -static void btrfs_writepage_fixup_worker(struct btrfs_work *work) +static void btrfs_writepage_fixup_worker(struct btrfs_work_struct *work) { struct btrfs_writepage_fixup *fixup; struct btrfs_ordered_extent *ordered; @@ -1842,9 +1842,9 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end) SetPageChecked(page); page_cache_get(page); - fixup->work.func = btrfs_writepage_fixup_worker; + btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL); fixup->page = page; - btrfs_queue_worker(&root->fs_info->fixup_workers, &fixup->work); + btrfs_queue_work(root->fs_info->fixup_workers, &fixup->work); return -EBUSY; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 56c5533..3614053 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1321,7 +1321,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size); - btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_meta_write_workers, -- 1.9.0 -- To unsubscribe from this list: send the lin
[PATCH v5 06/18] btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue
Much like the fs_info->workers, replace the fs_info->delalloc_workers use the same btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: None v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 12 fs/btrfs/inode.c | 18 -- fs/btrfs/super.c | 2 +- 4 files changed, 14 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 448df5e..4e11f4b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1506,7 +1506,7 @@ struct btrfs_fs_info { */ struct btrfs_workers generic_worker; struct btrfs_workqueue_struct *workers; - struct btrfs_workers delalloc_workers; + struct btrfs_workqueue_struct *delalloc_workers; struct btrfs_workers flush_workers; struct btrfs_workers endio_workers; struct btrfs_workers endio_meta_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4040a43..f97bd17 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1993,7 +1993,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) { btrfs_stop_workers(&fs_info->generic_worker); btrfs_stop_workers(&fs_info->fixup_workers); - btrfs_stop_workers(&fs_info->delalloc_workers); + btrfs_destroy_workqueue(fs_info->delalloc_workers); btrfs_destroy_workqueue(fs_info->workers); btrfs_stop_workers(&fs_info->endio_workers); btrfs_stop_workers(&fs_info->endio_meta_workers); @@ -2476,8 +2476,8 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI, max_active, 16); - btrfs_init_workers(&fs_info->delalloc_workers, "delalloc", - fs_info->thread_pool_size, NULL); + fs_info->delalloc_workers = + btrfs_alloc_workqueue("delalloc", flags, max_active, 2); btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc", fs_info->thread_pool_size, NULL); @@ -2495,9 +2495,6 @@ int open_ctree(struct super_block *sb, */ fs_info->submit_workers.idle_thresh = 64; - fs_info->delalloc_workers.idle_thresh = 2; - fs_info->delalloc_workers.ordered = 1; - btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1, &fs_info->generic_worker); btrfs_init_workers(&fs_info->endio_workers, "endio", @@ -2548,7 +2545,6 @@ int open_ctree(struct super_block *sb, */ ret = btrfs_start_workers(&fs_info->generic_worker); ret |= btrfs_start_workers(&fs_info->submit_workers); - ret |= btrfs_start_workers(&fs_info->delalloc_workers); ret |= btrfs_start_workers(&fs_info->fixup_workers); ret |= btrfs_start_workers(&fs_info->endio_workers); ret |= btrfs_start_workers(&fs_info->endio_meta_workers); @@ -2566,7 +2562,7 @@ int open_ctree(struct super_block *sb, err = -ENOMEM; goto fail_sb_buffer; } - if (!(fs_info->workers)) { + if (!(fs_info->workers && fs_info->delalloc_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 197edee..01cfe99 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -324,7 +324,7 @@ struct async_cow { u64 start; u64 end; struct list_head extents; - struct btrfs_work work; + struct btrfs_work_struct work; }; static noinline int add_async_extent(struct async_cow *cow, @@ -1000,7 +1000,7 @@ out_unlock: /* * work queue call back to started compression on a file and pages */ -static noinline void async_cow_start(struct btrfs_work *work) +static noinline void async_cow_start(struct btrfs_work_struct *work) { struct async_cow *async_cow; int num_added = 0; @@ -1018,7 +1018,7 @@ static noinline void async_cow_start(struct btrfs_work *work) /* * work queue call back to submit previously compressed pages */ 
-static noinline void async_cow_submit(struct btrfs_work *work) +static noinline void async_cow_submit(struct btrfs_work_struct *work) { struct async_cow *async_cow; struct btrfs_root *root; @@ -1039,7 +1039,7 @@ static noinline void async_cow_submit(struct btrfs_work *work) submit_compressed_extents(async_cow->inode, async_cow); } -static noinline void async_cow_free(struct btrfs_work *work) +static noinline void async_cow_free(struct btrfs_work_struct *work) { struct async_cow *async_cow; async_cow = container_of(work, struct async_cow, work); @@ -1076,17 +1076,15 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, async_cow->end = cur_end; INIT_LIST_HEAD(&async_cow->extents); - async_cow->work.func
[PATCH v5 07/18] btrfs: Replace fs_info->submit_workers with btrfs_workqueue.
Much like the fs_info->workers, replace the fs_info->submit_workers use the same btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: None v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 17 + fs/btrfs/super.c | 2 +- fs/btrfs/volumes.c | 11 ++- fs/btrfs/volumes.h | 2 +- 5 files changed, 18 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4e11f4b..9af6804 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1515,7 +1515,7 @@ struct btrfs_fs_info { struct btrfs_workers endio_meta_write_workers; struct btrfs_workers endio_write_workers; struct btrfs_workers endio_freespace_worker; - struct btrfs_workers submit_workers; + struct btrfs_workqueue_struct *submit_workers; struct btrfs_workers caching_workers; struct btrfs_workers readahead_workers; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f97bd17..8b118ed 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2002,7 +2002,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(&fs_info->endio_meta_write_workers); btrfs_stop_workers(&fs_info->endio_write_workers); btrfs_stop_workers(&fs_info->endio_freespace_worker); - btrfs_stop_workers(&fs_info->submit_workers); + btrfs_destroy_workqueue(fs_info->submit_workers); btrfs_stop_workers(&fs_info->delayed_workers); btrfs_stop_workers(&fs_info->caching_workers); btrfs_stop_workers(&fs_info->readahead_workers); @@ -2482,18 +2482,19 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc", fs_info->thread_pool_size, NULL); - btrfs_init_workers(&fs_info->submit_workers, "submit", - min_t(u64, fs_devices->num_devices, - fs_info->thread_pool_size), NULL); btrfs_init_workers(&fs_info->caching_workers, "cache", fs_info->thread_pool_size, NULL); - /* a higher idle thresh on the submit workers makes it much more + /* +* a higher idle thresh on the submit workers makes it much more * likely that bios will be send down in a sane order to the * devices */ - fs_info->submit_workers.idle_thresh = 64; + fs_info->submit_workers = + btrfs_alloc_workqueue("submit", flags, + min_t(u64, fs_devices->num_devices, + max_active), 64); btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1, &fs_info->generic_worker); @@ -2544,7 +2545,6 @@ int open_ctree(struct super_block *sb, * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->submit_workers); ret |= btrfs_start_workers(&fs_info->fixup_workers); ret |= btrfs_start_workers(&fs_info->endio_workers); ret |= btrfs_start_workers(&fs_info->endio_meta_workers); @@ -2562,7 +2562,8 @@ int open_ctree(struct super_block *sb, err = -ENOMEM; goto fail_sb_buffer; } - if (!(fs_info->workers && fs_info->delalloc_workers)) { + if (!(fs_info->workers && fs_info->delalloc_workers && + fs_info->submit_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index e164d13..2d69b6d 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1319,7 +1319,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size); btrfs_workqueue_set_max(fs_info->workers, new_pool_size); btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); - btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size); + btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size); btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size); btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size); btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 82a63b1..0066cff 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -415,7 +415,8 @@ loop_lock: device->running_pending = 1; spin_unlock(&device->io_lock); - btrfs_requeue_work(&device->work); + btrfs_queue_work(fs_info->submit_workers, +&device->work); goto done; } /* u
[PATCH v5 14/18] btrfs: Replace fs_info->delayed_workers workqueue with btrfs_workqueue.
Replace the fs_info->delayed_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/delayed-inode.c | 10 +- fs/btrfs/disk-io.c | 10 -- fs/btrfs/super.c | 2 +- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 95a1e66..07b563d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1525,7 +1525,7 @@ struct btrfs_fs_info { * for the sys_munmap function call path */ struct btrfs_workqueue_struct *fixup_workers; - struct btrfs_workers delayed_workers; + struct btrfs_workqueue_struct *delayed_workers; struct task_struct *transaction_kthread; struct task_struct *cleaner_kthread; int thread_pool_size; diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 451b00c..76e85d6 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1318,10 +1318,10 @@ void btrfs_remove_delayed_node(struct inode *inode) struct btrfs_async_delayed_work { struct btrfs_delayed_root *delayed_root; int nr; - struct btrfs_work work; + struct btrfs_work_struct work; }; -static void btrfs_async_run_delayed_root(struct btrfs_work *work) +static void btrfs_async_run_delayed_root(struct btrfs_work_struct *work) { struct btrfs_async_delayed_work *async_work; struct btrfs_delayed_root *delayed_root; @@ -1392,11 +1392,11 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, return -ENOMEM; async_work->delayed_root = delayed_root; - async_work->work.func = btrfs_async_run_delayed_root; - async_work->work.flags = 0; + btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root, + NULL, NULL); async_work->nr = nr; - btrfs_queue_worker(&root->fs_info->delayed_workers, &async_work->work); + btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work); return 0; } diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4da34df..ac8e9c2 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2002,7 +2002,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->endio_write_workers); btrfs_destroy_workqueue(fs_info->endio_freespace_worker); btrfs_destroy_workqueue(fs_info->submit_workers); - btrfs_stop_workers(&fs_info->delayed_workers); + btrfs_destroy_workqueue(fs_info->delayed_workers); btrfs_destroy_workqueue(fs_info->caching_workers); btrfs_destroy_workqueue(fs_info->readahead_workers); btrfs_destroy_workqueue(fs_info->flush_workers); @@ -2515,9 +2515,8 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue("endio-write", flags, max_active, 2); fs_info->endio_freespace_worker = btrfs_alloc_workqueue("freespace-write", flags, max_active, 0); - btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta", - fs_info->thread_pool_size, - &fs_info->generic_worker); + fs_info->delayed_workers = + btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0); fs_info->readahead_workers = btrfs_alloc_workqueue("readahead", flags, max_active, 2); btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, @@ -2528,7 +2527,6 @@ int open_ctree(struct super_block *sb, * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->delayed_workers); ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { err = -ENOMEM; @@ -2541,7 +2539,7 @@ int open_ctree(struct super_block *sb, fs_info->endio_write_workers && fs_info->endio_raid56_workers && fs_info->endio_freespace_worker && fs_info->rmw_workers && fs_info->caching_workers && fs_info->readahead_workers && - fs_info->fixup_workers)) { + fs_info->fixup_workers && fs_info->delayed_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 3614053..5a355c4 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1327,7 +1327,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size); - btrfs_se
[PATCH v5 18/18] btrfs: Cleanup the "_struct" suffix in btrfs_workqueue
Since the "_struct" suffix is mainly used for distinguish the differnt btrfs_work between the original and the newly created one, there is no need using the suffix since all btrfs_workers are changed into btrfs_workqueue. Also this patch fixed some codes whose code style is changed due to the too long "_struct" suffix. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v3->v4: - Remove the "_struct" suffix. v4->v5: None --- fs/btrfs/async-thread.c | 66 fs/btrfs/async-thread.h | 34 - fs/btrfs/ctree.h | 44 fs/btrfs/delayed-inode.c | 4 +-- fs/btrfs/disk-io.c | 14 +- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/inode.c | 18 ++--- fs/btrfs/ordered-data.c | 2 +- fs/btrfs/ordered-data.h | 4 +-- fs/btrfs/qgroup.c| 2 +- fs/btrfs/raid56.c| 14 +- fs/btrfs/reada.c | 5 ++-- fs/btrfs/scrub.c | 23 - fs/btrfs/volumes.c | 2 +- fs/btrfs/volumes.h | 2 +- 15 files changed, 116 insertions(+), 120 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 2a5f383..a709585 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -32,7 +32,7 @@ #define NO_THRESHOLD (-1) #define DFT_THRESHOLD (32) -struct __btrfs_workqueue_struct { +struct __btrfs_workqueue { struct workqueue_struct *normal_wq; /* List head pointing to ordered work list */ struct list_head ordered_list; @@ -49,15 +49,15 @@ struct __btrfs_workqueue_struct { spinlock_t thres_lock; }; -struct btrfs_workqueue_struct { - struct __btrfs_workqueue_struct *normal; - struct __btrfs_workqueue_struct *high; +struct btrfs_workqueue { + struct __btrfs_workqueue *normal; + struct __btrfs_workqueue *high; }; -static inline struct __btrfs_workqueue_struct +static inline struct __btrfs_workqueue *__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh) { - struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); + struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS); if (unlikely(!ret)) return NULL; @@ -95,14 +95,14 @@ static inline struct __btrfs_workqueue_struct } static inline void -__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq); +__btrfs_destroy_workqueue(struct __btrfs_workqueue *wq); -struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, -int flags, -int max_active, -int thresh) +struct btrfs_workqueue *btrfs_alloc_workqueue(char *name, + int flags, + int max_active, + int thresh) { - struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); + struct btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS); if (unlikely(!ret)) return NULL; @@ -131,7 +131,7 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, * This hook WILL be called in IRQ handler context, * so workqueue_set_max_active MUST NOT be called in this hook */ -static inline void thresh_queue_hook(struct __btrfs_workqueue_struct *wq) +static inline void thresh_queue_hook(struct __btrfs_workqueue *wq) { if (wq->thresh == NO_THRESHOLD) return; @@ -143,7 +143,7 @@ static inline void thresh_queue_hook(struct __btrfs_workqueue_struct *wq) * This hook is called in kthread content. * So workqueue_set_max_active is called here. 
*/ -static inline void thresh_exec_hook(struct __btrfs_workqueue_struct *wq) +static inline void thresh_exec_hook(struct __btrfs_workqueue *wq) { int new_max_active; long pending; @@ -186,10 +186,10 @@ out: } } -static void run_ordered_work(struct __btrfs_workqueue_struct *wq) +static void run_ordered_work(struct __btrfs_workqueue *wq) { struct list_head *list = &wq->ordered_list; - struct btrfs_work_struct *work; + struct btrfs_work *work; spinlock_t *lock = &wq->list_lock; unsigned long flags; @@ -197,7 +197,7 @@ static void run_ordered_work(struct __btrfs_workqueue_struct *wq) spin_lock_irqsave(lock, flags); if (list_empty(list)) break; - work = list_entry(list->next, struct btrfs_work_struct, + work = list_entry(list->next, struct btrfs_work, ordered_list); if (!test_bit(WORK_DONE_BIT, &work->flags)) break; @@ -229,11 +229,11 @@ static void run_ordered_work(struct __btrfs_workqueue_struct *wq) static void normal_work
[PATCH v5 17/18] btrfs: Cleanup the old btrfs_worker.
Since all the btrfs_worker is replaced with the newly created btrfs_workqueue, the old codes can be easily remove. Signed-off-by: Quwenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Reuse the old async-thred.[ch] files. v3->v4: - Reuse the old WORK_* bits. v4->v5: None --- fs/btrfs/async-thread.c | 707 +--- fs/btrfs/async-thread.h | 100 --- fs/btrfs/ctree.h| 1 - fs/btrfs/disk-io.c | 12 - fs/btrfs/super.c| 8 - 5 files changed, 3 insertions(+), 825 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 977bce2..2a5f383 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -25,714 +25,13 @@ #include #include "async-thread.h" -#define WORK_QUEUED_BIT 0 -#define WORK_DONE_BIT 1 -#define WORK_ORDER_DONE_BIT 2 -#define WORK_HIGH_PRIO_BIT 3 +#define WORK_DONE_BIT 0 +#define WORK_ORDER_DONE_BIT 1 +#define WORK_HIGH_PRIO_BIT 2 #define NO_THRESHOLD (-1) #define DFT_THRESHOLD (32) -/* - * container for the kthread task pointer and the list of pending work - * One of these is allocated per thread. - */ -struct btrfs_worker_thread { - /* pool we belong to */ - struct btrfs_workers *workers; - - /* list of struct btrfs_work that are waiting for service */ - struct list_head pending; - struct list_head prio_pending; - - /* list of worker threads from struct btrfs_workers */ - struct list_head worker_list; - - /* kthread */ - struct task_struct *task; - - /* number of things on the pending list */ - atomic_t num_pending; - - /* reference counter for this struct */ - atomic_t refs; - - unsigned long sequence; - - /* protects the pending list. */ - spinlock_t lock; - - /* set to non-zero when this thread is already awake and kicking */ - int working; - - /* are we currently idle */ - int idle; -}; - -static int __btrfs_start_workers(struct btrfs_workers *workers); - -/* - * btrfs_start_workers uses kthread_run, which can block waiting for memory - * for a very long time. It will actually throttle on page writeback, - * and so it may not make progress until after our btrfs worker threads - * process all of the pending work structs in their queue - * - * This means we can't use btrfs_start_workers from inside a btrfs worker - * thread that is used as part of cleaning dirty memory, which pretty much - * involves all of the worker threads. - * - * Instead we have a helper queue who never has more than one thread - * where we scheduler thread start operations. This worker_start struct - * is used to contain the work and hold a pointer to the queue that needs - * another worker. - */ -struct worker_start { - struct btrfs_work work; - struct btrfs_workers *queue; -}; - -static void start_new_worker_func(struct btrfs_work *work) -{ - struct worker_start *start; - start = container_of(work, struct worker_start, work); - __btrfs_start_workers(start->queue); - kfree(start); -} - -/* - * helper function to move a thread onto the idle list after it - * has finished some requests. 
- */ -static void check_idle_worker(struct btrfs_worker_thread *worker) -{ - if (!worker->idle && atomic_read(&worker->num_pending) < - worker->workers->idle_thresh / 2) { - unsigned long flags; - spin_lock_irqsave(&worker->workers->lock, flags); - worker->idle = 1; - - /* the list may be empty if the worker is just starting */ - if (!list_empty(&worker->worker_list) && - !worker->workers->stopping) { - list_move(&worker->worker_list, -&worker->workers->idle_list); - } - spin_unlock_irqrestore(&worker->workers->lock, flags); - } -} - -/* - * helper function to move a thread off the idle list after new - * pending work is added. - */ -static void check_busy_worker(struct btrfs_worker_thread *worker) -{ - if (worker->idle && atomic_read(&worker->num_pending) >= - worker->workers->idle_thresh) { - unsigned long flags; - spin_lock_irqsave(&worker->workers->lock, flags); - worker->idle = 0; - - if (!list_empty(&worker->worker_list) && - !worker->workers->stopping) { - list_move_tail(&worker->worker_list, - &worker->workers->worker_list); - } - spin_unlock_irqrestore(&worker->workers->lock, flags); - } -} - -static void check_pending_worker_creates(struct btrfs_worker_thread *worker) -{ - struct btrfs_workers *workers = worker->workers; - struct worker_start *start; - unsigned long flags; - - rmb(); - if (!workers->atomic_start_pending) - re
[PATCH v5 01/18] btrfs: Cleanup the unused struct async_sched.
The struct async_sched is not used by any code and can be removed.

Signed-off-by: Qu Wenruo
Reviewed-by: Josef Bacik
Tested-by: David Sterba
---
Changelog:
v1->v2: None.
v2->v3: None.
v3->v4: None.
v4->v5: None
---
 fs/btrfs/volumes.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 07629e9..82a63b1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5323,13 +5323,6 @@ static void btrfs_end_bio(struct bio *bio, int err)
 	}
 }
 
-struct async_sched {
-	struct bio *bio;
-	int rw;
-	struct btrfs_fs_info *info;
-	struct btrfs_work work;
-};
-
 /*
  * see run_scheduled_bios for a description of why bios are collected for
  * async submit.
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 12/18] btrfs: Replace fs_info->readahead_workers workqueue with btrfs_workqueue.
Replace the fs_info->readahead_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 12 fs/btrfs/reada.c | 9 + fs/btrfs/super.c | 2 +- 4 files changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 06a64fb..3d6f490 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1517,7 +1517,7 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *endio_freespace_worker; struct btrfs_workqueue_struct *submit_workers; struct btrfs_workqueue_struct *caching_workers; - struct btrfs_workers readahead_workers; + struct btrfs_workqueue_struct *readahead_workers; /* * fixup workers take dirty pages that didn't properly go through diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 391cadf..ca6d0cf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2004,7 +2004,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->submit_workers); btrfs_stop_workers(&fs_info->delayed_workers); btrfs_destroy_workqueue(fs_info->caching_workers); - btrfs_stop_workers(&fs_info->readahead_workers); + btrfs_destroy_workqueue(fs_info->readahead_workers); btrfs_destroy_workqueue(fs_info->flush_workers); btrfs_stop_workers(&fs_info->qgroup_rescan_workers); } @@ -2518,14 +2518,11 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta", fs_info->thread_pool_size, &fs_info->generic_worker); - btrfs_init_workers(&fs_info->readahead_workers, "readahead", - fs_info->thread_pool_size, - &fs_info->generic_worker); + fs_info->readahead_workers = + btrfs_alloc_workqueue("readahead", flags, max_active, 2); btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, &fs_info->generic_worker); - fs_info->readahead_workers.idle_thresh = 2; - /* * btrfs_start_workers can really only fail because of ENOMEM so just * return -ENOMEM if any of these fail. 
@@ -2533,7 +2530,6 @@ int open_ctree(struct super_block *sb, ret = btrfs_start_workers(&fs_info->generic_worker); ret |= btrfs_start_workers(&fs_info->fixup_workers); ret |= btrfs_start_workers(&fs_info->delayed_workers); - ret |= btrfs_start_workers(&fs_info->readahead_workers); ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { err = -ENOMEM; @@ -2545,7 +2541,7 @@ int open_ctree(struct super_block *sb, fs_info->endio_meta_write_workers && fs_info->endio_write_workers && fs_info->endio_raid56_workers && fs_info->endio_freespace_worker && fs_info->rmw_workers && - fs_info->caching_workers)) { + fs_info->caching_workers && fs_info->readahead_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c index 31c797c..9e01d36 100644 --- a/fs/btrfs/reada.c +++ b/fs/btrfs/reada.c @@ -91,7 +91,8 @@ struct reada_zone { }; struct reada_machine_work { - struct btrfs_work work; + struct btrfs_work_struct + work; struct btrfs_fs_info*fs_info; }; @@ -733,7 +734,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info, } -static void reada_start_machine_worker(struct btrfs_work *work) +static void reada_start_machine_worker(struct btrfs_work_struct *work) { struct reada_machine_work *rmw; struct btrfs_fs_info *fs_info; @@ -793,10 +794,10 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info) /* FIXME we cannot handle this properly right now */ BUG(); } - rmw->work.func = reada_start_machine_worker; + btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL); rmw->fs_info = fs_info; - btrfs_queue_worker(&fs_info->readahead_workers, &rmw->work); + btrfs_queue_work(fs_info->readahead_workers, &rmw->work); } #ifdef DEBUG diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index cd52e20..56c5533 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1329,7 +1329,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size); btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size); btrfs_set_max_workers(&fs_info->delaye
[PATCH v5 10/18] btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.
Replace the fs_info->rmw_workers with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 12 fs/btrfs/raid56.c | 35 --- 3 files changed, 21 insertions(+), 28 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3db87da..a7b0bdd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1511,7 +1511,7 @@ struct btrfs_fs_info { struct btrfs_workqueue_struct *endio_workers; struct btrfs_workqueue_struct *endio_meta_workers; struct btrfs_workqueue_struct *endio_raid56_workers; - struct btrfs_workers rmw_workers; + struct btrfs_workqueue_struct *rmw_workers; struct btrfs_workqueue_struct *endio_meta_write_workers; struct btrfs_workqueue_struct *endio_write_workers; struct btrfs_workqueue_struct *endio_freespace_worker; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 28b303c..12586b1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1997,7 +1997,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->endio_workers); btrfs_destroy_workqueue(fs_info->endio_meta_workers); btrfs_destroy_workqueue(fs_info->endio_raid56_workers); - btrfs_stop_workers(&fs_info->rmw_workers); + btrfs_destroy_workqueue(fs_info->rmw_workers); btrfs_destroy_workqueue(fs_info->endio_meta_write_workers); btrfs_destroy_workqueue(fs_info->endio_write_workers); btrfs_destroy_workqueue(fs_info->endio_freespace_worker); @@ -2509,9 +2509,8 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue("endio-meta-write", flags, max_active, 2); fs_info->endio_raid56_workers = btrfs_alloc_workqueue("endio-raid56", flags, max_active, 4); - btrfs_init_workers(&fs_info->rmw_workers, - "rmw", fs_info->thread_pool_size, - &fs_info->generic_worker); + fs_info->rmw_workers = + btrfs_alloc_workqueue("rmw", flags, max_active, 2); fs_info->endio_write_workers = btrfs_alloc_workqueue("endio-write", flags, max_active, 2); fs_info->endio_freespace_worker = @@ -2525,8 +2524,6 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, &fs_info->generic_worker); - fs_info->rmw_workers.idle_thresh = 2; - fs_info->readahead_workers.idle_thresh = 2; /* @@ -2535,7 +2532,6 @@ int open_ctree(struct super_block *sb, */ ret = btrfs_start_workers(&fs_info->generic_worker); ret |= btrfs_start_workers(&fs_info->fixup_workers); - ret |= btrfs_start_workers(&fs_info->rmw_workers); ret |= btrfs_start_workers(&fs_info->delayed_workers); ret |= btrfs_start_workers(&fs_info->caching_workers); ret |= btrfs_start_workers(&fs_info->readahead_workers); @@ -2549,7 +2545,7 @@ int open_ctree(struct super_block *sb, fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->endio_meta_write_workers && fs_info->endio_write_workers && fs_info->endio_raid56_workers && - fs_info->endio_freespace_worker)) { + fs_info->endio_freespace_worker && fs_info->rmw_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 24ac218..5afa564 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -87,7 +87,7 @@ struct btrfs_raid_bio { /* * for scheduling work in the helper threads */ - struct btrfs_work work; + struct btrfs_work_struct work; /* * bio list and bio_list_lock are used @@ -166,8 +166,8 @@ struct btrfs_raid_bio { static int 
__raid56_parity_recover(struct btrfs_raid_bio *rbio); static noinline void finish_rmw(struct btrfs_raid_bio *rbio); -static void rmw_work(struct btrfs_work *work); -static void read_rebuild_work(struct btrfs_work *work); +static void rmw_work(struct btrfs_work_struct *work); +static void read_rebuild_work(struct btrfs_work_struct *work); static void async_rmw_stripe(struct btrfs_raid_bio *rbio); static void async_read_rebuild(struct btrfs_raid_bio *rbio); static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio); @@ -1416,20 +1416,18 @@ cleanup: static void async_rmw_stripe(struct btrfs_raid_bio *rbio) { - rbio->work.flags = 0; - rbio->work.func = rmw_work; + btrfs_init_work(&rbio->work, rmw_work, NULL, NULL); - btrfs_queue_worker(&rbio->fs_info->rmw_workers, - &r
[PATCH v5 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue
Use kernel workqueue to implement a new btrfs_workqueue_struct, which has the ordering execution feature like the btrfs_worker. The func is executed in a concurrency way, and the ordred_func/ordered_free is executed in the sequence them are queued after the corresponding func is done. The new btrfs_workqueue works much like the original one, one workqueue for normal work and a list for ordered work. When a work is queued, ordered work will be added to the list and helper function will be queued into the workqueue. The helper function will execute a normal work and then check and execute as many ordered work as possible in the sequence they were queued. At this patch, high priority work queue or thresholding is not added yet. The high priority feature and thresholding will be added in the following patches. Signed-off-by: Qu Wenruo Signed-off-by: Lai Jiangshan Tested-by: David Sterba --- Changelog: v1->v2: None. v2->v3: - Fix the potential deadlock discovered by kernel lockdep. - Reuse the async-thread.[ch] files. - Make the ordered_func optional, which makes it adaptable to all btrfs_workers. v3->v4: - Use the old list method to implement ordered workqueue. Previous 3 wq implement needs extra time waiting for scheduling, which caused up to 40% performance drop in compress tests. The old list method(after executing a normal work, check the order_list and executing) does not need the extra scheduling things. - Simplify the btrfs_alloc_workqueue parameters. Now only one name is needed, and ordered work mechanism is determined using work->ordered_func. - Fix memory leak in btrfs_destroy_workqueue. v4->v5: - Fix a multithread free-and-use bug reported by Josef and David. --- fs/btrfs/async-thread.c | 137 fs/btrfs/async-thread.h | 27 ++ 2 files changed, 164 insertions(+) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 0b78bf2..905de02 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -1,5 +1,6 @@ /* * Copyright (C) 2007 Oracle. All rights reserved. + * Copyright (C) 2014 Fujitsu. All rights reserved. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public @@ -21,6 +22,7 @@ #include #include #include +#include #include "async-thread.h" #define WORK_QUEUED_BIT 0 @@ -727,3 +729,138 @@ void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work) wake_up_process(worker->task); spin_unlock_irqrestore(&worker->lock, flags); } + +struct btrfs_workqueue_struct { + struct workqueue_struct *normal_wq; + /* List head pointing to ordered work list */ + struct list_head ordered_list; + + /* Spinlock for ordered_list */ + spinlock_t list_lock; +}; + +struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, +int flags, +int max_active) +{ + struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); + + if (unlikely(!ret)) + return NULL; + + ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active, +"btrfs", name); + if (unlikely(!ret->normal_wq)) { + kfree(ret); + return NULL; + } + + INIT_LIST_HEAD(&ret->ordered_list); + spin_lock_init(&ret->list_lock); + return ret; +} + +static void run_ordered_work(struct btrfs_workqueue_struct *wq) +{ + struct list_head *list = &wq->ordered_list; + struct btrfs_work_struct *work; + spinlock_t *lock = &wq->list_lock; + unsigned long flags; + + while (1) { + spin_lock_irqsave(lock, flags); + if (list_empty(list)) + break; + work = list_entry(list->next, struct btrfs_work_struct, + ordered_list); + if (!test_bit(WORK_DONE_BIT, &work->flags)) + break; + + /* +* we are going to call the ordered done function, but +* we leave the work item on the list as a barrier so +* that later work items that are done don't have their +* functions called before this one returns +*/ + if (test_and_set_bit(WORK_ORDER_DONE_BIT, &work->flags)) + break; + spin_unlock_irqrestore(lock, flags); + work->ordered_func(work); + + /* now take the lock again and drop our item from the list */ + spin_lock_irqsave(lock, flags); + list_del(&work->ordered_list); + spin_unlock_irqrestore(lock, flags); + + /* +* we don't want to call the ordered free functions +
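[Editorial note: the patch text above is truncated in this archive. To make the new interface easier to follow, here is a small illustrative caller of the helpers this series introduces (btrfs_init_work, btrfs_queue_work). It is not part of the patch; the item type and callback names are invented for the example.]

/* Illustrative example only, not part of the patch. */
struct my_async_item {
	struct btrfs_work_struct work;	/* embedded work item */
	/* ... caller-specific payload ... */
};

static void my_func(struct btrfs_work_struct *work)
{
	/* normal part: may run concurrently with other queued items */
}

static void my_ordered_func(struct btrfs_work_struct *work)
{
	/* runs strictly in queueing order, after my_func() has finished */
}

static void my_ordered_free(struct btrfs_work_struct *work)
{
	kfree(container_of(work, struct my_async_item, work));
}

static int my_submit(struct btrfs_workqueue_struct *wq)
{
	struct my_async_item *item = kzalloc(sizeof(*item), GFP_NOFS);

	if (!item)
		return -ENOMEM;
	/* ordered_func/ordered_free may be NULL when ordering is not needed */
	btrfs_init_work(&item->work, my_func, my_ordered_func, my_ordered_free);
	btrfs_queue_work(wq, &item->work);
	return 0;
}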
[PATCH v5 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct
Add high priority function to btrfs_workqueue. This is implemented by embedding a btrfs_workqueue into a btrfs_workqueue and use some helper functions to differ the normal priority wq and high priority wq. So the high priority wq is completely independent from the normal workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: None v3->v4: - Implement high priority workqueue independently. Now high priority wq is implemented as a normal btrfs_workqueue, with independent ordering/thresholding mechanism. This fixed the problem that high priority wq and normal wq shared one ordered wq. v4->v5: None --- fs/btrfs/async-thread.c | 91 ++--- fs/btrfs/async-thread.h | 5 ++- 2 files changed, 83 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 905de02..193c849 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -730,7 +730,7 @@ void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work) spin_unlock_irqrestore(&worker->lock, flags); } -struct btrfs_workqueue_struct { +struct __btrfs_workqueue_struct { struct workqueue_struct *normal_wq; /* List head pointing to ordered work list */ struct list_head ordered_list; @@ -739,6 +739,38 @@ struct btrfs_workqueue_struct { spinlock_t list_lock; }; +struct btrfs_workqueue_struct { + struct __btrfs_workqueue_struct *normal; + struct __btrfs_workqueue_struct *high; +}; + +static inline struct __btrfs_workqueue_struct +*__btrfs_alloc_workqueue(char *name, int flags, int max_active) +{ + struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS); + + if (unlikely(!ret)) + return NULL; + + if (flags & WQ_HIGHPRI) + ret->normal_wq = alloc_workqueue("%s-%s-high", flags, +max_active, "btrfs", name); + else + ret->normal_wq = alloc_workqueue("%s-%s", flags, +max_active, "btrfs", name); + if (unlikely(!ret->normal_wq)) { + kfree(ret); + return NULL; + } + + INIT_LIST_HEAD(&ret->ordered_list); + spin_lock_init(&ret->list_lock); + return ret; +} + +static inline void +__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq); + struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, int flags, int max_active) @@ -748,19 +780,25 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name, if (unlikely(!ret)) return NULL; - ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active, -"btrfs", name); - if (unlikely(!ret->normal_wq)) { + ret->normal = __btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI, + max_active); + if (unlikely(!ret->normal)) { kfree(ret); return NULL; } - INIT_LIST_HEAD(&ret->ordered_list); - spin_lock_init(&ret->list_lock); + if (flags & WQ_HIGHPRI) { + ret->high = __btrfs_alloc_workqueue(name, flags, max_active); + if (unlikely(!ret->high)) { + __btrfs_destroy_workqueue(ret->normal); + kfree(ret); + return NULL; + } + } return ret; } -static void run_ordered_work(struct btrfs_workqueue_struct *wq) +static void run_ordered_work(struct __btrfs_workqueue_struct *wq) { struct list_head *list = &wq->ordered_list; struct btrfs_work_struct *work; @@ -804,7 +842,7 @@ static void run_ordered_work(struct btrfs_workqueue_struct *wq) static void normal_work_helper(struct work_struct *arg) { struct btrfs_work_struct *work; - struct btrfs_workqueue_struct *wq; + struct __btrfs_workqueue_struct *wq; int need_order = 0; work = container_of(arg, struct btrfs_work_struct, normal_work); @@ -840,8 +878,8 @@ void btrfs_init_work(struct btrfs_work_struct *work, work->flags = 0; } -void btrfs_queue_work(struct 
btrfs_workqueue_struct *wq, - struct btrfs_work_struct *work) +static inline void __btrfs_queue_work(struct __btrfs_workqueue_struct *wq, + struct btrfs_work_struct *work) { unsigned long flags; @@ -854,13 +892,42 @@ void btrfs_queue_work(struct btrfs_workqueue_struct *wq, queue_work(wq->normal_wq, &work->normal_work); } -void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq) +void btrfs_queue_work(struct btrfs_workqueue_struct *wq, + struct btrfs_work_struct *work) +{ + struct __bt
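[Editorial note: the hunk above is cut off right where btrfs_queue_work() picks a queue. Based on the description at the top of this patch (an embedded, independent high-priority queue that is used when present), the dispatch presumably looks roughly like the sketch below. This is a reader's reconstruction, not the literal patch text.]

/* Reconstruction for illustration; the real hunk is truncated above. */
void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
		      struct btrfs_work_struct *work)
{
	struct __btrfs_workqueue_struct *dest_wq;

	if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags) && wq->high)
		dest_wq = wq->high;	/* route to the independent high-prio queue */
	else
		dest_wq = wq->normal;
	__btrfs_queue_work(dest_wq, work);
}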
[PATCH v5 16/18] btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.
Replace the fs_info->scrub_* with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 6 ++-- fs/btrfs/scrub.c | 93 ++-- fs/btrfs/super.c | 4 +-- 3 files changed, 55 insertions(+), 48 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index f8f62d0..9aece57 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1605,9 +1605,9 @@ struct btrfs_fs_info { atomic_t scrub_cancel_req; wait_queue_head_t scrub_pause_wait; int scrub_workers_refcnt; - struct btrfs_workers scrub_workers; - struct btrfs_workers scrub_wr_completion_workers; - struct btrfs_workers scrub_nocow_workers; + struct btrfs_workqueue_struct *scrub_workers; + struct btrfs_workqueue_struct *scrub_wr_completion_workers; + struct btrfs_workqueue_struct *scrub_nocow_workers; #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY u32 check_integrity_print_mask; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 51c342b..9223b7b 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -96,7 +96,8 @@ struct scrub_bio { #endif int page_count; int next_free; - struct btrfs_work work; + struct btrfs_work_struct + work; }; struct scrub_block { @@ -154,7 +155,8 @@ struct scrub_fixup_nodatasum { struct btrfs_device *dev; u64 logical; struct btrfs_root *root; - struct btrfs_work work; + struct btrfs_work_struct + work; int mirror_num; }; @@ -172,7 +174,8 @@ struct scrub_copy_nocow_ctx { int mirror_num; u64 physical_for_dev_replace; struct list_headinodes; - struct btrfs_work work; + struct btrfs_work_struct + work; }; struct scrub_warning { @@ -231,7 +234,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, u64 gen, int mirror_num, u8 *csum, int force, u64 physical_for_dev_replace); static void scrub_bio_end_io(struct bio *bio, int err); -static void scrub_bio_end_io_worker(struct btrfs_work *work); +static void scrub_bio_end_io_worker(struct btrfs_work_struct *work); static void scrub_block_complete(struct scrub_block *sblock); static void scrub_remap_extent(struct btrfs_fs_info *fs_info, u64 extent_logical, u64 extent_len, @@ -248,14 +251,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, struct scrub_page *spage); static void scrub_wr_submit(struct scrub_ctx *sctx); static void scrub_wr_bio_end_io(struct bio *bio, int err); -static void scrub_wr_bio_end_io_worker(struct btrfs_work *work); +static void scrub_wr_bio_end_io_worker(struct btrfs_work_struct *work); static int write_page_nocow(struct scrub_ctx *sctx, u64 physical_for_dev_replace, struct page *page); static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, struct scrub_copy_nocow_ctx *ctx); static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len, int mirror_num, u64 physical_for_dev_replace); -static void copy_nocow_pages_worker(struct btrfs_work *work); +static void copy_nocow_pages_worker(struct btrfs_work_struct *work); static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info); static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info); @@ -418,7 +421,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) sbio->index = i; sbio->sctx = sctx; sbio->page_count = 0; - sbio->work.func = scrub_bio_end_io_worker; + btrfs_init_work(&sbio->work, scrub_bio_end_io_worker, + NULL, NULL); if (i != SCRUB_BIOS_PER_SCTX - 1) sctx->bios[i]->next_free = i + 1; @@ 
-723,7 +727,7 @@ out: return -EIO; } -static void scrub_fixup_nodatasum(struct btrfs_work *work) +static void scrub_fixup_nodatasum(struct btrfs_work_struct *work) { int ret; struct scrub_fixup_nodatasum *fixup; @@ -987,9 +991,10 @@ nodatasum_case: fixup_nodatasum->root = fs_info->extent_root; fixup_nodatasum->mirror_num = failed_mirror_index + 1; scrub_pending_trans_workers_inc(sctx); - fixup_nodatasum->work.func = scrub_fixup_nodatasum; - btrfs_queue_worker(&fs_
[PATCH v5 15/18] btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue.
Replace the fs_info->qgroup_rescan_worker with the newly created btrfs_workqueue. Signed-off-by: Qu Wenruo Tested-by: David Sterba --- Changelog: v1->v2: None v2->v3: - Use the btrfs_workqueue_struct to replace submit_workers. v3->v4: - Use the simplified btrfs_alloc_workqueue API. v4->v5: None --- fs/btrfs/ctree.h | 4 ++-- fs/btrfs/disk-io.c | 10 +- fs/btrfs/qgroup.c | 17 + 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 07b563d..f8f62d0 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1648,9 +1648,9 @@ struct btrfs_fs_info { /* qgroup rescan items */ struct mutex qgroup_rescan_lock; /* protects the progress item */ struct btrfs_key qgroup_rescan_progress; - struct btrfs_workers qgroup_rescan_workers; + struct btrfs_workqueue_struct *qgroup_rescan_workers; struct completion qgroup_rescan_completion; - struct btrfs_work qgroup_rescan_work; + struct btrfs_work_struct qgroup_rescan_work; /* filesystem state */ unsigned long fs_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ac8e9c2..e3507c5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2006,7 +2006,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_destroy_workqueue(fs_info->caching_workers); btrfs_destroy_workqueue(fs_info->readahead_workers); btrfs_destroy_workqueue(fs_info->flush_workers); - btrfs_stop_workers(&fs_info->qgroup_rescan_workers); + btrfs_destroy_workqueue(fs_info->qgroup_rescan_workers); } static void free_root_extent_buffers(struct btrfs_root *root) @@ -2519,15 +2519,14 @@ int open_ctree(struct super_block *sb, btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0); fs_info->readahead_workers = btrfs_alloc_workqueue("readahead", flags, max_active, 2); - btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, - &fs_info->generic_worker); + fs_info->qgroup_rescan_workers = + btrfs_alloc_workqueue("qgroup-rescan", flags, 1, 0); /* * btrfs_start_workers can really only fail because of ENOMEM so just * return -ENOMEM if any of these fail. 
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; @@ -2539,7 +2538,8 @@ int open_ctree(struct super_block *sb, fs_info->endio_write_workers && fs_info->endio_raid56_workers && fs_info->endio_freespace_worker && fs_info->rmw_workers && fs_info->caching_workers && fs_info->readahead_workers && - fs_info->fixup_workers && fs_info->delayed_workers)) { + fs_info->fixup_workers && fs_info->delayed_workers && + fs_info->qgroup_rescan_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 472302a..38617cc 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1509,8 +1509,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans, ret = qgroup_rescan_init(fs_info, 0, 1); if (!ret) { qgroup_rescan_zero_tracking(fs_info); - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, - &fs_info->qgroup_rescan_work); + btrfs_queue_work(fs_info->qgroup_rescan_workers, +&fs_info->qgroup_rescan_work); } ret = 0; } @@ -1984,7 +1984,7 @@ out: return ret; } -static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) +static void btrfs_qgroup_rescan_worker(struct btrfs_work_struct *work) { struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info, qgroup_rescan_work); @@ -2095,7 +2095,8 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, memset(&fs_info->qgroup_rescan_work, 0, sizeof(fs_info->qgroup_rescan_work)); - fs_info->qgroup_rescan_work.func = btrfs_qgroup_rescan_worker; + btrfs_init_work(&fs_info->qgroup_rescan_work, + btrfs_qgroup_rescan_worker, NULL, NULL); if (ret) { err: @@ -2158,8 +2159,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) qgroup_rescan_zero_tracking(fs_info); - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, - &fs_info->qgroup_rescan_work); + btrfs_queue_work(fs_info->qgroup_rescan_workers, +&fs_info->qgroup_rescan_work); return 0; } @@ -2190,6 +2191,6 @@
[PATCH v5 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue
Add a new btrfs_workqueue_struct which uses the kernel workqueue to implement
most of the original btrfs_workers, to replace btrfs_workers.

With this patchset, redundant workqueue code is replaced with kernel workqueue
infrastructure, which not only reduces the code size but also the effort to
maintain it.

The results (somewhat outdated though) from sysbench show minor improvement on
the following server:
CPU: two-way Xeon X5660
RAM: 4G
HDD: SAS HDD, 150G total, 100G partition for btrfs test

Test result on default mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdENjajJTWFg5d1BWbExnYWFpMTJxeUE&usp=sharing

Test result on "-o compress" mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdHdTTEJ6OW96SXJFaDR5enB1SzMzc0E&usp=sharing

Changelog:
v1->v2:
- Fix some workqueue flags.
v2->v3:
- Add the thresholding mechanism to simulate the old behavior
- Convert all the btrfs_workers to btrfs_workqueue_struct.
- Fix some potential deadlocks when executed in IRQ handler.
v3->v4:
- Change the ordered workqueue implementation to fix the performance drop in
  32K multi-thread random write.
- Change the high priority workqueue implementation to get an independent high
  priority workqueue without the starving problem.
- Simplify the btrfs_alloc_workqueue parameters.
- Coding style cleanup.
- Remove the redundant "_struct" suffix.
v4->v5:
- Fix a multithread free-and-use bug reported by Josef and David.

Qu Wenruo (18):
  btrfs: Cleanup the unused struct async_sched.
  btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue
  btrfs: Add high priority workqueue support for btrfs_workqueue_struct
  btrfs: Add threshold workqueue based on kernel workqueue
  btrfs: Replace fs_info->workers with btrfs_workqueue.
  btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue
  btrfs: Replace fs_info->submit_workers with btrfs_workqueue.
  btrfs: Replace fs_info->flush_workers with btrfs_workqueue.
  btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->readahead_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->delayed_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.
  btrfs: Cleanup the old btrfs_worker.
  btrfs: Cleanup the "_struct" suffix in btrfs_workqueue

 fs/btrfs/async-thread.c  | 830 ---
 fs/btrfs/async-thread.h  | 119 ++-
 fs/btrfs/ctree.h         |  39 ++-
 fs/btrfs/delayed-inode.c |   6 +-
 fs/btrfs/disk-io.c       | 212 +---
 fs/btrfs/extent-tree.c   |   4 +-
 fs/btrfs/inode.c         |  38 +--
 fs/btrfs/ordered-data.c  |  11 +-
 fs/btrfs/qgroup.c        |  15 +-
 fs/btrfs/raid56.c        |  21 +-
 fs/btrfs/reada.c         |   4 +-
 fs/btrfs/scrub.c         |  70 ++--
 fs/btrfs/super.c         |  36 +-
 fs/btrfs/volumes.c       |  16 +-
 14 files changed, 446 insertions(+), 975 deletions(-)

-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
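[Editorial note: patches 05 through 16 are mechanical conversions that all follow the same shape. Roughly, with a made-up "foo_workers" pool and "item"/"foo_worker_fn" names standing in for each converted subsystem (this is a summary sketch, not text from any single patch):]

/* Before: dedicated btrfs_workers thread pool */
btrfs_init_workers(&fs_info->foo_workers, "foo",
		   fs_info->thread_pool_size, &fs_info->generic_worker);
ret = btrfs_start_workers(&fs_info->foo_workers);
/* per-item submission; item embeds a struct btrfs_work */
item->work.func = foo_worker_fn;
item->work.flags = 0;
btrfs_queue_worker(&fs_info->foo_workers, &item->work);
/* teardown */
btrfs_stop_workers(&fs_info->foo_workers);

/* After: kernel-workqueue-backed btrfs_workqueue */
fs_info->foo_workers = btrfs_alloc_workqueue("foo", flags, max_active, 0);
if (!fs_info->foo_workers)
	return -ENOMEM;	/* no separate start step any more */
/* per-item submission; item now embeds a struct btrfs_work_struct */
btrfs_init_work(&item->work, foo_worker_fn, NULL, NULL);
btrfs_queue_work(fs_info->foo_workers, &item->work);
/* teardown */
btrfs_destroy_workqueue(fs_info->foo_workers);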
Re: 3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep
Hi Marc, On 02/28/2014 03:06 AM, Marc MERLIN wrote: This does not happen consistently, but sometimes: PM: Preparing system for mem sleep Freezing user space processes ... (...) Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0): btrfs D 88017639c800 0 12239 12224 0x0084 880165ec1960 0086 880165ec1fd8 88017639c2d0 000141c0 88017639c2d0 88007b874000 8804062fa480 880175837ec0 88007b874220 880165ec1970 Call Trace: [] schedule+0x73/0x75 [] scrub_pages+0x27e/0x426 [] ? finish_wait+0x65/0x65 [] scrub_stripe+0xada/0xc9e [] scrub_chunk.isra.9+0xd6/0x10d [] scrub_enumerate_chunks+0x274/0x418 [] ? finish_wait+0x3/0x65 [] btrfs_scrub_dev+0x254/0x3cb [] ? __mnt_want_write+0x62/0x78 [] btrfs_ioctl+0x1114/0x24b1 [] ? cache_alloc+0x1c/0x29b [] ? kmem_cache_alloc_node+0xef/0x179 [] ? _raw_spin_unlock+0x17/0x2a [] do_vfs_ioctl+0x3d2/0x41d [] ? __fget+0x6f/0x79 [] SyS_ioctl+0x57/0x82 [] system_call_fastpath+0x1a/0x1f Could you run the following command when scrub is blocked, we can know more why scrub is blocked here. # echo w > /proc/sysrq-trigger # dmesg Thanks, Wang And then I end up with a hot laptop and a mostly dead battery in my backpack. As far as I know, this was not happening with 3.13, unless I'm doing something differently without knowing. My laptop went to sleep just fine while I was typing this Email, so I'm guessing it's only btrfs scrub that causes the problem with sleep. Marc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
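[Editorial note: for context on why a single task in this state blocks suspend: the freezer can only freeze tasks that are in user space or sleeping in a freezable wait, and the btrfs process here is in an ordinary uninterruptible sleep inside the scrub ioctl, so the 20 second freeze attempt times out. A common pattern for this class of report, shown purely as an illustration and not as a proposed patch for scrub, is to make the long wait freezer-aware; the names below are placeholders, not actual scrub fields.]

#include <linux/freezer.h>

/* before: invisible to the freezer while the condition is not met */
wait_event(wait_queue, condition);

/* after: the task can be frozen and later thawed while it waits;
 * wait_queue and condition are placeholders for illustration only */
ret = wait_event_freezable(wait_queue, condition);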
Re: Help with space
On Feb 27, 2014, at 5:12 PM, Dave Chinner wrote: > On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote: >> >> On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote: >> >>> Yes it's an ancient 32 bit machine. There must be a complex bug >>> involved as the system, when originally mounted, claimed the >>> correct free space and only as used over time did the >>> discrepancy between used and free grow. I'm afraid I chose >>> btrfs because it appeared capable of breaking the 16 tera limit >>> on a 32 bit system. If this isn't the case then it's incredible >>> that I've been using this file system for about a year without >>> difficulty until now. >> >> Yep, it's not a good bug. This happened some years ago on XFS too, >> where people would use the file system for a long time and then at >> 16TB+1byte written to the volume, kablewy! And then it wasn't >> usable at all, until put on a 64-bit kernel. >> >> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html > > Well, no, that's not what I said. What are you thinking I said you said? I wasn't quoting or paraphrasing anything you've said above. I had done a google search on this early and found some rather old threads where some people had this experience of making a large file system on a 32-bit kernel, and only after filling it beyond 16TB did they run into the problem. Here is one of them: http://lists.centos.org/pipermail/centos/2011-April/109142.html > I said that it was limited on XFS, > not that the limit was a result of a user making a filesystem too > large and then finding out it didn't work. Indeed, you can't do that > on XFS - mkfs will refuse to run on a block device it can't access the > last block on, and the kernel has the same "can I access the last > block of the filesystem" sanity checks that are run at mount and > growfs time. Nope. What I reported on the XFS list, I had used mkfs.xfs while running 32bit kernel on a 20TB virtual disk. It did not fail to make the file system, it failed only to mount it. It was the same booted virtual machine, I created the file system and immediately mounted it. If you want the specifics, I'll post on the XFS list with versions and reproduce steps. > > IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux. OK that's fine, I've only reported what other people said they experienced, and it comes as no surprise they might have been confused. Although not knowing the size of one's file system would seem to be rare. Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with space
On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote: > > On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote: > > > Yes it's an ancient 32 bit machine. There must be a complex bug > > involved as the system, when originally mounted, claimed the > > correct free space and only as used over time did the > > discrepancy between used and free grow. I'm afraid I chose > > btrfs because it appeared capable of breaking the 16 tera limit > > on a 32 bit system. If this isn't the case then it's incredible > > that I've been using this file system for about a year without > > difficulty until now. > > Yep, it's not a good bug. This happened some years ago on XFS too, > where people would use the file system for a long time and then at > 16TB+1byte written to the volume, kablewy! And then it wasn't > usable at all, until put on a 64-bit kernel. > > http://oss.sgi.com/pipermail/xfs/2014-February/034588.html Well, no, that's not what I said. I said that it was limited on XFS, not that the limit was a result of a user making a filesystem too large and then finding out it didn't work. Indeed, you can't do that on XFS - mkfs will refuse to run on a block device it can't access the last block on, and the kernel has the same "can I access the last block of the filesystem" sanity checks that are run at mount and growfs time. IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux. And, historically speaking, it didn't even allow it on Irix. Irix on 32 bit systems was limited to 1TB (2^31 sectors of 2^9 bytes = 1TB), and only as Linux gained sufficient capability on 32 bit systems (e.g. CONFIG_LBD) was the limit increased. The limit we are now at is the address space index being 32 bits, so the size is limited by 2^32 * PAGE_SIZE = 2^44 = 16TB i.e Back when XFS was still being ported to Linux from Irix in 2000: 203 #if !XFS_BIG_FILESYSTEMS 204 if (sbp->sb_dblocks > INT_MAX || sbp->sb_rblocks > INT_MAX) { 205 cmn_err(CE_WARN, 206 "XFS: File systems greater than 1TB not supported on this system.\n"); 207 return XFS_ERROR(E2BIG); 208 } 209 #endif (http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=blob;f=fs/xfs/xfs_mount.c;hb=60a4726a60437654e2af369ccc8458376e1657b9) So, good story, but is not true. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep
On Thu, Feb 27, 2014 at 11:06:56AM -0800, Marc MERLIN wrote: > This does not happen consistently, but sometimes: > > PM: Preparing system for mem sleep > Freezing user space processes ... > (...) > Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, > wq_busy=0): > btrfs D 88017639c800 0 12239 12224 0x0084 > 880165ec1960 0086 880165ec1fd8 88017639c2d0 > 000141c0 88017639c2d0 88007b874000 8804062fa480 > 880175837ec0 88007b874220 880165ec1970 > Call Trace: > [] schedule+0x73/0x75 > [] scrub_pages+0x27e/0x426 > [] ? finish_wait+0x65/0x65 > [] scrub_stripe+0xada/0xc9e > [] scrub_chunk.isra.9+0xd6/0x10d > [] scrub_enumerate_chunks+0x274/0x418 > [] ? finish_wait+0x3/0x65 > [] btrfs_scrub_dev+0x254/0x3cb > [] ? __mnt_want_write+0x62/0x78 > [] btrfs_ioctl+0x1114/0x24b1 > [] ? cache_alloc+0x1c/0x29b > [] ? kmem_cache_alloc_node+0xef/0x179 > [] ? _raw_spin_unlock+0x17/0x2a > [] do_vfs_ioctl+0x3d2/0x41d > [] ? __fget+0x6f/0x79 > [] SyS_ioctl+0x57/0x82 > [] system_call_fastpath+0x1a/0x1f Some time later, I go this one, not sure if it's btrfs' fault or not: usb 1-11: new full-speed USB device number 7 using xhci_hcd Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0): laptop_mode D 8800048c4a80 0 6657 1 0x0084 880037f2bde0 0086 880037f2bfd8 8800048c4550 000141c0 8800048c4550 8804072280e8 8804072280ec 8800048c4550 8804072280f0 880037f2bdf0 Call Trace: [] schedule+0x73/0x75 [] schedule_preempt_disabled+0x18/0x24 [] __mutex_lock_slowpath+0x158/0x1cf [] mutex_lock+0x17/0x27 [] control_store+0x44/0xb1 [] dev_attr_store+0x18/0x24 [] sysfs_kf_write+0x3e/0x40 [] kernfs_fop_write+0xc2/0xff [] vfs_write+0xab/0x107 [] SyS_write+0x46/0x79 [] system_call_fastpath+0x1a/0x1f Restarting tasks ... done. -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels
On Feb 27, 2014, at 2:07 PM, Josef Bacik wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 02/27/2014 04:05 PM, Chris Murphy wrote: >> User reports successfully formatting and using an ~18TB Btrfs >> volume on hardware raid5 using i686 kernel for over a year, and >> then suddenly the file system starts behaving weirdly: >> >> https://urldefense.proofpoint.com/v1/url?u=http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg31856.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=6eUt5RgBggFh930oFrH19iR4z%2BFVzT%2F0%2F4dYPt3g48U%3D%0A&s=5ac126734d7fa1d3238ab09a2ddc021a8dcc8fff7b022560a4d068be2de37c00 >> >> >> >> I think this is due to the kernel page cache address space being >> 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in >> this thread: >> >> https://urldefense.proofpoint.com/v1/url?u=http://oss.sgi.com/pipermail/xfs/2014-February/034588.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=6eUt5RgBggFh930oFrH19iR4z%2BFVzT%2F0%2F4dYPt3g48U%3D%0A&s=3e45f9288e6a77bc1a24dded368802c2ab46b812bf59953f74d4ee1d4141f7d2 >> >> So it sounds like it shouldn't be possible to mount a Btrfs volume >> larger than 16TB on 32-bit kernels. This is consistent with ext4 >> and XFS which refuse to mount large file systems. >> >> > > Well that's not good, I'll fix this up. Thanks, Is it a valid or goofy work around to partition this 21TB volume into two equal portions, and then: mkfs.btrfs -d single -m raid1 /dev/sdb[12] Maybe it's too much of an edge case to permit it even if it worked? Chris Murphy-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with space
On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:

> Yes it's an ancient 32 bit machine. There must be a complex bug involved as
> the system, when originally mounted, claimed the correct free space and only
> as used over time did the discrepancy between used and free grow. I'm afraid
> I chose btrfs because it appeared capable of breaking the 16 tera limit on a
> 32 bit system. If this isn't the case then it's incredible that I've been
> using this file system for about a year without difficulty until now.

Yep, it's not a good bug. This happened some years ago on XFS too, where
people would use the file system for a long time and then at 16TB+1byte
written to the volume, kablewy! And then it wasn't usable at all, until put on
a 64-bit kernel.

http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

I can't tell you if there's a workaround for this other than to go to a 64-bit
kernel. Maybe you could partition the raid5 into two 9TB block devices, and
then format the two partitions with -d single -m raid1. That way it behaves as
one volume, and alternates 1GB chunks between the two partitions. This should
perform decently for large files, but it's possible that the allocator will
sometimes write to two data chunks on what it thinks are two drives at the
same time, when it's actually writing to the same physical device (the array)
at the same time. Hardware raid should optimize some of this, but I don't know
what the penalty will be, or whether it'll work for your use case.

And I definitely don't know if the kernel page cache limit applies to the
block device (partition) or if it applies to the file system. It sounds like
it applies to the block device, so this might be a way around this if you had
to stick to a 32-bit system.

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
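[Editorial note: spelled out, the workaround suggested above would look something like the commands below. The device name is a placeholder for the Areca array's block device, the filesystem is created from scratch (so everything must be backed up and restored first, as discussed earlier in the thread), and the earlier caveat still applies: it is not confirmed whether the 16TB page-cache limit is per block device or per filesystem, so treat this as a sketch rather than a recommendation. A GPT label is required for partitions this large; data chunks then alternate between the two partitions while metadata is mirrored across them, and mounting any member device mounts the whole filesystem.]

# parted /dev/sdX --script mklabel gpt mkpart primary 0% 50% mkpart primary 50% 100%
# mkfs.btrfs -d single -m raid1 /dev/sdX1 /dev/sdX2
# mount /dev/sdX1 /mnt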
Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 02/27/2014 04:05 PM, Chris Murphy wrote: > User reports successfully formatting and using an ~18TB Btrfs > volume on hardware raid5 using i686 kernel for over a year, and > then suddenly the file system starts behaving weirdly: > > https://urldefense.proofpoint.com/v1/url?u=http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg31856.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=6eUt5RgBggFh930oFrH19iR4z%2BFVzT%2F0%2F4dYPt3g48U%3D%0A&s=5ac126734d7fa1d3238ab09a2ddc021a8dcc8fff7b022560a4d068be2de37c00 > > > > I think this is due to the kernel page cache address space being > 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in > this thread: > > https://urldefense.proofpoint.com/v1/url?u=http://oss.sgi.com/pipermail/xfs/2014-February/034588.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=6eUt5RgBggFh930oFrH19iR4z%2BFVzT%2F0%2F4dYPt3g48U%3D%0A&s=3e45f9288e6a77bc1a24dded368802c2ab46b812bf59953f74d4ee1d4141f7d2 > > So it sounds like it shouldn't be possible to mount a Btrfs volume > larger than 16TB on 32-bit kernels. This is consistent with ext4 > and XFS which refuse to mount large file systems. > > Well that's not good, I'll fix this up. Thanks, Josef -BEGIN PGP SIGNATURE- Version: GnuPG v1 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTD6j6AAoJEANb+wAKly3BhIEQAJheOf/NMEurHSlxnLWYuRog thRJMk+je1Ae9Sz93B5/0OztrmzYhK+OoQhWuF79OxVPoMeZ2Ta5qqmeNw3U+dRn T44SjlYRnerq0ksVt9xR9j2zMXWatgO5+20doZpeESco/IRWYkakQTyrWj9WUATN 7YQsxZB57nijrOvig0GPmMtH9PriscsPQhMVDuTDIHkvfWgk0M2oqu/0TZl9f5xA Es1uK0rv6KsExVQix+4GjWc/RBpl2QzxGEq/Ct+vcL+HaiKIERXuEw5liP9dSamj Wqbkkli+FDftBx/GXGTA38VYSxLExrlF891R4fOXWUcqDvlLwhdBpZExeYCV9MUz lEtaZaKUa3eeRBwzuxeLT8mEvY3BqvePQg8Io7auuIHG4fuRlOWWRiDG7bpTTPlD NFZACEDlGGdXNli7TqQ82La9kxFDvXCISnfxNbbu2vlXqL/HqQom1HiPwgMNIDQ7 0UIOLW5X+gg++kH7ArhOv19B7FR2i50wxuJSwj2/XSLELAPFAd9/BMI+3DXWfkE4 qZwnHEt8bVKR/yJ+srnRC2mZP41eHWHA6c9IXEGU/STy2uOdnwnXoS+KAdNWEt1d QRlr79S8Mhf7U8Acx/LhgwkbB1npmm0xssZmK2WycSyU7A66rdk0Cc+gfVyZOW5C k68LvCitzpU1W7MSMmPt =9S2b -END PGP SIGNATURE- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BUG: >16TB Btrfs volumes are mountable on 32 bit kernels
User reports successfully formatting and using an ~18TB Btrfs volume on hardware raid5 using i686 kernel for over a year, and then suddenly the file system starts behaving weirdly: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31856.html I think this is due to the kernel page cache address space being 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in this thread: http://oss.sgi.com/pipermail/xfs/2014-February/034588.html So it sounds like it shouldn't be possible to mount a Btrfs volume larger than 16TB on 32-bit kernels. This is consistent with ext4 and XFS which refuse to mount large file systems. Chris Murphy-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
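[Editorial note: the check Josef agrees to add elsewhere in this thread would presumably mirror what ext4 and XFS already do at mount time. A minimal sketch of such a guard, assuming the 2^32-page limit described in the linked XFS thread; the function name and placement are invented here, and this is not the actual fix.]

/* Sketch only; would run during mount (open_ctree) once the superblock
 * has been read.  Not the actual patch. */
static int btrfs_check_32bit_size_limit(struct btrfs_super_block *disk_super)
{
#if BITS_PER_LONG == 32
	/* the page cache index is 32 bits wide, so PAGE_CACHE_SIZE << 32 bytes max */
	u64 limit = (u64)PAGE_CACHE_SIZE << 32;

	if (btrfs_super_total_bytes(disk_super) >= limit) {
		printk(KERN_ERR "BTRFS: filesystem too large to mount safely on this system\n");
		return -EFBIG;
	}
#endif
	return 0;
}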
Re: Help with space
Yes it's an ancient 32 bit machine. There must be a complex bug involved as the system, when originally mounted, claimed the correct free space and only as used over time did the discrepancy between used and free grow. I'm afraid I chose btrfs because it appeared capable of breaking the 16 tera limit on a 32 bit system. If this isn't the case then it's incredible that I've been using this file system for about a year without difficulty until now. -Justin Sent from my iPad > On Feb 27, 2014, at 1:51 PM, Chris Murphy wrote: > > >> On Feb 27, 2014, at 12:27 PM, Chris Murphy wrote: >> This is on i686? >> >> The kernel page cache is limited to 16TB on i686, so effectively your block >> device is limited to 16TB. While the file system successfully creates, I >> think it's a bug that the mount -t btrfs command is probably a btrfs bug. > > Yes Chris, circular logic day. It's probably a btrfs bug that the mount > command succeeds. > > So let us know if this is i686 or x86_64, because if it's the former it's a > bug that should get fixed. > > > Chris Murphy > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with space
On Feb 27, 2014, at 12:27 PM, Chris Murphy wrote: > This is on i686? > > The kernel page cache is limited to 16TB on i686, so effectively your block > device is limited to 16TB. While the file system successfully creates, I > think it's a bug that the mount -t btrfs command is probably a btrfs bug. Yes Chris, circular logic day. It's probably a btrfs bug that the mount command succeeds. So let us know if this is i686 or x86_64, because if it's the former it's a bug that should get fixed. Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help with space
On Feb 27, 2014, at 11:19 AM, Justin Brown wrote: > I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in > need of help. Disk usage (du) shows 13 tera allocated yet strangely > enough df shows approx. 780 gigs are free. It seems, somehow, btrfs > has eaten roughly 4 tera internally. I've run a scrub and a balance > usage=5 with no success, in fact I lost about 20 gigs after the > balance attempt. Some numbers: > > terra:/var/lib/nobody/fs/ubfterra # uname -a > Linux terra 3.12.4-2.44-desktop #1 SMP PREEMPT Mon Dec 9 03:14:51 CST > 2013 i686 i686 i386 GNU/Linux This is on i686? The kernel page cache is limited to 16TB on i686, so effectively your block device is limited to 16TB. While the file system successfully creates, I think it's a bug that the mount -t btrfs command is probably a btrfs bug. The way this works for XFS and ext4 is mount fails. EXT4-fs (sdc): filesystem too large to mount safely on this system XFS (sdc): file system too large to be mounted on this system. If you're on a 32-bit OS, the file system might be toast, I'm not really sure. But I'd immediately stop using it and only use 64-bit OS for file systems of this size. Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
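The ext4/XFS refusal quoted above comes from a mount-time addressability check. The sketch below shows what such a guard looks like, using the generic helper ext4 calls for this (generic_check_addressable() in fs/libfs.c); the wrapper name is made up for illustration, kernel context is assumed (linux/fs.h, linux/printk.h), and whether the eventual btrfs fix takes this exact route is not known from this thread:

    /*
     * Refuse the mount if the last block's page index would not fit in
     * pgoff_t, which is 32 bits wide on i686 -- hence the 16TB ceiling
     * with 4KiB blocks.  Returns 0 if the filesystem is addressable here.
     */
    static int example_check_addressable(unsigned blocksize_bits, u64 num_blocks)
    {
            int err = generic_check_addressable(blocksize_bits, num_blocks);

            if (err)
                    pr_err("filesystem too large to mount safely on this system\n");
            return err;     /* caller fails the mount on non-zero */
    }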
3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep
This does not happen consistently, but sometimes:

PM: Preparing system for mem sleep
Freezing user space processes ...
(...)
Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
btrfs           D 88017639c800     0 12239  12224 0x0084
 880165ec1960 0086 880165ec1fd8 88017639c2d0
 000141c0 88017639c2d0 88007b874000 8804062fa480
 880175837ec0 88007b874220 880165ec1970
Call Trace:
 [] schedule+0x73/0x75
 [] scrub_pages+0x27e/0x426
 [] ? finish_wait+0x65/0x65
 [] scrub_stripe+0xada/0xc9e
 [] scrub_chunk.isra.9+0xd6/0x10d
 [] scrub_enumerate_chunks+0x274/0x418
 [] ? finish_wait+0x3/0x65
 [] btrfs_scrub_dev+0x254/0x3cb
 [] ? __mnt_want_write+0x62/0x78
 [] btrfs_ioctl+0x1114/0x24b1
 [] ? cache_alloc+0x1c/0x29b
 [] ? kmem_cache_alloc_node+0xef/0x179
 [] ? _raw_spin_unlock+0x17/0x2a
 [] do_vfs_ioctl+0x3d2/0x41d
 [] ? __fget+0x6f/0x79
 [] SyS_ioctl+0x57/0x82
 [] system_call_fastpath+0x1a/0x1f

And then I end up with a hot laptop and a mostly dead battery in my backpack.

As far as I know, this was not happening with 3.13, unless I'm doing something differently without knowing. My laptop went to sleep just fine while I was typing this Email, so I'm guessing it's only btrfs scrub that causes the problem with sleep.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
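Not a diagnosis of the scrub hang above, but for context: the usual pattern when a long-running kernel task shows up as "refusing to freeze" during suspend is to make its main loop freezer-aware, so it parks itself at a safe point instead of blocking the freezer. A generic sketch of that pattern follows (kernel context assumed, linux/freezer.h and linux/sched.h; more_work_to_do() and do_one_unit_of_work() are placeholders, not btrfs functions, and this is not the actual scrub fix):

    static void example_long_running_loop(void)
    {
            while (more_work_to_do()) {
                    do_one_unit_of_work();  /* e.g. one stripe of a scrub-like job */

                    cond_resched();         /* yield to the scheduler              */
                    try_to_freeze();        /* park here if a suspend is underway  */
            }
    }

Whether scrub should be made freezable this way, or simply be paused/cancelled before suspend, is exactly the design question the report above raises.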
Help with space
I've a 18 tera hardware raid 5 (areca ARC-1170 w/ 8 3 gig drives) in need of help. Disk usage (du) shows 13 tera allocated yet strangely enough df shows approx. 780 gigs are free. It seems, somehow, btrfs has eaten roughly 4 tera internally. I've run a scrub and a balance usage=5 with no success, in fact I lost about 20 gigs after the balance attempt. Some numbers:

terra:/var/lib/nobody/fs/ubfterra # uname -a
Linux terra 3.12.4-2.44-desktop #1 SMP PREEMPT Mon Dec 9 03:14:51 CST 2013 i686 i686 i386 GNU/Linux

terra:/var/lib/nobody/fs/ubfterra # parted -l
Model: Areca ARC-1170-VOL#00 (scsi)
Disk /dev/sdb: 21.0TB
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt

Number  Start   End     Size    File system  Name              Flags
 1      1049kB  21.0TB  21.0TB               Linux filesystem

terra:/var/lib/nobody/fs/ubfterra # du -shc *
1.7M    40588-4-1376856876.jpg
2.7M    40588-4-1376856876b.jpg
1008G   Anime
180G    Doctor Who (classic)
5.5T    Downloads
28G     Flash Rescue
1.9T    Jus
3.6T    Tornado
4.0K    dirsanime
4.0K    filesanime
55G     home videos
0       testsub
4.0K    unsharedanime
13T     total

terra:/var/lib/nobody/fs/ubfterra # btrfs fi show /dev/sdb1
Label: ubfterra  uuid: 40f0f692-c68c-4af7-ade2-c15a127ceab5
        Total devices 1 FS bytes used 17.61TiB
        devid    1 size 19.10TiB used 18.34TiB path /dev/sdb1

Btrfs v3.12

terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
Data, single: total=17.58TiB, used=17.57TiB
System, DUP: total=8.00MiB, used=1.93MiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=392.00GiB, used=33.50GiB
Metadata, single: total=8.00MiB, used=0.00

I use no subvolumes nor are there any snapshots, at least as near as I can tell. Any suggestions as to how to recover the missing space assuming it's possible? Any help is most appreciated.

-Justin
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
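For what it's worth, putting the reported numbers side by side (all values taken from the output above, rounded):

    device (partition) size:     19.10 TiB
    chunk space allocated:       18.34 TiB  -> ~0.76 TiB (~780G) never allocated, matching what df reports as free
    data chunks:                 17.58 TiB total, 17.57 TiB used
    metadata chunks (DUP):       2 x 392 GiB = ~784 GiB allocated, but only 2 x 33.5 GiB = ~67 GiB used
    du over the files:           ~13 TiB

So df's ~780 gigs is simply the unallocated tail of the device, and the real puzzle is the roughly 4.5 tera gap between the 17.57 TiB of data extents btrfs reports as used and the ~13 tera that du can see -- the numbers alone don't say where that space went.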
Re: What are the linux kernel versions incompatibilities with btrfs?
Hi, I can't give you a specific answer to your question. But because btrfs is still under heavy development you shouldn't use it with those old kernels at all, in my opinion. You should never be more than one version away from the current stable kernel. Regards, Felix On Thu, Feb 27, 2014 at 5:31 PM, Brent Millare wrote: > I read that usage of a btrfs volume with a newer kernel can render it > unreadable when that same volume is used with an older kernel. I have > a mobile storage device that will be used by different linux > distributions and kernels. What are the kernel version > incompatibilities I might have to worry about? The machines I will use > have kernel versions 3.2.0, 3.5.0, and higher. > > -Brent > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root
On Wed, Feb 26, 2014 at 05:10:05PM +0800, Miao Xie wrote: > On Sat, 22 Feb 2014 01:23:37 +0100, David Sterba wrote: > > On Thu, Feb 20, 2014 at 06:08:54PM +0800, Miao Xie wrote: > >> @@ -1352,13 +1347,15 @@ static struct btrfs_root *alloc_log_tree(struct > >> btrfs_trans_handle *trans, > >>root->root_key.objectid = BTRFS_TREE_LOG_OBJECTID; > >>root->root_key.type = BTRFS_ROOT_ITEM_KEY; > >>root->root_key.offset = BTRFS_TREE_LOG_OBJECTID; > >> + > >>/* > >> + * DON'T set REF_COWS for log trees > >> + * > >> * log trees do not get reference counted because they go away > >> * before a real commit is actually done. They do store pointers > >> * to file data extents, and those reference counts still get > >> * updated (along with back refs to the log tree). > >> */ > >> - root->ref_cows = 0; > > > > This looks like a bugfix hidden in a cleanup patch. If it is standalone > > and not related to changes in this patchset, it makes sense to send it > > separately (and possibly CC stable). > > It is a cleanup because we have set it to 0 before. > > I add this comment just to remind the other developer that don't set this > flag. > (The old one is not so striking, I think.) Ox, thanks for explanation. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
What are the linux kernel versions incompatibilities with btrfs?
I read that usage of a btrfs volume with a newer kernel can render it unreadable when that same volume is used with an older kernel. I have a mobile storage device that will be used by different linux distributions and kernels. What are the kernel version incompatibilities I might have to worry about? The machines I will use have kernel versions 3.2.0, 3.5.0, and higher. -Brent -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: throttle delayed refs better
On 02/27/2014 10:38 AM, 钱凯 wrote:
> I'm a little confused of what "avg_delayed_ref_runtime" means.
>
> In __btrfs_run_delayed_refs(), "avg_delayed_ref_runtime" is set to
> the runtime of all delayed refs processed in current transaction
> commit. However, in btrfs_should_throttle_delayed_refs(), we based
> on the following condition to decide whether throttle refs or not:
>
>     avg_runtime = fs_info->avg_delayed_ref_runtime;
>     if (num_entries * avg_runtime >= NSEC_PER_SEC)
>             return 1;
>
> It looks like "avg_delayed_ref_runtime" is used as runtime of each
> delayed ref processed in average here. So what does it really
> means?

Yeah I screwed this up, I should have been dividing the total time by the number of delayed refs I ran. I have a patch locally to fix it and I'll send it out after I finish my qgroup work. Thanks,

Josef
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
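For illustration, the fix Josef describes -- dividing the elapsed time by the number of refs actually run before folding it into the average -- could look roughly like the following at the end of __btrfs_run_delayed_refs(); start and actual_count are the variables already present in the posted patch, but the 3/4-old, 1/4-new weighting is an assumption here, not the contents of his local patch:

    if (actual_count) {
            u64 runtime = ktime_to_ns(ktime_sub(ktime_get(), start));
            u64 avg;

            /* fold a per-ref cost, not the total, into the running average */
            avg = fs_info->avg_delayed_ref_runtime * 3 +
                  div64_u64(runtime, actual_count);
            fs_info->avg_delayed_ref_runtime = avg >> 2;
    }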
Re: [PATCH] Btrfs: throttle delayed refs better
I'm a little confused of what "avg_delayed_ref_runtime" means. In __btrfs_run_delayed_refs(), "avg_delayed_ref_runtime" is set to the runtime of all delayed refs processed in current transaction commit. However, in btrfs_should_throttle_delayed_refs(), we based on the following condition to decide whether throttle refs or not: * avg_runtime = fs_info->avg_delayed_ref_runtime; if (num_entries * avg_runtime >= NSEC_PER_SEC) return 1; * It looks like "avg_delayed_ref_runtime" is used as runtime of each delayed ref processed in average here. So what does it really means? Thanks, Kai 2014-01-24 2:07 GMT+08:00 Josef Bacik : > On one of our gluster clusters we noticed some pretty big lag spikes. This > turned out to be because our transaction commit was taking like 3 minutes to > complete. This is because we have like 30 gigs of metadata, so our global > reserve would end up being the max which is like 512 mb. So our throttling > code > would allow a ridiculous amount of delayed refs to build up and then they'd > all > get run at transaction commit time, and for a cold mounted file system that > could take up to 3 minutes to run. So fix the throttling to be based on both > the size of the global reserve and how long it takes us to run delayed refs. > This patch tracks the time it takes to run delayed refs and then only allows 1 > seconds worth of outstanding delayed refs at a time. This way it will > auto-tune > itself from cold cache up to when everything is in memory and it no longer has > to go to disk. This makes our transaction commits take much less time to run. > Thanks, > > Signed-off-by: Josef Bacik > --- > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/disk-io.c | 2 +- > fs/btrfs/extent-tree.c | 41 - > fs/btrfs/transaction.c | 4 ++-- > 4 files changed, 46 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 3cebb4a..ca6bcc3 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1360,6 +1360,7 @@ struct btrfs_fs_info { > > u64 generation; > u64 last_trans_committed; > + u64 avg_delayed_ref_runtime; > > /* > * this is updated to the current trans every time a full commit > @@ -3172,6 +3173,8 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct > btrfs_root *root, > > int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans, >struct btrfs_root *root); > +int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans, > + struct btrfs_root *root); > void btrfs_put_block_group(struct btrfs_block_group_cache *cache); > int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, >struct btrfs_root *root, unsigned long count); > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index ed23127..f0e7bbe 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2185,7 +2185,7 @@ int open_ctree(struct super_block *sb, > fs_info->free_chunk_space = 0; > fs_info->tree_mod_log = RB_ROOT; > fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL; > - > + fs_info->avg_delayed_ref_runtime = div64_u64(NSEC_PER_SEC, 64); > /* readahead state */ > INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT); > spin_lock_init(&fs_info->reada_lock); > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index c77156c..b532259 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2322,8 +2322,10 @@ static noinline int __btrfs_run_delayed_refs(struct > btrfs_trans_handle *trans, > struct btrfs_delayed_ref_head *locked_ref = NULL; > struct btrfs_delayed_extent_op *extent_op; > struct btrfs_fs_info 
*fs_info = root->fs_info; > + ktime_t start = ktime_get(); > int ret; > unsigned long count = 0; > + unsigned long actual_count = 0; > int must_insert_reserved = 0; > > delayed_refs = &trans->transaction->delayed_refs; > @@ -2452,6 +2454,7 @@ static noinline int __btrfs_run_delayed_refs(struct > btrfs_trans_handle *trans, > &delayed_refs->href_root); > spin_unlock(&delayed_refs->lock); > } else { > + actual_count++; > ref->in_tree = 0; > rb_erase(&ref->rb_node, &locked_ref->ref_root); > } > @@ -2502,6 +2505,26 @@ static noinline int __btrfs_run_delayed_refs(struct > btrfs_trans_handle *trans, > count++; > cond_resched(); > } > + > + /* > +* We don't want to include ref heads since we can have empty ref > heads > +* and those will drastically skew our runtime down
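To put rough numbers on the auto-tuning the patch describes (illustrative, and assuming the average is tracked per ref as discussed in the replies above): the average starts at NSEC_PER_SEC / 64, i.e. about 15.6 ms per ref, so the test num_entries * avg_runtime >= NSEC_PER_SEC throttles after only ~64 outstanding delayed refs on a cold mount; if the measured average later falls to, say, 50 us per ref once the metadata is cached, the same one-second budget allows ~20,000 outstanding refs before throttling kicks in.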
[PATCH] Btrfs-progs: make sure to save mirror_num only if it is set
If we are cycling through all of the mirrors trying to find the best one we need to make sure we set best_mirror to an actual mirror number and not 0. Otherwise we could end up reading a mirror that wasn't the best and make everybody sad. Thanks, Signed-off-by: Josef Bacik --- disk-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/disk-io.c b/disk-io.c index e840177..0bd1bb0 100644 --- a/disk-io.c +++ b/disk-io.c @@ -297,7 +297,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr, ignore = 1; continue; } - if (btrfs_header_generation(eb) > best_transid) { + if (btrfs_header_generation(eb) > best_transid && mirror_num) { best_transid = btrfs_header_generation(eb); good_mirror = mirror_num; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: record generation for tree blocks in fsck
When working with a user who had a broken file system I noticed that we were reading a bad copy of a block when the other copy was perfectly fine. This is because we don't keep track of the parent generation for tree blocks, so we just read whichever copy we damned well please with no regards for which is best. This fixes this problem by recording the parent generation of the tree block so we can be sure to read the most correct copy before we check it, which will give us a better chance of fixing really broken filesystems. Thanks, Signed-off-by: Josef Bacik --- cmds-check.c | 32 +--- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 2911af0..2fc5253 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -98,6 +98,7 @@ struct extent_record { u64 refs; u64 extent_item_refs; u64 generation; + u64 parent_generation; u64 info_objectid; u64 num_duplicates; u8 info_level; @@ -2643,7 +2644,7 @@ static struct data_backref *alloc_data_backref(struct extent_record *rec, } static int add_extent_rec(struct cache_tree *extent_cache, - struct btrfs_key *parent_key, + struct btrfs_key *parent_key, u64 parent_gen, u64 start, u64 nr, u64 extent_item_refs, int is_root, int inc_ref, int set_checked, int metadata, int extent_rec, u64 max_size) @@ -2719,6 +2720,8 @@ static int add_extent_rec(struct cache_tree *extent_cache, if (parent_key) btrfs_cpu_key_to_disk(&rec->parent_key, parent_key); + if (parent_gen) + rec->parent_generation = parent_gen; if (rec->max_size < max_size) rec->max_size = max_size; @@ -2759,6 +2762,11 @@ static int add_extent_rec(struct cache_tree *extent_cache, else memset(&rec->parent_key, 0, sizeof(*parent_key)); + if (parent_gen) + rec->parent_generation = parent_gen; + else + rec->parent_generation = 0; + rec->cache.start = start; rec->cache.size = nr; ret = insert_cache_extent(extent_cache, &rec->cache); @@ -2780,7 +2788,7 @@ static int add_tree_backref(struct cache_tree *extent_cache, u64 bytenr, cache = lookup_cache_extent(extent_cache, bytenr, 1); if (!cache) { - add_extent_rec(extent_cache, NULL, bytenr, + add_extent_rec(extent_cache, NULL, 0, bytenr, 1, 0, 0, 0, 0, 1, 0, 0); cache = lookup_cache_extent(extent_cache, bytenr, 1); if (!cache) @@ -2828,7 +2836,7 @@ static int add_data_backref(struct cache_tree *extent_cache, u64 bytenr, cache = lookup_cache_extent(extent_cache, bytenr, 1); if (!cache) { - add_extent_rec(extent_cache, NULL, bytenr, 1, 0, 0, 0, 0, + add_extent_rec(extent_cache, NULL, 0, bytenr, 1, 0, 0, 0, 0, 0, 0, max_size); cache = lookup_cache_extent(extent_cache, bytenr, 1); if (!cache) @@ -3315,7 +3323,7 @@ static int process_extent_item(struct btrfs_root *root, #else BUG(); #endif - return add_extent_rec(extent_cache, NULL, key.objectid, + return add_extent_rec(extent_cache, NULL, 0, key.objectid, num_bytes, refs, 0, 0, 0, metadata, 1, num_bytes); } @@ -3323,7 +3331,7 @@ static int process_extent_item(struct btrfs_root *root, ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item); refs = btrfs_extent_refs(eb, ei); - add_extent_rec(extent_cache, NULL, key.objectid, num_bytes, + add_extent_rec(extent_cache, NULL, 0, key.objectid, num_bytes, refs, 0, 0, 0, metadata, 1, num_bytes); ptr = (unsigned long)(ei + 1); @@ -3836,6 +3844,7 @@ static int run_next_block(struct btrfs_trans_handle *trans, u64 owner; u64 flags; u64 ptr; + u64 gen = 0; int ret = 0; int i; int nritems; @@ -3885,8 +3894,16 @@ static int run_next_block(struct btrfs_trans_handle *trans, free(cache); } + cache = lookup_cache_extent(extent_cache, bytenr, size); + 
if (cache) { + struct extent_record *rec; + + rec = container_of(cache, struct extent_record, cache); + gen = rec->parent_generation; + } + /* fixme, get the real parent transid */ - buf = read_tree_block(root, bytenr, size, 0); + buf = read_tree_block(root, bytenr, size, gen); if (!extent_buffer_uptodate(buf)) { record_bad_block_io(root->fs_info, extent_cache, bytenr, size);
Re: Incremental backup over writable snapshot
@Kai, Thank you very much for your reply. Sorry, I just saw it now. I will take care of the mailing issue now, so that it does not happen again in the future. Sorry for the inconveniences! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Incremental backup over writable snapshot
Does anyone have a technical info regarding the reliability of the incremental backup process using the said method? (Apart from all the recommendations not to do it that way) So the question I am interested in: Should it work or not? I did some testing myself and it seemed to work, however I cannot find out if it backs up unnecessary blocks and thus making the incremental step space inefficient. That information would help me very much! Thank you very much! On Wednesday 19 February 2014 14:45:57 GEO wrote: > Hi, > > As suggested in another thread, I would like to know the reliability of the > following backup scheme: > > Suppose I have a subvolume of my homedirectory called @home. > > Now I am interested in making incremental backups of data in home I am > interested in, but not everything, so I create a normal snapshot of @home > called @home-w and delete the files/folders I am not interested in backing > up. After that I create a readonly snapshot of @home-w called @home-r, that > I sent to my target volume with btrfs send. > > After that is done, I do regular backups, by always going over the writeable > snapshot where I remove always the same directories I am not interested and > send the difference to the target volume with btrfs send -p @home-r > @home-r-1| btrfs receive /path/of/target/volume. > > I do not like the idea of making subvolumes of all directories I am not > interested in backing up. > > So what I would like to know now is the following: Could there be drawbacks > of doing this resp. could I further optimize my backup strategy, as I > experienced it takes a while for deleting large files in the writeable > snapshot (What does it write there?) > > Could my method somehow lead to inefficiency in terms of the disk space used > at the target volume (I mean, could the deleting cause a change, so that > more is actually transferred as change, than in reality is?)? > > One last question would be: Is there a quick way I could verify the local > read only snapshot used last time is the same as the one synced to the > target volume last time? > > > Thank you for your support and the great work! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use btrfs_crc32c everywhere instead of libcrc32c
Hi, I am the Arch user who initially reported this problem to the AUR ( https://aur.archlinux.org/packages/linux-mainline/). 2014-02-27 13:43 GMT+01:00 Filipe David Manana : > On Wed, Feb 26, 2014 at 11:26 PM, WorMzy Tykashi > wrote: > > On 29 January 2014 21:06, Filipe David Borba Manana > wrote: > >> After the commit titled "Btrfs: fix btrfs boot when compiled as > built-in", > >> LIBCRC32C requirement was removed from btrfs' Kconfig. This made it not > >> possible to build a kernel with btrfs enabled (either as module or > built-in) > >> if libcrc32c is not enabled as well. So just replace all uses of > libcrc32c > >> with the equivalent function in btrfs hash.h - btrfs_crc32c. > >> > >> Signed-off-by: Filipe David Borba Manana > >> --- > >> fs/btrfs/check-integrity.c |4 ++-- > >> fs/btrfs/disk-io.c |4 ++-- > >> fs/btrfs/send.c|4 ++-- > >> 3 files changed, 6 insertions(+), 6 deletions(-) > >> > >> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c > >> index 160fb50..39bfd56 100644 > >> --- a/fs/btrfs/check-integrity.c > >> +++ b/fs/btrfs/check-integrity.c > >> @@ -92,11 +92,11 @@ > >> #include > >> #include > >> #include > >> -#include > >> #include > >> #include > >> #include "ctree.h" > >> #include "disk-io.h" > >> +#include "hash.h" > >> #include "transaction.h" > >> #include "extent_io.h" > >> #include "volumes.h" > >> @@ -1823,7 +1823,7 @@ static int btrfsic_test_for_metadata(struct > btrfsic_state *state, > >> size_t sublen = i ? PAGE_CACHE_SIZE : > >> (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE); > >> > >> - crc = crc32c(crc, data, sublen); > >> + crc = btrfs_crc32c(crc, data, sublen); > >> } > >> btrfs_csum_final(crc, csum); > >> if (memcmp(csum, h->csum, state->csum_size)) > >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >> index 7619147..3903bd3 100644 > >> --- a/fs/btrfs/disk-io.c > >> +++ b/fs/btrfs/disk-io.c > >> @@ -26,7 +26,6 @@ > >> #include > >> #include > >> #include > >> -#include > >> #include > >> #include > >> #include > >> @@ -35,6 +34,7 @@ > >> #include > >> #include "ctree.h" > >> #include "disk-io.h" > >> +#include "hash.h" > >> #include "transaction.h" > >> #include "btrfs_inode.h" > >> #include "volumes.h" > >> @@ -244,7 +244,7 @@ out: > >> > >> u32 btrfs_csum_data(char *data, u32 seed, size_t len) > >> { > >> - return crc32c(seed, data, len); > >> + return btrfs_crc32c(seed, data, len); > >> } > >> > >> void btrfs_csum_final(u32 crc, char *result) > >> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > >> index 04c07ed..31b76d0 100644 > >> --- a/fs/btrfs/send.c > >> +++ b/fs/btrfs/send.c > >> @@ -24,12 +24,12 @@ > >> #include > >> #include > >> #include > >> -#include > >> #include > >> #include > >> > >> #include "send.h" > >> #include "backref.h" > >> +#include "hash.h" > >> #include "locking.h" > >> #include "disk-io.h" > >> #include "btrfs_inode.h" > >> @@ -620,7 +620,7 @@ static int send_cmd(struct send_ctx *sctx) > >> hdr->len = cpu_to_le32(sctx->send_size - sizeof(*hdr)); > >> hdr->crc = 0; > >> > >> - crc = crc32c(0, (unsigned char *)sctx->send_buf, > sctx->send_size); > >> + crc = btrfs_crc32c(0, (unsigned char *)sctx->send_buf, > sctx->send_size); > >> hdr->crc = cpu_to_le32(crc); > >> > >> ret = write_buf(sctx->send_filp, sctx->send_buf, > sctx->send_size, > >> -- > >> 1.7.9.5 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > in > >> the body of a message to majord...@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > Hi, > > Hi > > > > > Ever 
since this patch was committed (git ref > > 0b947aff1599afbbd2ec07ada87b05af0f94cf10), the btrfs module > > (presumably intentionally) no longer depends on the crc32c module. > > To be more clear, it no longer depends on LIBCRC32C (which is just a > convenience library to access crypto's crc32c). > It still depends on CRYPTO and CRYPTO_CRC32C (which is what LIBCRC32C > uses). > > > However, this means that this module is not pulled in during initrd > > creation (at least using mkinitcpio on Arch Linux), and as a result, > > the btrfs module cannot be loaded. Instead modprobe complains with: > > "Unknown symbol in module, or unknown parameter (see dmesg)". > > That is weird. On debian creating the initrd via kernel's makefile > (make modules_install && make install) works for me (don't know if it > uses mkinitcpio or something else). > > > > > Unfortunately there is no accompanying message in dmesg, so I can't > > provide much more information. However, I have bisected the commit to > > confirm that this problem was introduced by this patch. The following > > is a grep of btrfs module's dependencies before and after this was > > committed: > > > > $ grep btrfs pkg/lib/
Re: [PATCH] Btrfs: use btrfs_crc32c everywhere instead of libcrc32c
On Wed, Feb 26, 2014 at 11:26 PM, WorMzy Tykashi wrote: > On 29 January 2014 21:06, Filipe David Borba Manana > wrote: >> After the commit titled "Btrfs: fix btrfs boot when compiled as built-in", >> LIBCRC32C requirement was removed from btrfs' Kconfig. This made it not >> possible to build a kernel with btrfs enabled (either as module or built-in) >> if libcrc32c is not enabled as well. So just replace all uses of libcrc32c >> with the equivalent function in btrfs hash.h - btrfs_crc32c. >> >> Signed-off-by: Filipe David Borba Manana >> --- >> fs/btrfs/check-integrity.c |4 ++-- >> fs/btrfs/disk-io.c |4 ++-- >> fs/btrfs/send.c|4 ++-- >> 3 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c >> index 160fb50..39bfd56 100644 >> --- a/fs/btrfs/check-integrity.c >> +++ b/fs/btrfs/check-integrity.c >> @@ -92,11 +92,11 @@ >> #include >> #include >> #include >> -#include >> #include >> #include >> #include "ctree.h" >> #include "disk-io.h" >> +#include "hash.h" >> #include "transaction.h" >> #include "extent_io.h" >> #include "volumes.h" >> @@ -1823,7 +1823,7 @@ static int btrfsic_test_for_metadata(struct >> btrfsic_state *state, >> size_t sublen = i ? PAGE_CACHE_SIZE : >> (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE); >> >> - crc = crc32c(crc, data, sublen); >> + crc = btrfs_crc32c(crc, data, sublen); >> } >> btrfs_csum_final(crc, csum); >> if (memcmp(csum, h->csum, state->csum_size)) >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 7619147..3903bd3 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -26,7 +26,6 @@ >> #include >> #include >> #include >> -#include >> #include >> #include >> #include >> @@ -35,6 +34,7 @@ >> #include >> #include "ctree.h" >> #include "disk-io.h" >> +#include "hash.h" >> #include "transaction.h" >> #include "btrfs_inode.h" >> #include "volumes.h" >> @@ -244,7 +244,7 @@ out: >> >> u32 btrfs_csum_data(char *data, u32 seed, size_t len) >> { >> - return crc32c(seed, data, len); >> + return btrfs_crc32c(seed, data, len); >> } >> >> void btrfs_csum_final(u32 crc, char *result) >> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >> index 04c07ed..31b76d0 100644 >> --- a/fs/btrfs/send.c >> +++ b/fs/btrfs/send.c >> @@ -24,12 +24,12 @@ >> #include >> #include >> #include >> -#include >> #include >> #include >> >> #include "send.h" >> #include "backref.h" >> +#include "hash.h" >> #include "locking.h" >> #include "disk-io.h" >> #include "btrfs_inode.h" >> @@ -620,7 +620,7 @@ static int send_cmd(struct send_ctx *sctx) >> hdr->len = cpu_to_le32(sctx->send_size - sizeof(*hdr)); >> hdr->crc = 0; >> >> - crc = crc32c(0, (unsigned char *)sctx->send_buf, sctx->send_size); >> + crc = btrfs_crc32c(0, (unsigned char *)sctx->send_buf, >> sctx->send_size); >> hdr->crc = cpu_to_le32(crc); >> >> ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size, >> -- >> 1.7.9.5 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > Hi, Hi > > Ever since this patch was committed (git ref > 0b947aff1599afbbd2ec07ada87b05af0f94cf10), the btrfs module > (presumably intentionally) no longer depends on the crc32c module. To be more clear, it no longer depends on LIBCRC32C (which is just a convenience library to access crypto's crc32c). It still depends on CRYPTO and CRYPTO_CRC32C (which is what LIBCRC32C uses). 
> However, this means that this module is not pulled in during initrd > creation (at least using mkinitcpio on Arch Linux), and as a result, > the btrfs module cannot be loaded. Instead modprobe complains with: > "Unknown symbol in module, or unknown parameter (see dmesg)". That is weird. On debian creating the initrd via kernel's makefile (make modules_install && make install) works for me (don't know if it uses mkinitcpio or something else). > > Unfortunately there is no accompanying message in dmesg, so I can't > provide much more information. However, I have bisected the commit to > confirm that this problem was introduced by this patch. The following > is a grep of btrfs module's dependencies before and after this was > committed: > > $ grep btrfs pkg/lib/modules/3.13.0-ARCH-00150-g8101c8d/modules.dep > kernel/fs/btrfs/btrfs.ko: kernel/lib/raid6/raid6_pq.ko > kernel/lib/libcrc32c.ko kernel/crypto/xor.ko > > $ grep btrfs pkg/lib/modules/3.13.0-ARCH-00151-g0b947af/modules.dep > kernel/fs/btrfs/btrfs.ko: kernel/lib/raid6/raid6_pq.ko kernel/crypto/xor.ko > > As you can see, the dependency on kernel/lib/libcrc32c.ko was removed. Yep, it is intentional. > > However, if crc32c.ko is manually added to the
[PATCH 2/2 v3] Btrfs: check if directory has already been created smarter
Currently to check whether a directory has been created, we search DIR_INDEX items one by one to check if children has been processed. Try to picture such a scenario: . |-- dir(ino X) |-- foo_1(ino X+1) |-- ... |-- foo_k(ino X+k) With the current way, we have to check all the k DIR_INDEX items to find it is a fresh new one. So this introduced a rbtree to store those directories which are created out of order, and in the above case, we just need an O(log n) search instead of O(n) search. Signed-off-by: Liu Bo --- v3: fix typo, s/O(1)/O(n)/g, thanks Wang Shilong. v2: fix wrong patch name. fs/btrfs/send.c | 87 - 1 file changed, 43 insertions(+), 44 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 33063d1..fcad93c 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -175,6 +175,9 @@ struct send_ctx { * own move/rename can be performed. */ struct rb_root waiting_dir_moves; + + /* directories which are created out of order, check did_create_dir() */ + struct rb_root out_of_order; }; struct pending_dir_move { @@ -2494,56 +2497,40 @@ out: */ static int did_create_dir(struct send_ctx *sctx, u64 dir) { - int ret = 0; - struct btrfs_path *path = NULL; - struct btrfs_key key; - struct btrfs_key found_key; - struct btrfs_key di_key; - struct extent_buffer *eb; - struct btrfs_dir_item *di; - int slot; + struct rb_node **p = &sctx->out_of_order.rb_node; + struct rb_node *parent = NULL; + struct send_dir_node *entry = NULL; + int cur_is_dir = !!(dir == sctx->cur_ino); - path = alloc_path_for_send(); - if (!path) { - ret = -ENOMEM; - goto out; - } + verbose_printk("dir=%llu cur_ino=%llu send_progress=%llu\n", +dir, sctx->cur_ino, sctx->send_progress); - key.objectid = dir; - key.type = BTRFS_DIR_INDEX_KEY; - key.offset = 0; - while (1) { - ret = btrfs_search_slot_for_read(sctx->send_root, &key, path, - 1, 0); - if (ret < 0) - goto out; - if (!ret) { - eb = path->nodes[0]; - slot = path->slots[0]; - btrfs_item_key_to_cpu(eb, &found_key, slot); - } - if (ret || found_key.objectid != key.objectid || - found_key.type != key.type) { - ret = 0; - goto out; + while (*p) { + parent = *p; + entry = rb_entry(parent, struct send_dir_node, node); + if (dir < entry->ino) { + p = &(*p)->rb_left; + } else if (dir > entry->ino) { + p = &(*p)->rb_right; + } else { + if (cur_is_dir) { + rb_erase(&entry->node, &sctx->out_of_order); + kfree(entry); + } + return 1; } + } - di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item); - btrfs_dir_item_key_to_cpu(eb, di, &di_key); - - if (di_key.type != BTRFS_ROOT_ITEM_KEY && - di_key.objectid < sctx->send_progress) { - ret = 1; - goto out; - } + if (!cur_is_dir) { + entry = kmalloc(sizeof(*entry), GFP_NOFS); + if (!entry) + return -ENOMEM; + entry->ino = dir; - key.offset = found_key.offset + 1; - btrfs_release_path(path); + rb_link_node(&entry->node, parent, p); + rb_insert_color(&entry->node, &sctx->out_of_order); } - -out: - btrfs_free_path(path); - return ret; + return 0; } /* @@ -5340,6 +5327,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) sctx->pending_dir_moves = RB_ROOT; sctx->waiting_dir_moves = RB_ROOT; + sctx->out_of_order = RB_ROOT; sctx->clone_roots = vzalloc(sizeof(struct clone_root) * (arg->clone_sources_count + 1)); @@ -5477,6 +5465,17 @@ out: kfree(dm); } + WARN_ON(sctx && !ret && !RB_EMPTY_ROOT(&sctx->out_of_order)); + while (sctx && !RB_EMPTY_ROOT(&sctx->out_of_order)) { + struct rb_node *n; + struct send_dir_node *entry; + + n = rb_first(&sctx->out_of_order); + entry = rb_entry(n, struct send_dir_node, node); + 
rb_erase(&entry->node, &sctx->out_of_order); + kfree(entry); + } + if (sort_clone_roots) { for (i = 0; i < sctx->clone_roo
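For a sense of scale (illustrative numbers, not from the patch): with a directory whose k = 10,000 children have already been processed, the old did_create_dir() could end up doing up to 10,000 DIR_INDEX lookups -- each one a btrfs_search_slot_for_read() call -- before concluding the directory still needs to be created, while the red-black tree above answers the same question in about log2(10,000) ~= 14 pointer comparisons and no tree search at all.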
[PATCH] Btrfs: skip search tree for REG files
It is really unnecessary to search tree again for @gen, @mode and @rdev in the case of REG inodes' creation, as we've got btrfs_inode_item in sctx, and @gen, @mode and @rdev can easily be fetched. Signed-off-by: Liu Bo --- fs/btrfs/send.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 3609685..5b493e8 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -109,6 +109,7 @@ struct send_ctx { int cur_inode_deleted; u64 cur_inode_size; u64 cur_inode_mode; + u64 cur_inode_rdev; u64 cur_inode_last_extent; u64 send_progress; @@ -2432,10 +2433,16 @@ verbose_printk("btrfs: send_create_inode %llu\n", ino); if (!p) return -ENOMEM; - ret = get_inode_info(sctx->send_root, ino, NULL, &gen, &mode, NULL, - NULL, &rdev); - if (ret < 0) - goto out; + if (ino != sctx->cur_ino) { + ret = get_inode_info(sctx->send_root, ino, NULL, &gen, &mode, +NULL, NULL, &rdev); + if (ret < 0) + goto out; + } else { + gen = sctx->cur_inode_gen; + mode = sctx->cur_inode_mode; + rdev = sctx->cur_inode_rdev; + } if (S_ISREG(mode)) { cmd = BTRFS_SEND_C_MKFILE; @@ -4827,6 +4834,8 @@ static int changed_inode(struct send_ctx *sctx, sctx->left_path->nodes[0], left_ii); sctx->cur_inode_mode = btrfs_inode_mode( sctx->left_path->nodes[0], left_ii); + sctx->cur_inode_rdev = btrfs_inode_rdev( + sctx->left_path->nodes[0], left_ii); if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID) ret = send_create_inode_if_needed(sctx); } else if (result == BTRFS_COMPARE_TREE_DELETED) { @@ -4871,6 +4880,8 @@ static int changed_inode(struct send_ctx *sctx, sctx->left_path->nodes[0], left_ii); sctx->cur_inode_mode = btrfs_inode_mode( sctx->left_path->nodes[0], left_ii); + sctx->cur_inode_rdev = btrfs_inode_rdev( + sctx->left_path->nodes[0], left_ii); ret = send_create_inode_if_needed(sctx); if (ret < 0) goto out; -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2 v2] Btrfs: check if directory has already been created smarter
On Thu, Feb 27, 2014 at 04:01:23PM +0800, Wang Shilong wrote: > On 02/27/2014 03:47 PM, Liu Bo wrote: > >Currently to check whether a directory has been created, we search > >DIR_INDEX items one by one to check if children has been processed. > > > >Try to picture such a scenario: > >. > >|-- dir(ino X) > > |-- foo_1(ino X+1) > > |-- ... > > |-- foo_k(ino X+k) > > > >With the current way, we have to check all the k DIR_INDEX items > >to find it is a fresh new one. > > > >So this introduced a rbtree to store those directories which are > >created out of order, and in the above case, we just need an O(logn) > >search instead of O(1) search. > Just a reminder, we ususally call O(n) rather O(1) here. > If we falls O(1) to O(logn)..things are becoming worse~~ Good catch, my bad. thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2 v2] Btrfs: check if directory has already been created smarter
On 02/27/2014 03:47 PM, Liu Bo wrote: Currently to check whether a directory has been created, we search DIR_INDEX items one by one to check if children has been processed. Try to picture such a scenario: . |-- dir(ino X) |-- foo_1(ino X+1) |-- ... |-- foo_k(ino X+k) With the current way, we have to check all the k DIR_INDEX items to find it is a fresh new one. So this introduced a rbtree to store those directories which are created out of order, and in the above case, we just need an O(logn) search instead of O(1) search. Just a reminder, we ususally call O(n) rather O(1) here. If we falls O(1) to O(logn)..things are becoming worse~~ Thanks, Wang Signed-off-by: Liu Bo --- v2: fix wrong patch name. fs/btrfs/send.c | 87 - 1 file changed, 43 insertions(+), 44 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 33063d1..fcad93c 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -175,6 +175,9 @@ struct send_ctx { * own move/rename can be performed. */ struct rb_root waiting_dir_moves; + + /* directories which are created out of order, check did_create_dir() */ + struct rb_root out_of_order; }; struct pending_dir_move { @@ -2494,56 +2497,40 @@ out: */ static int did_create_dir(struct send_ctx *sctx, u64 dir) { - int ret = 0; - struct btrfs_path *path = NULL; - struct btrfs_key key; - struct btrfs_key found_key; - struct btrfs_key di_key; - struct extent_buffer *eb; - struct btrfs_dir_item *di; - int slot; + struct rb_node **p = &sctx->out_of_order.rb_node; + struct rb_node *parent = NULL; + struct send_dir_node *entry = NULL; + int cur_is_dir = !!(dir == sctx->cur_ino); - path = alloc_path_for_send(); - if (!path) { - ret = -ENOMEM; - goto out; - } + verbose_printk("dir=%llu cur_ino=%llu send_progress=%llu\n", +dir, sctx->cur_ino, sctx->send_progress); - key.objectid = dir; - key.type = BTRFS_DIR_INDEX_KEY; - key.offset = 0; - while (1) { - ret = btrfs_search_slot_for_read(sctx->send_root, &key, path, - 1, 0); - if (ret < 0) - goto out; - if (!ret) { - eb = path->nodes[0]; - slot = path->slots[0]; - btrfs_item_key_to_cpu(eb, &found_key, slot); - } - if (ret || found_key.objectid != key.objectid || - found_key.type != key.type) { - ret = 0; - goto out; + while (*p) { + parent = *p; + entry = rb_entry(parent, struct send_dir_node, node); + if (dir < entry->ino) { + p = &(*p)->rb_left; + } else if (dir > entry->ino) { + p = &(*p)->rb_right; + } else { + if (cur_is_dir) { + rb_erase(&entry->node, &sctx->out_of_order); + kfree(entry); + } + return 1; } + } - di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item); - btrfs_dir_item_key_to_cpu(eb, di, &di_key); - - if (di_key.type != BTRFS_ROOT_ITEM_KEY && - di_key.objectid < sctx->send_progress) { - ret = 1; - goto out; - } + if (!cur_is_dir) { + entry = kmalloc(sizeof(*entry), GFP_NOFS); + if (!entry) + return -ENOMEM; + entry->ino = dir; - key.offset = found_key.offset + 1; - btrfs_release_path(path); + rb_link_node(&entry->node, parent, p); + rb_insert_color(&entry->node, &sctx->out_of_order); } - -out: - btrfs_free_path(path); - return ret; + return 0; } /* @@ -5340,6 +5327,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) sctx->pending_dir_moves = RB_ROOT; sctx->waiting_dir_moves = RB_ROOT; + sctx->out_of_order = RB_ROOT; sctx->clone_roots = vzalloc(sizeof(struct clone_root) * (arg->clone_sources_count + 1)); @@ -5477,6 +5465,17 @@ out: kfree(dm); } + WARN_ON(sctx && !ret && !RB_EMPTY_ROOT(&sctx->out_of_order)); + while (sctx && !RB_EMPTY_ROOT(&sctx->out_of_order)) { + struct rb_node *n; + struct send_dir_node 
*entry; + + n = rb_first(&sctx->out_of_order); + entry = rb_entry(n, struct send_dir_node, node); + rb_erase(&entry->node, &sctx->out_of_order); + kfree(e