Re: Help with space

2014-02-27 Thread Justin Brown
Absolutely.  I'd like to know the answer to this, as 13 TB will take
a considerable amount of time to back up anywhere, assuming I find a
place.  I'm considering rebuilding a smaller RAID with newer drives
(it was originally built using sixteen 250 GB Western Digital drives;
it's about eleven years old now and has been in use the entire time
without failure).  I'm thinking of replacing each 250 GB drive with a
3 TB alternative.  Unfortunately, between upgrading the host and
building a new RAID, the expense isn't something I'm anticipating
with pleasure...

On Fri, Feb 28, 2014 at 1:27 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:
>
>> But then as others mentioned it may be risky to use this FS on 32-bit at
>> all, so I'd suggest trying anything else only after you reboot into a
>> 64-bit kernel.
>
> Based on what I've read on-list, btrfs is not arch-agnostic, with certain
> on-disk sizes set to native kernel page size, etc, so a filesystem
> created on one arch may well not work on another.
>
> Question: Does this apply to x86/amd64?  Will a filesystem created/used
> on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading
> to 64-bit imply backing up (in this case) double-digit TiB of data to
> something other than btrfs and testing it, doing a mkfs on the original
> filesystem once in 64-bit mode, and restoring all that data from backup?
>
> If the existing 32-bit x86 btrfs can't be used on 64-bit amd64,
> transferring all that data (assuming there's something big enough
> available to transfer it to!) to backup and then restoring it is going to
> hurt!
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>


Re: Help with space

2014-02-27 Thread Justin Brown
Apologies for the late reply; I'd assumed the issue was closed even
given the unusual behavior.  My mount options are:

/dev/sdb1 on /var/lib/nobody/fs/ubfterra type btrfs
(rw,noatime,nodatasum,nodatacow,noacl,space_cache,skip_balance)

I only recently added nodatacow and skip_balance in an attempt to
figure out where the missing space had gone; I don't know what impact,
if any, they might have.  I've got a full balance running at the
moment which, after about a day or so, has managed to process about 5%
of the chunks it's considering (988 out of about 18396 chunks balanced
(989 considered), 95% left).  The amount of free space has vacillated
slightly, growing by about a gig and then shrinking back.  As for
objects missing from the file system, I haven't seen any.  I've a lot
of files of various data types, the majority of which is encoded
Japanese animation.  Since I actually play these files via Samba from
an HTPC, particularly the more recent additions, I'd hazard a guess
that if something were breaking I'd have tripped across it by now, the
unusual used-to-free-space delta being the exception.  My brother also
uses this RAID for data storage; he's something of a closet
meteorologist and is fascinated by tornadoes.  He hasn't noticed any
unusual behavior either.  I'm in the process of sourcing a
64-bit-capable system in the hope that this will resolve the issue.
Neither of us is currently writing anything to the file system for
fear of things breaking, but both of us have been reading from it
without issue, other than the noticeable performance impact the
balance seems to be having.  Thanks for the help.
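
For reference, a long-running balance like this can be watched, paused,
and resumed with the balance subcommands -- a minimal sketch, using the
mount point from the mount line above:

  # show how many chunks are done and which filters are in effect
  btrfs balance status -v /var/lib/nobody/fs/ubfterra
  # pause it if the I/O impact gets too painful, and resume later
  btrfs balance pause /var/lib/nobody/fs/ubfterra
  btrfs balance resume /var/lib/nobody/fs/ubfterra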

-Justin


On Fri, Feb 28, 2014 at 12:26 AM, Chris Murphy  wrote:
>
> On Feb 27, 2014, at 11:13 PM, Chris Murphy  wrote:
>
>>
>> On Feb 27, 2014, at 11:19 AM, Justin Brown  wrote:
>>
>>> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
>>> Data, single: total=17.58TiB, used=17.57TiB
>>> System, DUP: total=8.00MiB, used=1.93MiB
>>> System, single: total=4.00MiB, used=0.00
>>> Metadata, DUP: total=392.00GiB, used=33.50GiB
>>> Metadata, single: total=8.00MiB, used=0.00
>>
>> After glancing at this again, what I thought might be going on might not be 
>> going on. The fact it has 17+TB already used, not merely allocated, doesn't 
>> seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.
>>
>> But then I don't know why du -h is reporting only 13T total used. And I'm 
>> unconvinced this is a balance issue either. Is anything obviously missing 
>> from the file system?
>
> What are your mount options? Maybe compression?
>
> Clearly du is calculating things differently. I'm getting:
>
> du -sch = 4.2G
> df -h = 5.4G
> btrfs df = 4.7G data and 620MB metadata (total).
>
> I am using compress=lzo.
>
> Chris Murphy
>


Re: Help with space

2014-02-27 Thread Roman Mamedov
On Fri, 28 Feb 2014 07:27:06 +0000 (UTC)
Duncan <1i5t5.dun...@cox.net> wrote:

> Based on what I've read on-list, btrfs is not arch-agnostic, with certain 
> on-disk sizes set to native kernel page size, etc, so a filesystem 
> created on one arch may well not work on another.
> 
> Question: Does this apply to x86/amd64?  Will a filesystem created/used 
> on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading 
> to 64-bit imply backing up (in this case) double-digit TiB of data to 
> something other than btrfs and testing it, doing a mkfs on the original 
> filesystem once in 64-bit mode, and restoring all that data from backup?

Page size (4K) is the same on both i386 and amd64. It's also the same on ARM.

The problem arises only on architectures like MIPS and PowerPC, some
variants of which use 16K or 64K page sizes.

Other than this page size issue, it has no arch-specific dependencies,
e.g. no on-disk structures with "CPU-native integer"-sized fields; that
would be too crazy to be true.
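
For anyone who wants to double-check a particular machine, the page size
is easy to confirm from userspace -- a minimal sketch, assuming getconf
(glibc) is installed:

  $ getconf PAGESIZE
  4096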

-- 
With respect,
Roman




Re: Help with space

2014-02-27 Thread Duncan
Roman Mamedov posted on Fri, 28 Feb 2014 10:34:36 +0600 as excerpted:

> But then as others mentioned it may be risky to use this FS on 32-bit at
> all, so I'd suggest trying anything else only after you reboot into a
> 64-bit kernel.

Based on what I've read on-list, btrfs is not arch-agnostic, with certain 
on-disk sizes set to native kernel page size, etc, so a filesystem 
created on one arch may well not work on another.

Question: Does this apply to x86/amd64?  Will a filesystem created/used 
on 32-bit x86 even mount/work on 64-bit amd64/x86_64, or does upgrading 
to 64-bit imply backing up (in this case) double-digit TiB of data to 
something other than btrfs and testing it, doing a mkfs on the original 
filesystem once in 64-bit mode, and restoring all that data from backup?

If the existing 32-bit x86 btrfs can't be used on 64-bit amd64, 
transferring all that data (assuming there's something big enough 
available to transfer it to!) to backup and then restoring it is going to 
hurt!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Incremental backup over writable snapshot

2014-02-27 Thread Duncan
GEO posted on Thu, 27 Feb 2014 14:10:25 +0100 as excerpted:

> Does anyone have any technical info regarding the reliability of the
> incremental backup process using said method?

Stepping back from your specific method for a moment...

You're using btrfs send/receive, which I wouldn't exactly call entirely
reliable ATM -- just look at all the patches going by on the list to fix
it up.  In theory it should /get/ there, but it's very much in flux at
the moment; certainly nothing I'd personally rely on here.  Btrfs itself
is still only semi-stable, and send/receive is one of the more advanced
features and currently among the least likely to work without errors.
(Though raid5/6 mode is worse, since from all I've read send/receive
should at least fail up-front if it's going to fail, while raid5/6 will
currently look like it's working... until you actually need the raid5/6
redundancy and btrfs data-integrity aspects!)

From what I've read, *IF* the send/receive process completes without
errors it should make a reasonably reliable backup.  The problem is that
there are a lot of error-triggering corner-cases ATM, and given your
definitely non-standard use-case, I expect your chances of running into
such errors are higher than normal.  But if send/receive /does/ complete
without errors, AFAIK it should be a reliable replication.

Meanwhile, over time those corner-cases should be worked out, and I've 
seen nothing in your use-case that says it /shouldn't/ work, once send/
receive itself is working reliably.  Your use-case may be an odd corner-
case, but it should either work or not, and once btrfs send/receive is 
working reliably, based on all I've read both from you and on the list in 
general, your case too should work reliably. =:^)

But for the moment, unless your aim is to be a guinea pig working
closely with the devs to test an interesting corner-case and report
problems so they can be traced and fixed, I'd suggest using some other
method.  Give btrfs send/receive, and the filesystem as a whole, another
six months or a year to mature and stabilize.  AFAIK your suggested
method might not be the most efficient or recommended way to do things,
for the reasons others have given, but it should nonetheless work.
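
For reference, the usual shape of an incremental send/receive cycle looks
roughly like the sketch below; the paths are placeholders, and read-only
snapshots are required on the sending side:

  # take a new read-only snapshot of the subvolume being backed up
  btrfs subvolume snapshot -r /data /data/snap-new
  sync
  # send only the difference against the previous snapshot
  btrfs send -p /data/snap-old /data/snap-new | btrfs receive /backup
  # rotate: snap-new becomes the parent for the next run
  btrfs subvolume delete /data/snap-old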

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:13 PM, Chris Murphy  wrote:

> 
> On Feb 27, 2014, at 11:19 AM, Justin Brown  wrote:
> 
>> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
>> Data, single: total=17.58TiB, used=17.57TiB
>> System, DUP: total=8.00MiB, used=1.93MiB
>> System, single: total=4.00MiB, used=0.00
>> Metadata, DUP: total=392.00GiB, used=33.50GiB
>> Metadata, single: total=8.00MiB, used=0.00
> 
> After glancing at this again, what I thought might be going on might not be 
> going on. The fact it has 17+TB already used, not merely allocated, doesn't 
> seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.
> 
> But then I don't know why du -h is reporting only 13T total used. And I'm 
> unconvinced this is a balance issue either. Is anything obviously missing 
> from the file system?

What are your mount options? Maybe compression?

Clearly du is calculating things differently. I'm getting:

du -sch = 4.2G
df -h = 5.4G
btrfs df = 4.7G data and 620MB metadata (total).

I am using compress=lzo.
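
For comparison, the three numbers above come from commands along these
lines -- a minimal sketch with a placeholder mount point; compression,
if enabled, shows up in the mount options:

  # per-file disk usage summed over the whole tree
  du -sch /mnt/btrfs
  # filesystem-level used/free as reported by statfs()
  df -h /mnt/btrfs
  # btrfs' own breakdown of allocated vs. used space per chunk type
  btrfs filesystem df /mnt/btrfs
  # check whether compress/compress-force appears in the mount options
  findmnt -no OPTIONS /mnt/btrfs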

Chris Murphy



Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:19 AM, Justin Brown  wrote:

> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
> Data, single: total=17.58TiB, used=17.57TiB
> System, DUP: total=8.00MiB, used=1.93MiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=392.00GiB, used=33.50GiB
> Metadata, single: total=8.00MiB, used=0.00

After glancing at this again, what I thought might be going on might not be 
going on. The fact it has 17+TB already used, not merely allocated, doesn't 
seem possible if there's a hard 16TB limit for Btrfs on 32-bit kernels.

But then I don't know why du -h is reporting only 13T total used. And I'm 
unconvinced this is a balance issue either. Is anything obviously missing from 
the file system?


Chris Murphy



Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 9:21 PM, Dave Chinner  wrote:
>> 
>> http://lists.centos.org/pipermail/centos/2011-April/109142.html
> 
> 
> 
> No, he didn't fill it with 16TB of data and then have it fail. He
> made a new filesystem *larger* than 16TB and tried to mount it:
> 
> | On a CentOS 32-bit backup server with a 17TB LVM logical volume on
> | EMC storage.  Worked great, until it rolled 16TB.  Then it quit
> | working.  Altogether.  /var/log/messages told me that the
> | filesystem was too large to be mounted. Had to re-image the VM as
> | a 64-bit CentOS, and then re-attached the RDM's to the LUNs
> | holding the PV's for the LV, and it mounted instantly, and we
> | kept on trucking.
> 
> This just backs up what I told you originally - that XFS has always
> refused to mount >16TB filesystems on 32 bit systems.

That isn't how I read that at all. It was a 17TB LV, working great (i.e.
mounted) until it was filled with 16TB; then it quit working and could not
subsequently be mounted until put on a 64-bit kernel.

I don't see how it's "working great" if it's not mountable.



> 
>>> I said that it was limited on XFS, not that the limit was a
>>> result of a user making a filesystem too large and then finding
>>> out it didn't work. Indeed, you can't do that on XFS - mkfs will
>>> refuse to run on a block device it can't access the last block
>>> on, and the kernel has the same "can I access the last block of
>>> the filesystem" sanity checks that are run at mount and growfs
>>> time.
>> 
>> Nope. What I reported on the XFS list, I had used mkfs.xfs while
>> running 32bit kernel on a 20TB virtual disk. It did not fail to
>> make the file system, it failed only to mount it.
> 
> You said no such thing. All you said was you couldn't mount a
> filesystem > 16TB - you made no mention of how you made the fs, what
> the block device was or any other details.

All correct. It wasn't intended as a bug report; it seemed normal. What I
reported was the mount failure.

A VBox 25TB VDI as a single block device, as well as 5x 5TB VDIs in a 20TB
linear LV, as well as a 100TB virtual-size LV using LVM thinp - all can be
formatted with default mkfs.xfs with no complaints.

3.13.4-200.fc20.i686+PAE
xfsprogs-3.1.11-2.fc20.i686


> 
>> It was the same
>> booted virtual machine, I created the file system and immediately
>> mounted it. If you want the specifics, I'll post on the XFS list
>> with versions and reproduce steps.
> 
> Did you check to see whether the block device silently wrapped at
> 16TB? There's a real good chance it did - but you might have got
> lucky because mkfs.xfs uses direct IO and *maybe* that works
> correctly on block devices on 32 bit systems. I wouldn't bet on it,
> though, given it's something we don't support and therefore never
> test….

I did not check to see if any of the block devices silently wrapped, I don't 
know how to do that although I have a strace of the mkfs on the 100TB virtual 
LV here:

https://dl.dropboxusercontent.com/u/3253801/mkfsxfs32bit100TBvLV.txt
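
For what it's worth, one crude (and destructive -- scratch devices only)
way to probe for such a wrap would be to write a marker block at the
16TiB offset through the normal buffered path and see whether it shows
up at offset 0.  The device name and the 2^32 * 4KiB offset below are
assumptions for the sketch, not something taken from the XFS thread:

  # build a 4KiB marker block
  dd if=/dev/urandom of=/tmp/marker bs=4096 count=1
  # write it at byte offset 2^32 * 4096 = 16TiB, via buffered I/O
  dd if=/tmp/marker of=/dev/sdX bs=4096 seek=$((1 << 32)) conv=notrunc,fsync
  # if the page cache index wrapped, the marker now sits at offset 0
  dd if=/dev/sdX bs=4096 count=1 2>/dev/null | cmp -s - /tmp/marker && echo wrapped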


Chris Murphy


Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels

2014-02-27 Thread Dave Chinner
On Thu, Feb 27, 2014 at 04:07:06PM -0500, Josef Bacik wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 02/27/2014 04:05 PM, Chris Murphy wrote:
> > User reports successfully formatting and using an ~18TB Btrfs
> > volume on hardware raid5 using i686 kernel for over a year, and
> > then suddenly the file system starts behaving weirdly:
> > 
> > http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg31856.html
> >
> > 
> > 
> > I think this is due to the kernel page cache address space being
> > 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in
> > this thread:
> > 
> > http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
> >
> >  So it sounds like it shouldn't be possible to mount a Btrfs volume
> > larger than 16TB on 32-bit kernels. This is consistent with ext4
> > and XFS which refuse to mount large file systems.
> > 
> > 
> 
> Well that's not good, I'll fix this up.  Thanks,

Well, don't go assuming there's a problem just because I made an
off-hand comment.  I.e. my comment was simply "maybe it hasn't been
tested", not an assertion that there is a bug or a problem.
Cheers,

Dave.
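
For readers wondering where the 16TB figure in this thread comes from:
on a 32-bit kernel the page cache indexes pages with a 32-bit value, so
with 4KiB pages the largest offset it can address works out to 16TiB.
A quick check of the arithmetic (shell, 64-bit integer math assumed):

  # 2^32 page indexes * 4096 bytes per page, expressed in TiB
  echo $(( (1 << 32) * 4096 / (1 << 40) ))   # prints 16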

-- 
Dave Chinner
da...@fromorbit.com


Re: Help with space

2014-02-27 Thread Roman Mamedov
On Thu, 27 Feb 2014 12:19:05 -0600
Justin Brown  wrote:

> I've an 18 TB hardware RAID 5 (Areca ARC-1170 w/ eight 3 TB drives) in

Do you sleep well at night knowing that if one disk fails, you end up with
basically a RAID0 of 7x3TB disks? And that if a 2nd one encounters an
unreadable sector during the rebuild, you lose your data? RAID5 actually
stopped working 5 years ago; apparently you didn't get the memo. :)
http://hardware.slashdot.org/story/08/10/21/2126252/why-raid-5-stops-working-in-2009

> need of help.  Disk usage (du) shows 13 tera allocated yet strangely
> enough df shows approx. 780 gigs are free.  It seems, somehow, btrfs
> has eaten roughly 4 tera internally.  I've run a scrub and a balance
> usage=5 with no success, in fact I lost about 20 gigs after the

Did you run balance with "-dusage=5" or "-musage=5"? Or both?
What is the output of the balance command?

> terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
> Data, single: total=17.58TiB, used=17.57TiB
> System, DUP: total=8.00MiB, used=1.93MiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=392.00GiB, used=33.50GiB
   ^

If you used "-musage=5", I think this metadata reserve would have been
shrunk, and you'd gain a lot more free space.

But then as others mentioned it may be risky to use this FS on 32-bit at all,
so I'd suggest trying anything else only after you reboot into a 64-bit kernel.
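
As a concrete sketch of the filtered balance being suggested here (mount
point taken from earlier in the thread; -musage/-dusage take a percentage
and only relocate chunks at or below that usage):

  # compact nearly-empty metadata chunks only
  btrfs balance start -musage=5 /var/lib/nobody/fs/ubfterra
  # compact nearly-empty data chunks only
  btrfs balance start -dusage=5 /var/lib/nobody/fs/ubfterra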

-- 
With respect,
Roman




Re: Help with space

2014-02-27 Thread Dave Chinner
On Thu, Feb 27, 2014 at 05:27:48PM -0700, Chris Murphy wrote:
> 
> On Feb 27, 2014, at 5:12 PM, Dave Chinner 
> wrote:
> 
> > On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
> >> 
> >> On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
> >> 
> >>> Yes, it's an ancient 32-bit machine.  There must be a complex
> >>> bug involved, as the system, when originally mounted, claimed
> >>> the correct free space, and only as it was used over time did the
> >>> discrepancy between used and free grow.  I'm afraid I chose
> >>> btrfs because it appeared capable of breaking the 16 tera
> >>> limit on a 32 bit system.  If this isn't the case then it's
> >>> incredible that I've been using this file system for about a
> >>> year without difficulty until now.
> >> 
> >> Yep, it's not a good bug. This happened some years ago on XFS
> >> too, where people would use the file system for a long time and
> >> then at 16TB+1byte written to the volume, kablewy! And then it
> >> wasn't usable at all, until put on a 64-bit kernel.
> >> 
> >> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
> > 
> > Well, no, that's not what I said.
> 
> What are you thinking I said you said? I wasn't quoting or
> paraphrasing anything you've said above. I had done a Google
> search on this earlier and found some rather old threads where some
> people had this experience of making a large file system on a
> 32-bit kernel, and only after filling it beyond 16TB did they run
> into the problem. Here is one of them:
> 
> http://lists.centos.org/pipermail/centos/2011-April/109142.html



No, he didn't fill it with 16TB of data and then have it fail. He
made a new filesystem *larger* than 16TB and tried to mount it:

| On a CentOS 32-bit backup server with a 17TB LVM logical volume on
| EMC storage.  Worked great, until it rolled 16TB.  Then it quit
| working.  Altogether.  /var/log/messages told me that the
| filesystem was too large to be mounted. Had to re-image the VM as
| a 64-bit CentOS, and then re-attached the RDM's to the LUNs
| holding the PV's for the LV, and it mounted instantly, and we
| kept on trucking.

This just backs up what I told you originally - that XFS has always
refused to mount >16TB filesystems on 32 bit systems.

> > I said that it was limited on XFS, not that the limit was a
> > result of a user making a filesystem too large and then finding
> > out it didn't work. Indeed, you can't do that on XFS - mkfs will
> > refuse to run on a block device it can't access the last block
> > on, and the kernel has the same "can I access the last block of
> > the filesystem" sanity checks that are run at mount and growfs
> > time.
> 
> Nope. What I reported on the XFS list, I had used mkfs.xfs while
> running 32bit kernel on a 20TB virtual disk. It did not fail to
> make the file system, it failed only to mount it.

You said no such thing. All you said was you couldn't mount a
filesystem > 16TB - you made no mention of how you made the fs, what
the block device was or any other details.

> It was the same
> booted virtual machine, I created the file system and immediately
> mounted it. If you want the specifics, I'll post on the XFS list
> with versions and reproduce steps.

Did you check to see whether the block device silently wrapped at
16TB? There's a real good chance it did - but you might have got
lucky because mkfs.xfs uses direct IO and *maybe* that works
correctly on block devices on 32 bit systems. I wouldn't bet on it,
though, given it's something we don't support and therefore never
test

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v5 11/18] btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->cache_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/disk-io.c | 10 +-
 fs/btrfs/extent-tree.c |  6 +++---
 fs/btrfs/super.c   |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a7b0bdd..06a64fb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1221,7 +1221,7 @@ struct btrfs_caching_control {
struct list_head list;
struct mutex mutex;
wait_queue_head_t wait;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
struct btrfs_block_group_cache *block_group;
u64 progress;
atomic_t count;
@@ -1516,7 +1516,7 @@ struct btrfs_fs_info {
struct btrfs_workqueue_struct *endio_write_workers;
struct btrfs_workqueue_struct *endio_freespace_worker;
struct btrfs_workqueue_struct *submit_workers;
-   struct btrfs_workers caching_workers;
+   struct btrfs_workqueue_struct *caching_workers;
struct btrfs_workers readahead_workers;
 
/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 12586b1..391cadf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2003,7 +2003,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
btrfs_destroy_workqueue(fs_info->submit_workers);
btrfs_stop_workers(&fs_info->delayed_workers);
-   btrfs_stop_workers(&fs_info->caching_workers);
+   btrfs_destroy_workqueue(fs_info->caching_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
btrfs_destroy_workqueue(fs_info->flush_workers);
btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
@@ -2481,8 +2481,8 @@ int open_ctree(struct super_block *sb,
fs_info->flush_workers =
btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
 
-   btrfs_init_workers(&fs_info->caching_workers, "cache",
-  fs_info->thread_pool_size, NULL);
+   fs_info->caching_workers =
+   btrfs_alloc_workqueue("cache", flags, max_active, 0);
 
/*
 * a higher idle thresh on the submit workers makes it much more
@@ -2533,7 +2533,6 @@ int open_ctree(struct super_block *sb,
ret = btrfs_start_workers(&fs_info->generic_worker);
ret |= btrfs_start_workers(&fs_info->fixup_workers);
ret |= btrfs_start_workers(&fs_info->delayed_workers);
-   ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
@@ -2545,7 +2544,8 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_workers && fs_info->endio_meta_workers &&
  fs_info->endio_meta_write_workers &&
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
- fs_info->endio_freespace_worker && fs_info->rmw_workers)) {
+ fs_info->endio_freespace_worker && fs_info->rmw_workers &&
+ fs_info->caching_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 32312e0..bb58082 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -378,7 +378,7 @@ static u64 add_new_free_space(struct 
btrfs_block_group_cache *block_group,
return total_added;
 }
 
-static noinline void caching_thread(struct btrfs_work *work)
+static noinline void caching_thread(struct btrfs_work_struct *work)
 {
struct btrfs_block_group_cache *block_group;
struct btrfs_fs_info *fs_info;
@@ -549,7 +549,7 @@ static int cache_block_group(struct btrfs_block_group_cache 
*cache,
caching_ctl->block_group = cache;
caching_ctl->progress = cache->key.objectid;
atomic_set(&caching_ctl->count, 1);
-   caching_ctl->work.func = caching_thread;
+   btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
 
spin_lock(&cache->lock);
/*
@@ -640,7 +640,7 @@ static int cache_block_group(struct btrfs_block_group_cache 
*cache,
 
btrfs_get_block_group(cache);
 
-   btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work);
+   btrfs_queue_work(fs_info->caching_workers, &caching_ctl->work);
 
return ret;
 }
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 919eb36..cd52e20 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1320,7 +1320,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info 
*fs_info,
btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
btrfs_work

[PATCH v5 08/18] btrfs: Replace fs_info->flush_workers with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace fs_info->flush_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h|  4 ++--
 fs/btrfs/disk-io.c  | 10 --
 fs/btrfs/inode.c|  8 
 fs/btrfs/ordered-data.c | 13 +++--
 fs/btrfs/ordered-data.h |  2 +-
 5 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9af6804..f1377c9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1507,7 +1507,7 @@ struct btrfs_fs_info {
struct btrfs_workers generic_worker;
struct btrfs_workqueue_struct *workers;
struct btrfs_workqueue_struct *delalloc_workers;
-   struct btrfs_workers flush_workers;
+   struct btrfs_workqueue_struct *flush_workers;
struct btrfs_workers endio_workers;
struct btrfs_workers endio_meta_workers;
struct btrfs_workers endio_raid56_workers;
@@ -3677,7 +3677,7 @@ struct btrfs_delalloc_work {
int delay_iput;
struct completion completion;
struct list_head list;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 };
 
 struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8b118ed..772fa39 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2006,7 +2006,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_stop_workers(&fs_info->delayed_workers);
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
-   btrfs_stop_workers(&fs_info->flush_workers);
+   btrfs_destroy_workqueue(fs_info->flush_workers);
btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }
 
@@ -2479,9 +2479,8 @@ int open_ctree(struct super_block *sb,
fs_info->delalloc_workers =
btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
 
-   btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
-  fs_info->thread_pool_size, NULL);
-
+   fs_info->flush_workers =
+   btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
 
btrfs_init_workers(&fs_info->caching_workers, "cache",
   fs_info->thread_pool_size, NULL);
@@ -2556,14 +2555,13 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(&fs_info->delayed_workers);
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
-   ret |= btrfs_start_workers(&fs_info->flush_workers);
ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
}
if (!(fs_info->workers && fs_info->delalloc_workers &&
- fs_info->submit_workers)) {
+ fs_info->submit_workers && fs_info->flush_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 01cfe99..7627b60 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8372,7 +8372,7 @@ out_notrans:
return ret;
 }
 
-static void btrfs_run_delalloc_work(struct btrfs_work *work)
+static void btrfs_run_delalloc_work(struct btrfs_work_struct *work)
 {
struct btrfs_delalloc_work *delalloc_work;
struct inode *inode;
@@ -8410,7 +8410,7 @@ struct btrfs_delalloc_work 
*btrfs_alloc_delalloc_work(struct inode *inode,
work->inode = inode;
work->wait = wait;
work->delay_iput = delay_iput;
-   work->work.func = btrfs_run_delalloc_work;
+   btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);
 
return work;
 }
@@ -8462,8 +8462,8 @@ static int __start_delalloc_inodes(struct btrfs_root 
*root, int delay_iput)
goto out;
}
list_add_tail(&work->list, &works);
-   btrfs_queue_worker(&root->fs_info->flush_workers,
-  &work->work);
+   btrfs_queue_work(root->fs_info->flush_workers,
+&work->work);
 
cond_resched();
spin_lock(&root->delalloc_lock);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 138a7d7..6fa8219 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -576,7 +576,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
wake_up(&entry->wait);
 }
 
-static void btrfs_run_ordered_extent_work(struct btrfs_work *work)
+static void btrfs_run_ordered_extent_work(struct btrfs_work_struct *work)
 {
struct btrfs_ordered_extent *ordered;
 
@@ -609,10 +609,11 @@ int btrfs_wait_o

[PATCH v5 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Use the newly created btrfs_workqueue_struct to replace the original
fs_info->workers.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 41 +
 fs/btrfs/super.c   |  2 +-
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dac6653..448df5e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1505,7 +1505,7 @@ struct btrfs_fs_info {
 * two
 */
struct btrfs_workers generic_worker;
-   struct btrfs_workers workers;
+   struct btrfs_workqueue_struct *workers;
struct btrfs_workers delalloc_workers;
struct btrfs_workers flush_workers;
struct btrfs_workers endio_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cc1b423..4040a43 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -108,7 +108,7 @@ struct async_submit_bio {
 * can't tell us where in the file the bio should go
 */
u64 bio_offset;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
int error;
 };
 
@@ -738,12 +738,12 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, 
struct bio *bio,
 unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
 {
unsigned long limit = min_t(unsigned long,
-   info->workers.max_workers,
+   info->thread_pool_size,
info->fs_devices->open_devices);
return 256 * limit;
 }
 
-static void run_one_async_start(struct btrfs_work *work)
+static void run_one_async_start(struct btrfs_work_struct *work)
 {
struct async_submit_bio *async;
int ret;
@@ -756,7 +756,7 @@ static void run_one_async_start(struct btrfs_work *work)
async->error = ret;
 }
 
-static void run_one_async_done(struct btrfs_work *work)
+static void run_one_async_done(struct btrfs_work_struct *work)
 {
struct btrfs_fs_info *fs_info;
struct async_submit_bio *async;
@@ -783,7 +783,7 @@ static void run_one_async_done(struct btrfs_work *work)
   async->bio_offset);
 }
 
-static void run_one_async_free(struct btrfs_work *work)
+static void run_one_async_free(struct btrfs_work_struct *work)
 {
struct async_submit_bio *async;
 
@@ -811,11 +811,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, 
struct inode *inode,
async->submit_bio_start = submit_bio_start;
async->submit_bio_done = submit_bio_done;
 
-   async->work.func = run_one_async_start;
-   async->work.ordered_func = run_one_async_done;
-   async->work.ordered_free = run_one_async_free;
+   btrfs_init_work(&async->work, run_one_async_start,
+   run_one_async_done, run_one_async_free);
 
-   async->work.flags = 0;
async->bio_flags = bio_flags;
async->bio_offset = bio_offset;
 
@@ -824,9 +822,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, 
struct inode *inode,
atomic_inc(&fs_info->nr_async_submits);
 
if (rw & REQ_SYNC)
-   btrfs_set_work_high_prio(&async->work);
+   btrfs_set_work_high_priority(&async->work);
 
-   btrfs_queue_worker(&fs_info->workers, &async->work);
+   btrfs_queue_work(fs_info->workers, &async->work);
 
while (atomic_read(&fs_info->async_submit_draining) &&
  atomic_read(&fs_info->nr_async_submits)) {
@@ -1996,7 +1994,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_stop_workers(&fs_info->generic_worker);
btrfs_stop_workers(&fs_info->fixup_workers);
btrfs_stop_workers(&fs_info->delalloc_workers);
-   btrfs_stop_workers(&fs_info->workers);
+   btrfs_destroy_workqueue(fs_info->workers);
btrfs_stop_workers(&fs_info->endio_workers);
btrfs_stop_workers(&fs_info->endio_meta_workers);
btrfs_stop_workers(&fs_info->endio_raid56_workers);
@@ -2100,6 +2098,8 @@ int open_ctree(struct super_block *sb,
int err = -EINVAL;
int num_backups_tried = 0;
int backup_index = 0;
+   int max_active;
+   int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
bool create_uuid_tree;
bool check_uuid_tree;
 
@@ -2468,12 +2468,13 @@ int open_ctree(struct super_block *sb,
goto fail_alloc;
}
 
+   max_active = fs_info->thread_pool_size;
btrfs_init_workers(&fs_info->generic_worker,
   "genwork", 1, NULL);
 
-   btrfs_init_workers(&fs_info->workers, "worker",
-  fs_info->thread_pool_size,
-  &fs_info->generic_worker);
+   fs_info->workers =
+   btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
+ 

[PATCH v5 09/18] btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->endio_* workqueues with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h|  12 +++---
 fs/btrfs/disk-io.c  | 104 +---
 fs/btrfs/inode.c|  20 +-
 fs/btrfs/ordered-data.h |   2 +-
 fs/btrfs/super.c|  11 ++---
 5 files changed, 68 insertions(+), 81 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f1377c9..3db87da 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1508,13 +1508,13 @@ struct btrfs_fs_info {
struct btrfs_workqueue_struct *workers;
struct btrfs_workqueue_struct *delalloc_workers;
struct btrfs_workqueue_struct *flush_workers;
-   struct btrfs_workers endio_workers;
-   struct btrfs_workers endio_meta_workers;
-   struct btrfs_workers endio_raid56_workers;
+   struct btrfs_workqueue_struct *endio_workers;
+   struct btrfs_workqueue_struct *endio_meta_workers;
+   struct btrfs_workqueue_struct *endio_raid56_workers;
struct btrfs_workers rmw_workers;
-   struct btrfs_workers endio_meta_write_workers;
-   struct btrfs_workers endio_write_workers;
-   struct btrfs_workers endio_freespace_worker;
+   struct btrfs_workqueue_struct *endio_meta_write_workers;
+   struct btrfs_workqueue_struct *endio_write_workers;
+   struct btrfs_workqueue_struct *endio_freespace_worker;
struct btrfs_workqueue_struct *submit_workers;
struct btrfs_workers caching_workers;
struct btrfs_workers readahead_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 772fa39..28b303c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -55,7 +55,7 @@
 #endif
 
 static struct extent_io_ops btree_extent_io_ops;
-static void end_workqueue_fn(struct btrfs_work *work);
+static void end_workqueue_fn(struct btrfs_work_struct *work);
 static void free_fs_root(struct btrfs_root *root);
 static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
int read_only);
@@ -86,7 +86,7 @@ struct end_io_wq {
int error;
int metadata;
struct list_head list;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 };
 
 /*
@@ -678,32 +678,31 @@ static void end_workqueue_bio(struct bio *bio, int err)
 
fs_info = end_io_wq->info;
end_io_wq->error = err;
-   end_io_wq->work.func = end_workqueue_fn;
-   end_io_wq->work.flags = 0;
+   btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
 
if (bio->bi_rw & REQ_WRITE) {
if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
-   btrfs_queue_worker(&fs_info->endio_meta_write_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_meta_write_workers,
+&end_io_wq->work);
else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
-   btrfs_queue_worker(&fs_info->endio_freespace_worker,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_freespace_worker,
+&end_io_wq->work);
else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-   btrfs_queue_worker(&fs_info->endio_raid56_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_raid56_workers,
+&end_io_wq->work);
else
-   btrfs_queue_worker(&fs_info->endio_write_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_write_workers,
+&end_io_wq->work);
} else {
if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-   btrfs_queue_worker(&fs_info->endio_raid56_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_raid56_workers,
+&end_io_wq->work);
else if (end_io_wq->metadata)
-   btrfs_queue_worker(&fs_info->endio_meta_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->endio_meta_workers,
+&end_io_wq->work);
else
-   btrfs_queue_worker(&fs_info->endio_workers,
-  &end_io_wq->work);
+   btrfs_queue_work(fs_info->

[PATCH v5 04/18] btrfs: Add threshold workqueue based on kernel workqueue

2014-02-27 Thread Qu Wenruo
The original btrfs_workers has thresholding functions to dynamically
create or destroy kthreads.

Though there is no such function in the kernel workqueue, because workers
are not created manually there, we can still use workqueue_set_max_active
to simulate the behavior, mainly to achieve better HDD performance by
setting a high threshold on submit_workers.
(Sadly, no resources can be saved.)

So in this patch, extra workqueue pending counters are introduced to
dynamically change the max_active of each btrfs_workqueue_struct, hoping
to restore the behavior of the original thresholding function.

Also, workqueue_set_max_active uses a mutex to protect the
workqueue_struct and is not meant to be called too frequently, so a new
interval mechanism is applied that only calls workqueue_set_max_active
after a certain number of works have been queued, hoping to balance both
random and sequential performance on HDD.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v2->v3:
  - Add thresholding mechanism to simulate the old thresholding mechanism.
  - Will not enable thresholding when thresh is set to small value.
v3->v4:
  None
v4->v5:
  None
---
 fs/btrfs/async-thread.c | 107 
 fs/btrfs/async-thread.h |   3 +-
 2 files changed, 101 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 193c849..977bce2 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -30,6 +30,9 @@
 #define WORK_ORDER_DONE_BIT 2
 #define WORK_HIGH_PRIO_BIT 3
 
+#define NO_THRESHOLD (-1)
+#define DFT_THRESHOLD (32)
+
 /*
  * container for the kthread task pointer and the list of pending work
  * One of these is allocated per thread.
@@ -737,6 +740,14 @@ struct __btrfs_workqueue_struct {
 
/* Spinlock for ordered_list */
spinlock_t list_lock;
+
+   /* Thresholding related variants */
+   atomic_t pending;
+   int max_active;
+   int current_max;
+   int thresh;
+   unsigned int count;
+   spinlock_t thres_lock;
 };
 
 struct btrfs_workqueue_struct {
@@ -745,19 +756,34 @@ struct btrfs_workqueue_struct {
 };
 
 static inline struct __btrfs_workqueue_struct
-*__btrfs_alloc_workqueue(char *name, int flags, int max_active)
+*__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh)
 {
struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 
if (unlikely(!ret))
return NULL;
 
+   ret->max_active = max_active;
+   atomic_set(&ret->pending, 0);
+   if (thresh == 0)
+   thresh = DFT_THRESHOLD;
+   /* For low threshold, disabling threshold is a better choice */
+   if (thresh < DFT_THRESHOLD) {
+   ret->current_max = max_active;
+   ret->thresh = NO_THRESHOLD;
+   } else {
+   ret->current_max = 1;
+   ret->thresh = thresh;
+   }
+
if (flags & WQ_HIGHPRI)
ret->normal_wq = alloc_workqueue("%s-%s-high", flags,
-max_active, "btrfs", name);
+ret->max_active,
+"btrfs", name);
else
ret->normal_wq = alloc_workqueue("%s-%s", flags,
-max_active, "btrfs", name);
+ret->max_active, "btrfs",
+name);
if (unlikely(!ret->normal_wq)) {
kfree(ret);
return NULL;
@@ -765,6 +791,7 @@ static inline struct __btrfs_workqueue_struct
 
INIT_LIST_HEAD(&ret->ordered_list);
spin_lock_init(&ret->list_lock);
+   spin_lock_init(&ret->thres_lock);
return ret;
 }
 
@@ -773,7 +800,8 @@ __btrfs_destroy_workqueue(struct __btrfs_workqueue_struct 
*wq);
 
 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 int flags,
-int max_active)
+int max_active,
+int thresh)
 {
struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 
@@ -781,14 +809,15 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char 
*name,
return NULL;
 
ret->normal = __btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI,
- max_active);
+ max_active, thresh);
if (unlikely(!ret->normal)) {
kfree(ret);
return NULL;
}
 
if (flags & WQ_HIGHPRI) {
-   ret->high = __btrfs_alloc_workqueue(name, flags, max_active);
+   ret->high = __btrfs_alloc_workqueue(name, flags, max_active,
+ 

[PATCH v5 13/18] btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->fixup_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 10 +-
 fs/btrfs/inode.c   |  8 
 fs/btrfs/super.c   |  1 -
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3d6f490..95a1e66 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1524,7 +1524,7 @@ struct btrfs_fs_info {
 * the cow mechanism and make them safe to write.  It happens
 * for the sys_munmap function call path
 */
-   struct btrfs_workers fixup_workers;
+   struct btrfs_workqueue_struct *fixup_workers;
struct btrfs_workers delayed_workers;
struct task_struct *transaction_kthread;
struct task_struct *cleaner_kthread;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ca6d0cf..4da34df 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1991,7 +1991,7 @@ static noinline int next_root_backup(struct btrfs_fs_info 
*info,
 static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
btrfs_stop_workers(&fs_info->generic_worker);
-   btrfs_stop_workers(&fs_info->fixup_workers);
+   btrfs_destroy_workqueue(fs_info->fixup_workers);
btrfs_destroy_workqueue(fs_info->delalloc_workers);
btrfs_destroy_workqueue(fs_info->workers);
btrfs_destroy_workqueue(fs_info->endio_workers);
@@ -2494,8 +2494,8 @@ int open_ctree(struct super_block *sb,
  min_t(u64, fs_devices->num_devices,
max_active), 64);
 
-   btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
-  &fs_info->generic_worker);
+   fs_info->fixup_workers =
+   btrfs_alloc_workqueue("fixup", flags, 1, 0);
 
/*
 * endios are largely parallel and should have a very
@@ -2528,7 +2528,6 @@ int open_ctree(struct super_block *sb,
 * return -ENOMEM if any of these fail.
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
-   ret |= btrfs_start_workers(&fs_info->fixup_workers);
ret |= btrfs_start_workers(&fs_info->delayed_workers);
ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
@@ -2541,7 +2540,8 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_meta_write_workers &&
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
  fs_info->endio_freespace_worker && fs_info->rmw_workers &&
- fs_info->caching_workers && fs_info->readahead_workers)) {
+ fs_info->caching_workers && fs_info->readahead_workers &&
+ fs_info->fixup_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4023c90..81395d6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1748,10 +1748,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 
start, u64 end,
 /* see btrfs_writepage_start_hook for details on why this is required */
 struct btrfs_writepage_fixup {
struct page *page;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 };
 
-static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
+static void btrfs_writepage_fixup_worker(struct btrfs_work_struct *work)
 {
struct btrfs_writepage_fixup *fixup;
struct btrfs_ordered_extent *ordered;
@@ -1842,9 +1842,9 @@ static int btrfs_writepage_start_hook(struct page *page, 
u64 start, u64 end)
 
SetPageChecked(page);
page_cache_get(page);
-   fixup->work.func = btrfs_writepage_fixup_worker;
+   btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
fixup->page = page;
-   btrfs_queue_worker(&root->fs_info->fixup_workers, &fixup->work);
+   btrfs_queue_work(root->fs_info->fixup_workers, &fixup->work);
return -EBUSY;
 }
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 56c5533..3614053 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1321,7 +1321,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info 
*fs_info,
btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
-   btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_meta_write_workers,
-- 
1.9.0


[PATCH v5 06/18] btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue

2014-02-27 Thread Qu Wenruo
Much like fs_info->workers, replace fs_info->delalloc_workers with
the same btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12 
 fs/btrfs/inode.c   | 18 --
 fs/btrfs/super.c   |  2 +-
 4 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 448df5e..4e11f4b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1506,7 +1506,7 @@ struct btrfs_fs_info {
 */
struct btrfs_workers generic_worker;
struct btrfs_workqueue_struct *workers;
-   struct btrfs_workers delalloc_workers;
+   struct btrfs_workqueue_struct *delalloc_workers;
struct btrfs_workers flush_workers;
struct btrfs_workers endio_workers;
struct btrfs_workers endio_meta_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4040a43..f97bd17 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1993,7 +1993,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
 {
btrfs_stop_workers(&fs_info->generic_worker);
btrfs_stop_workers(&fs_info->fixup_workers);
-   btrfs_stop_workers(&fs_info->delalloc_workers);
+   btrfs_destroy_workqueue(fs_info->delalloc_workers);
btrfs_destroy_workqueue(fs_info->workers);
btrfs_stop_workers(&fs_info->endio_workers);
btrfs_stop_workers(&fs_info->endio_meta_workers);
@@ -2476,8 +2476,8 @@ int open_ctree(struct super_block *sb,
btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
  max_active, 16);
 
-   btrfs_init_workers(&fs_info->delalloc_workers, "delalloc",
-  fs_info->thread_pool_size, NULL);
+   fs_info->delalloc_workers =
+   btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
 
btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
   fs_info->thread_pool_size, NULL);
@@ -2495,9 +2495,6 @@ int open_ctree(struct super_block *sb,
 */
fs_info->submit_workers.idle_thresh = 64;
 
-   fs_info->delalloc_workers.idle_thresh = 2;
-   fs_info->delalloc_workers.ordered = 1;
-
btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
   &fs_info->generic_worker);
btrfs_init_workers(&fs_info->endio_workers, "endio",
@@ -2548,7 +2545,6 @@ int open_ctree(struct super_block *sb,
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
ret |= btrfs_start_workers(&fs_info->submit_workers);
-   ret |= btrfs_start_workers(&fs_info->delalloc_workers);
ret |= btrfs_start_workers(&fs_info->fixup_workers);
ret |= btrfs_start_workers(&fs_info->endio_workers);
ret |= btrfs_start_workers(&fs_info->endio_meta_workers);
@@ -2566,7 +2562,7 @@ int open_ctree(struct super_block *sb,
err = -ENOMEM;
goto fail_sb_buffer;
}
-   if (!(fs_info->workers)) {
+   if (!(fs_info->workers && fs_info->delalloc_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 197edee..01cfe99 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -324,7 +324,7 @@ struct async_cow {
u64 start;
u64 end;
struct list_head extents;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 };
 
 static noinline int add_async_extent(struct async_cow *cow,
@@ -1000,7 +1000,7 @@ out_unlock:
 /*
  * work queue call back to started compression on a file and pages
  */
-static noinline void async_cow_start(struct btrfs_work *work)
+static noinline void async_cow_start(struct btrfs_work_struct *work)
 {
struct async_cow *async_cow;
int num_added = 0;
@@ -1018,7 +1018,7 @@ static noinline void async_cow_start(struct btrfs_work 
*work)
 /*
  * work queue call back to submit previously compressed pages
  */
-static noinline void async_cow_submit(struct btrfs_work *work)
+static noinline void async_cow_submit(struct btrfs_work_struct *work)
 {
struct async_cow *async_cow;
struct btrfs_root *root;
@@ -1039,7 +1039,7 @@ static noinline void async_cow_submit(struct btrfs_work 
*work)
submit_compressed_extents(async_cow->inode, async_cow);
 }
 
-static noinline void async_cow_free(struct btrfs_work *work)
+static noinline void async_cow_free(struct btrfs_work_struct *work)
 {
struct async_cow *async_cow;
async_cow = container_of(work, struct async_cow, work);
@@ -1076,17 +1076,15 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
async_cow->end = cur_end;
INIT_LIST_HEAD(&async_cow->extents);
 
-   async_cow->work.func 

[PATCH v5 07/18] btrfs: Replace fs_info->submit_workers with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Much like fs_info->workers, replace fs_info->submit_workers with
the same btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 17 +
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 11 ++-
 fs/btrfs/volumes.h |  2 +-
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4e11f4b..9af6804 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1515,7 +1515,7 @@ struct btrfs_fs_info {
struct btrfs_workers endio_meta_write_workers;
struct btrfs_workers endio_write_workers;
struct btrfs_workers endio_freespace_worker;
-   struct btrfs_workers submit_workers;
+   struct btrfs_workqueue_struct *submit_workers;
struct btrfs_workers caching_workers;
struct btrfs_workers readahead_workers;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f97bd17..8b118ed 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2002,7 +2002,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_stop_workers(&fs_info->endio_meta_write_workers);
btrfs_stop_workers(&fs_info->endio_write_workers);
btrfs_stop_workers(&fs_info->endio_freespace_worker);
-   btrfs_stop_workers(&fs_info->submit_workers);
+   btrfs_destroy_workqueue(fs_info->submit_workers);
btrfs_stop_workers(&fs_info->delayed_workers);
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
@@ -2482,18 +2482,19 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
   fs_info->thread_pool_size, NULL);
 
-   btrfs_init_workers(&fs_info->submit_workers, "submit",
-  min_t(u64, fs_devices->num_devices,
-  fs_info->thread_pool_size), NULL);
 
btrfs_init_workers(&fs_info->caching_workers, "cache",
   fs_info->thread_pool_size, NULL);
 
-   /* a higher idle thresh on the submit workers makes it much more
+   /*
+* a higher idle thresh on the submit workers makes it much more
 * likely that bios will be send down in a sane order to the
 * devices
 */
-   fs_info->submit_workers.idle_thresh = 64;
+   fs_info->submit_workers =
+   btrfs_alloc_workqueue("submit", flags,
+ min_t(u64, fs_devices->num_devices,
+   max_active), 64);
 
btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
   &fs_info->generic_worker);
@@ -2544,7 +2545,6 @@ int open_ctree(struct super_block *sb,
 * return -ENOMEM if any of these fail.
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
-   ret |= btrfs_start_workers(&fs_info->submit_workers);
ret |= btrfs_start_workers(&fs_info->fixup_workers);
ret |= btrfs_start_workers(&fs_info->endio_workers);
ret |= btrfs_start_workers(&fs_info->endio_meta_workers);
@@ -2562,7 +2562,8 @@ int open_ctree(struct super_block *sb,
err = -ENOMEM;
goto fail_sb_buffer;
}
-   if (!(fs_info->workers && fs_info->delalloc_workers)) {
+   if (!(fs_info->workers && fs_info->delalloc_workers &&
+ fs_info->submit_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e164d13..2d69b6d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1319,7 +1319,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info 
*fs_info,
btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size);
btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
-   btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size);
+   btrfs_workqueue_set_max(fs_info->submit_workers, new_pool_size);
btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size);
btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size);
btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 82a63b1..0066cff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -415,7 +415,8 @@ loop_lock:
device->running_pending = 1;
 
spin_unlock(&device->io_lock);
-   btrfs_requeue_work(&device->work);
+   btrfs_queue_work(fs_info->submit_workers,
+&device->work);
goto done;
}
/* u

[PATCH v5 14/18] btrfs: Replace fs_info->delayed_workers workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->delayed_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/delayed-inode.c | 10 +-
 fs/btrfs/disk-io.c   | 10 --
 fs/btrfs/super.c |  2 +-
 4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 95a1e66..07b563d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1525,7 +1525,7 @@ struct btrfs_fs_info {
 * for the sys_munmap function call path
 */
struct btrfs_workqueue_struct *fixup_workers;
-   struct btrfs_workers delayed_workers;
+   struct btrfs_workqueue_struct *delayed_workers;
struct task_struct *transaction_kthread;
struct task_struct *cleaner_kthread;
int thread_pool_size;
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 451b00c..76e85d6 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1318,10 +1318,10 @@ void btrfs_remove_delayed_node(struct inode *inode)
 struct btrfs_async_delayed_work {
struct btrfs_delayed_root *delayed_root;
int nr;
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 };
 
-static void btrfs_async_run_delayed_root(struct btrfs_work *work)
+static void btrfs_async_run_delayed_root(struct btrfs_work_struct *work)
 {
struct btrfs_async_delayed_work *async_work;
struct btrfs_delayed_root *delayed_root;
@@ -1392,11 +1392,11 @@ static int btrfs_wq_run_delayed_node(struct 
btrfs_delayed_root *delayed_root,
return -ENOMEM;
 
async_work->delayed_root = delayed_root;
-   async_work->work.func = btrfs_async_run_delayed_root;
-   async_work->work.flags = 0;
+   btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root,
+   NULL, NULL);
async_work->nr = nr;
 
-   btrfs_queue_worker(&root->fs_info->delayed_workers, &async_work->work);
+   btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work);
return 0;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4da34df..ac8e9c2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2002,7 +2002,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_destroy_workqueue(fs_info->endio_write_workers);
btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
btrfs_destroy_workqueue(fs_info->submit_workers);
-   btrfs_stop_workers(&fs_info->delayed_workers);
+   btrfs_destroy_workqueue(fs_info->delayed_workers);
btrfs_destroy_workqueue(fs_info->caching_workers);
btrfs_destroy_workqueue(fs_info->readahead_workers);
btrfs_destroy_workqueue(fs_info->flush_workers);
@@ -2515,9 +2515,8 @@ int open_ctree(struct super_block *sb,
btrfs_alloc_workqueue("endio-write", flags, max_active, 2);
fs_info->endio_freespace_worker =
btrfs_alloc_workqueue("freespace-write", flags, max_active, 0);
-   btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta",
-  fs_info->thread_pool_size,
-  &fs_info->generic_worker);
+   fs_info->delayed_workers =
+   btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0);
fs_info->readahead_workers =
btrfs_alloc_workqueue("readahead", flags, max_active, 2);
btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
@@ -2528,7 +2527,6 @@ int open_ctree(struct super_block *sb,
 * return -ENOMEM if any of these fail.
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
-   ret |= btrfs_start_workers(&fs_info->delayed_workers);
ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
@@ -2541,7 +2539,7 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
  fs_info->endio_freespace_worker && fs_info->rmw_workers &&
  fs_info->caching_workers && fs_info->readahead_workers &&
- fs_info->fixup_workers)) {
+ fs_info->fixup_workers && fs_info->delayed_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3614053..5a355c4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1327,7 +1327,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info 
*fs_info,
new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size);
-   btrfs_se

[PATCH v5 18/18] btrfs: Cleanup the "_struct" suffix in btrfs_workequeue

2014-02-27 Thread Qu Wenruo
Since the "_struct" suffix is mainly used to distinguish the different
btrfs_work between the original and the newly created one,
there is no need to keep the suffix now that all btrfs_workers are changed
into btrfs_workqueue.

This patch also fixes some code whose style had been changed to accommodate
the overly long "_struct" suffix.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v3->v4:
  - Remove the "_struct" suffix.
v4->v5:
  None
---
 fs/btrfs/async-thread.c  | 66 
 fs/btrfs/async-thread.h  | 34 -
 fs/btrfs/ctree.h | 44 
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c   | 14 +-
 fs/btrfs/extent-tree.c   |  2 +-
 fs/btrfs/inode.c | 18 ++---
 fs/btrfs/ordered-data.c  |  2 +-
 fs/btrfs/ordered-data.h  |  4 +--
 fs/btrfs/qgroup.c|  2 +-
 fs/btrfs/raid56.c| 14 +-
 fs/btrfs/reada.c |  5 ++--
 fs/btrfs/scrub.c | 23 -
 fs/btrfs/volumes.c   |  2 +-
 fs/btrfs/volumes.h   |  2 +-
 15 files changed, 116 insertions(+), 120 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 2a5f383..a709585 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -32,7 +32,7 @@
 #define NO_THRESHOLD (-1)
 #define DFT_THRESHOLD (32)
 
-struct __btrfs_workqueue_struct {
+struct __btrfs_workqueue {
struct workqueue_struct *normal_wq;
/* List head pointing to ordered work list */
struct list_head ordered_list;
@@ -49,15 +49,15 @@ struct __btrfs_workqueue_struct {
spinlock_t thres_lock;
 };
 
-struct btrfs_workqueue_struct {
-   struct __btrfs_workqueue_struct *normal;
-   struct __btrfs_workqueue_struct *high;
+struct btrfs_workqueue {
+   struct __btrfs_workqueue *normal;
+   struct __btrfs_workqueue *high;
 };
 
-static inline struct __btrfs_workqueue_struct
+static inline struct __btrfs_workqueue
 *__btrfs_alloc_workqueue(char *name, int flags, int max_active, int thresh)
 {
-   struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+   struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 
if (unlikely(!ret))
return NULL;
@@ -95,14 +95,14 @@ static inline struct __btrfs_workqueue_struct
 }
 
 static inline void
-__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq);
+__btrfs_destroy_workqueue(struct __btrfs_workqueue *wq);
 
-struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
-int flags,
-int max_active,
-int thresh)
+struct btrfs_workqueue *btrfs_alloc_workqueue(char *name,
+ int flags,
+ int max_active,
+ int thresh)
 {
-   struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+   struct btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
 
if (unlikely(!ret))
return NULL;
@@ -131,7 +131,7 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char 
*name,
  * This hook WILL be called in IRQ handler context,
  * so workqueue_set_max_active MUST NOT be called in this hook
  */
-static inline void thresh_queue_hook(struct __btrfs_workqueue_struct *wq)
+static inline void thresh_queue_hook(struct __btrfs_workqueue *wq)
 {
if (wq->thresh == NO_THRESHOLD)
return;
@@ -143,7 +143,7 @@ static inline void thresh_queue_hook(struct 
__btrfs_workqueue_struct *wq)
  * This hook is called in kthread content.
  * So workqueue_set_max_active is called here.
  */
-static inline void thresh_exec_hook(struct __btrfs_workqueue_struct *wq)
+static inline void thresh_exec_hook(struct __btrfs_workqueue *wq)
 {
int new_max_active;
long pending;
@@ -186,10 +186,10 @@ out:
}
 }
 
-static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
+static void run_ordered_work(struct __btrfs_workqueue *wq)
 {
struct list_head *list = &wq->ordered_list;
-   struct btrfs_work_struct *work;
+   struct btrfs_work *work;
spinlock_t *lock = &wq->list_lock;
unsigned long flags;
 
@@ -197,7 +197,7 @@ static void run_ordered_work(struct 
__btrfs_workqueue_struct *wq)
spin_lock_irqsave(lock, flags);
if (list_empty(list))
break;
-   work = list_entry(list->next, struct btrfs_work_struct,
+   work = list_entry(list->next, struct btrfs_work,
  ordered_list);
if (!test_bit(WORK_DONE_BIT, &work->flags))
break;
@@ -229,11 +229,11 @@ static void run_ordered_work(struct 
__btrfs_workqueue_struct *wq)
 
 static void normal_work

[PATCH v5 17/18] btrfs: Cleanup the old btrfs_worker.

2014-02-27 Thread Qu Wenruo
Since all the btrfs_workers are replaced with the newly created
btrfs_workqueue, the old code can simply be removed.

Signed-off-by: Quwenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Reuse the old async-thread.[ch] files.
v3->v4:
  - Reuse the old WORK_* bits.
v4->v5:
  None
---
 fs/btrfs/async-thread.c | 707 +---
 fs/btrfs/async-thread.h | 100 ---
 fs/btrfs/ctree.h|   1 -
 fs/btrfs/disk-io.c  |  12 -
 fs/btrfs/super.c|   8 -
 5 files changed, 3 insertions(+), 825 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 977bce2..2a5f383 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -25,714 +25,13 @@
 #include 
 #include "async-thread.h"
 
-#define WORK_QUEUED_BIT 0
-#define WORK_DONE_BIT 1
-#define WORK_ORDER_DONE_BIT 2
-#define WORK_HIGH_PRIO_BIT 3
+#define WORK_DONE_BIT 0
+#define WORK_ORDER_DONE_BIT 1
+#define WORK_HIGH_PRIO_BIT 2
 
 #define NO_THRESHOLD (-1)
 #define DFT_THRESHOLD (32)
 
-/*
- * container for the kthread task pointer and the list of pending work
- * One of these is allocated per thread.
- */
-struct btrfs_worker_thread {
-   /* pool we belong to */
-   struct btrfs_workers *workers;
-
-   /* list of struct btrfs_work that are waiting for service */
-   struct list_head pending;
-   struct list_head prio_pending;
-
-   /* list of worker threads from struct btrfs_workers */
-   struct list_head worker_list;
-
-   /* kthread */
-   struct task_struct *task;
-
-   /* number of things on the pending list */
-   atomic_t num_pending;
-
-   /* reference counter for this struct */
-   atomic_t refs;
-
-   unsigned long sequence;
-
-   /* protects the pending list. */
-   spinlock_t lock;
-
-   /* set to non-zero when this thread is already awake and kicking */
-   int working;
-
-   /* are we currently idle */
-   int idle;
-};
-
-static int __btrfs_start_workers(struct btrfs_workers *workers);
-
-/*
- * btrfs_start_workers uses kthread_run, which can block waiting for memory
- * for a very long time.  It will actually throttle on page writeback,
- * and so it may not make progress until after our btrfs worker threads
- * process all of the pending work structs in their queue
- *
- * This means we can't use btrfs_start_workers from inside a btrfs worker
- * thread that is used as part of cleaning dirty memory, which pretty much
- * involves all of the worker threads.
- *
- * Instead we have a helper queue who never has more than one thread
- * where we scheduler thread start operations.  This worker_start struct
- * is used to contain the work and hold a pointer to the queue that needs
- * another worker.
- */
-struct worker_start {
-   struct btrfs_work work;
-   struct btrfs_workers *queue;
-};
-
-static void start_new_worker_func(struct btrfs_work *work)
-{
-   struct worker_start *start;
-   start = container_of(work, struct worker_start, work);
-   __btrfs_start_workers(start->queue);
-   kfree(start);
-}
-
-/*
- * helper function to move a thread onto the idle list after it
- * has finished some requests.
- */
-static void check_idle_worker(struct btrfs_worker_thread *worker)
-{
-   if (!worker->idle && atomic_read(&worker->num_pending) <
-   worker->workers->idle_thresh / 2) {
-   unsigned long flags;
-   spin_lock_irqsave(&worker->workers->lock, flags);
-   worker->idle = 1;
-
-   /* the list may be empty if the worker is just starting */
-   if (!list_empty(&worker->worker_list) &&
-   !worker->workers->stopping) {
-   list_move(&worker->worker_list,
-&worker->workers->idle_list);
-   }
-   spin_unlock_irqrestore(&worker->workers->lock, flags);
-   }
-}
-
-/*
- * helper function to move a thread off the idle list after new
- * pending work is added.
- */
-static void check_busy_worker(struct btrfs_worker_thread *worker)
-{
-   if (worker->idle && atomic_read(&worker->num_pending) >=
-   worker->workers->idle_thresh) {
-   unsigned long flags;
-   spin_lock_irqsave(&worker->workers->lock, flags);
-   worker->idle = 0;
-
-   if (!list_empty(&worker->worker_list) &&
-   !worker->workers->stopping) {
-   list_move_tail(&worker->worker_list,
- &worker->workers->worker_list);
-   }
-   spin_unlock_irqrestore(&worker->workers->lock, flags);
-   }
-}
-
-static void check_pending_worker_creates(struct btrfs_worker_thread *worker)
-{
-   struct btrfs_workers *workers = worker->workers;
-   struct worker_start *start;
-   unsigned long flags;
-
-   rmb();
-   if (!workers->atomic_start_pending)
-   re

[PATCH v5 01/18] btrfs: Cleanup the unused struct async_sched.

2014-02-27 Thread Qu Wenruo
The struct async_sched is not used by any code and can be removed.

Signed-off-by: Qu Wenruo 
Reviewed-by: Josef Bacik 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None.
v2->v3:
  None.
v3->v4:
  None.
v4->v5:
  None
---
 fs/btrfs/volumes.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 07629e9..82a63b1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5323,13 +5323,6 @@ static void btrfs_end_bio(struct bio *bio, int err)
}
 }
 
-struct async_sched {
-   struct bio *bio;
-   int rw;
-   struct btrfs_fs_info *info;
-   struct btrfs_work work;
-};
-
 /*
  * see run_scheduled_bios for a description of why bios are collected for
  * async submit.
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 12/18] btrfs: Replace fs_info->readahead_workers workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->readahead_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12 
 fs/btrfs/reada.c   |  9 +
 fs/btrfs/super.c   |  2 +-
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 06a64fb..3d6f490 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1517,7 +1517,7 @@ struct btrfs_fs_info {
struct btrfs_workqueue_struct *endio_freespace_worker;
struct btrfs_workqueue_struct *submit_workers;
struct btrfs_workqueue_struct *caching_workers;
-   struct btrfs_workers readahead_workers;
+   struct btrfs_workqueue_struct *readahead_workers;
 
/*
 * fixup workers take dirty pages that didn't properly go through
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 391cadf..ca6d0cf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2004,7 +2004,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_destroy_workqueue(fs_info->submit_workers);
btrfs_stop_workers(&fs_info->delayed_workers);
btrfs_destroy_workqueue(fs_info->caching_workers);
-   btrfs_stop_workers(&fs_info->readahead_workers);
+   btrfs_destroy_workqueue(fs_info->readahead_workers);
btrfs_destroy_workqueue(fs_info->flush_workers);
btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }
@@ -2518,14 +2518,11 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta",
   fs_info->thread_pool_size,
   &fs_info->generic_worker);
-   btrfs_init_workers(&fs_info->readahead_workers, "readahead",
-  fs_info->thread_pool_size,
-  &fs_info->generic_worker);
+   fs_info->readahead_workers =
+   btrfs_alloc_workqueue("readahead", flags, max_active, 2);
btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
   &fs_info->generic_worker);
 
-   fs_info->readahead_workers.idle_thresh = 2;
-
/*
 * btrfs_start_workers can really only fail because of ENOMEM so just
 * return -ENOMEM if any of these fail.
@@ -2533,7 +2530,6 @@ int open_ctree(struct super_block *sb,
ret = btrfs_start_workers(&fs_info->generic_worker);
ret |= btrfs_start_workers(&fs_info->fixup_workers);
ret |= btrfs_start_workers(&fs_info->delayed_workers);
-   ret |= btrfs_start_workers(&fs_info->readahead_workers);
ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
@@ -2545,7 +2541,7 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_meta_write_workers &&
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
  fs_info->endio_freespace_worker && fs_info->rmw_workers &&
- fs_info->caching_workers)) {
+ fs_info->caching_workers && fs_info->readahead_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 31c797c..9e01d36 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -91,7 +91,8 @@ struct reada_zone {
 };
 
 struct reada_machine_work {
-   struct btrfs_work   work;
+   struct btrfs_work_struct
+   work;
struct btrfs_fs_info*fs_info;
 };
 
@@ -733,7 +734,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info 
*fs_info,
 
 }
 
-static void reada_start_machine_worker(struct btrfs_work *work)
+static void reada_start_machine_worker(struct btrfs_work_struct *work)
 {
struct reada_machine_work *rmw;
struct btrfs_fs_info *fs_info;
@@ -793,10 +794,10 @@ static void reada_start_machine(struct btrfs_fs_info 
*fs_info)
/* FIXME we cannot handle this properly right now */
BUG();
}
-   rmw->work.func = reada_start_machine_worker;
+   btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL);
rmw->fs_info = fs_info;
 
-   btrfs_queue_worker(&fs_info->readahead_workers, &rmw->work);
+   btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
 }
 
 #ifdef DEBUG
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cd52e20..56c5533 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1329,7 +1329,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info 
*fs_info,
btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size);
btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size);
btrfs_set_max_workers(&fs_info->delaye

[PATCH v5 10/18] btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->rmw_workers with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 12 
 fs/btrfs/raid56.c  | 35 ---
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3db87da..a7b0bdd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1511,7 +1511,7 @@ struct btrfs_fs_info {
struct btrfs_workqueue_struct *endio_workers;
struct btrfs_workqueue_struct *endio_meta_workers;
struct btrfs_workqueue_struct *endio_raid56_workers;
-   struct btrfs_workers rmw_workers;
+   struct btrfs_workqueue_struct *rmw_workers;
struct btrfs_workqueue_struct *endio_meta_write_workers;
struct btrfs_workqueue_struct *endio_write_workers;
struct btrfs_workqueue_struct *endio_freespace_worker;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 28b303c..12586b1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1997,7 +1997,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_destroy_workqueue(fs_info->endio_workers);
btrfs_destroy_workqueue(fs_info->endio_meta_workers);
btrfs_destroy_workqueue(fs_info->endio_raid56_workers);
-   btrfs_stop_workers(&fs_info->rmw_workers);
+   btrfs_destroy_workqueue(fs_info->rmw_workers);
btrfs_destroy_workqueue(fs_info->endio_meta_write_workers);
btrfs_destroy_workqueue(fs_info->endio_write_workers);
btrfs_destroy_workqueue(fs_info->endio_freespace_worker);
@@ -2509,9 +2509,8 @@ int open_ctree(struct super_block *sb,
btrfs_alloc_workqueue("endio-meta-write", flags, max_active, 2);
fs_info->endio_raid56_workers =
btrfs_alloc_workqueue("endio-raid56", flags, max_active, 4);
-   btrfs_init_workers(&fs_info->rmw_workers,
-  "rmw", fs_info->thread_pool_size,
-  &fs_info->generic_worker);
+   fs_info->rmw_workers =
+   btrfs_alloc_workqueue("rmw", flags, max_active, 2);
fs_info->endio_write_workers =
btrfs_alloc_workqueue("endio-write", flags, max_active, 2);
fs_info->endio_freespace_worker =
@@ -2525,8 +2524,6 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
   &fs_info->generic_worker);
 
-   fs_info->rmw_workers.idle_thresh = 2;
-
fs_info->readahead_workers.idle_thresh = 2;
 
/*
@@ -2535,7 +2532,6 @@ int open_ctree(struct super_block *sb,
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
ret |= btrfs_start_workers(&fs_info->fixup_workers);
-   ret |= btrfs_start_workers(&fs_info->rmw_workers);
ret |= btrfs_start_workers(&fs_info->delayed_workers);
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
@@ -2549,7 +2545,7 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_workers && fs_info->endio_meta_workers &&
  fs_info->endio_meta_write_workers &&
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
- fs_info->endio_freespace_worker)) {
+ fs_info->endio_freespace_worker && fs_info->rmw_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 24ac218..5afa564 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -87,7 +87,7 @@ struct btrfs_raid_bio {
/*
 * for scheduling work in the helper threads
 */
-   struct btrfs_work work;
+   struct btrfs_work_struct work;
 
/*
 * bio list and bio_list_lock are used
@@ -166,8 +166,8 @@ struct btrfs_raid_bio {
 
 static int __raid56_parity_recover(struct btrfs_raid_bio *rbio);
 static noinline void finish_rmw(struct btrfs_raid_bio *rbio);
-static void rmw_work(struct btrfs_work *work);
-static void read_rebuild_work(struct btrfs_work *work);
+static void rmw_work(struct btrfs_work_struct *work);
+static void read_rebuild_work(struct btrfs_work_struct *work);
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio);
 static void async_read_rebuild(struct btrfs_raid_bio *rbio);
 static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio);
@@ -1416,20 +1416,18 @@ cleanup:
 
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
 {
-   rbio->work.flags = 0;
-   rbio->work.func = rmw_work;
+   btrfs_init_work(&rbio->work, rmw_work, NULL, NULL);
 
-   btrfs_queue_worker(&rbio->fs_info->rmw_workers,
-  &r

[PATCH v5 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue

2014-02-27 Thread Qu Wenruo
Use the kernel workqueue to implement a new btrfs_workqueue_struct, which
has the same ordered execution feature as the btrfs_worker.

The func is executed concurrently, while the ordered_func/ordered_free are
executed in the sequence they were queued, after the corresponding func is
done.

The new btrfs_workqueue works much like the original one: one workqueue
for normal work and a list for ordered work.
When a work is queued, the ordered work is added to the list and a helper
function is queued into the workqueue.
The helper function executes a normal work and then checks and executes as
many ordered works as possible, in the sequence they were queued.

As of this patch, neither the high priority work queue nor thresholding is
added yet.  The high priority feature and thresholding will be added in the
following patches.
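
(Not part of the patch: a minimal, hypothetical caller of the API described
above, with made-up my_* names, just to illustrate the func / ordered_func /
ordered_free contract -- the func part runs concurrently, the ordered parts
run in queueing order once the corresponding func has finished.)

struct my_async_work {
	struct btrfs_work_struct work;
	/* ... per-request state ... */
};

static void my_func(struct btrfs_work_struct *work)
{
	/* concurrent part; may run on any kernel workqueue worker */
}

static void my_ordered_func(struct btrfs_work_struct *work)
{
	/* runs strictly in the order the works were queued */
}

static void my_ordered_free(struct btrfs_work_struct *work)
{
	kfree(container_of(work, struct my_async_work, work));
}

static int my_submit(struct btrfs_workqueue_struct *wq)
{
	struct my_async_work *w = kmalloc(sizeof(*w), GFP_NOFS);

	if (!w)
		return -ENOMEM;
	btrfs_init_work(&w->work, my_func, my_ordered_func, my_ordered_free);
	btrfs_queue_work(wq, &w->work);
	return 0;
}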

Signed-off-by: Qu Wenruo 
Signed-off-by: Lai Jiangshan 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None.
v2->v3:
  - Fix the potential deadlock discovered by kernel lockdep.
  - Reuse the async-thread.[ch] files.
  - Make the ordered_func optional, which makes it adaptable to
all btrfs_workers.
v3->v4:
  - Use the old list method to implement the ordered workqueue.
The previous 3-workqueue implementation needed extra time waiting for
scheduling, which caused up to a 40% performance drop in compress tests.
The old list method (after executing a normal work, check the order_list
and execute) does not need the extra scheduling.
  - Simplify the btrfs_alloc_workqueue parameters.
Now only one name is needed, and the ordered work mechanism is determined
by work->ordered_func.
  - Fix memory leak in btrfs_destroy_workqueue.
v4->v5:
  - Fix a multithread free-and-use bug reported by Josef and David.
---
 fs/btrfs/async-thread.c | 137 
 fs/btrfs/async-thread.h |  27 ++
 2 files changed, 164 insertions(+)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 0b78bf2..905de02 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2007 Oracle.  All rights reserved.
+ * Copyright (C) 2014 Fujitsu.  All rights reserved.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public
@@ -21,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "async-thread.h"
 
 #define WORK_QUEUED_BIT 0
@@ -727,3 +729,138 @@ void btrfs_queue_worker(struct btrfs_workers *workers, 
struct btrfs_work *work)
wake_up_process(worker->task);
spin_unlock_irqrestore(&worker->lock, flags);
 }
+
+struct btrfs_workqueue_struct {
+   struct workqueue_struct *normal_wq;
+   /* List head pointing to ordered work list */
+   struct list_head ordered_list;
+
+   /* Spinlock for ordered_list */
+   spinlock_t list_lock;
+};
+
+struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
+int flags,
+int max_active)
+{
+   struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+
+   if (unlikely(!ret))
+   return NULL;
+
+   ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active,
+"btrfs", name);
+   if (unlikely(!ret->normal_wq)) {
+   kfree(ret);
+   return NULL;
+   }
+
+   INIT_LIST_HEAD(&ret->ordered_list);
+   spin_lock_init(&ret->list_lock);
+   return ret;
+}
+
+static void run_ordered_work(struct btrfs_workqueue_struct *wq)
+{
+   struct list_head *list = &wq->ordered_list;
+   struct btrfs_work_struct *work;
+   spinlock_t *lock = &wq->list_lock;
+   unsigned long flags;
+
+   while (1) {
+   spin_lock_irqsave(lock, flags);
+   if (list_empty(list))
+   break;
+   work = list_entry(list->next, struct btrfs_work_struct,
+ ordered_list);
+   if (!test_bit(WORK_DONE_BIT, &work->flags))
+   break;
+
+   /*
+* we are going to call the ordered done function, but
+* we leave the work item on the list as a barrier so
+* that later work items that are done don't have their
+* functions called before this one returns
+*/
+   if (test_and_set_bit(WORK_ORDER_DONE_BIT, &work->flags))
+   break;
+   spin_unlock_irqrestore(lock, flags);
+   work->ordered_func(work);
+
+   /* now take the lock again and drop our item from the list */
+   spin_lock_irqsave(lock, flags);
+   list_del(&work->ordered_list);
+   spin_unlock_irqrestore(lock, flags);
+
+   /*
+* we don't want to call the ordered free functions
+  

[PATCH v5 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct

2014-02-27 Thread Qu Wenruo
Add high priority function to btrfs_workqueue.

This is implemented by embedding a btrfs_workqueue into a
btrfs_workqueue and using some helper functions to distinguish the normal
priority wq from the high priority wq.
So the high priority wq is completely independent from the normal
workqueue.
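
(For readers of the archive: a rough, simplified sketch of the dispatch this
description implies, assuming the WORK_HIGH_PRIO_BIT flag defined in
async-thread.c; the actual hunk is truncated at the end of this message.)

void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
		      struct btrfs_work_struct *work)
{
	struct __btrfs_workqueue_struct *dest_wq;

	/* route flagged work to the embedded high priority queue, if any */
	if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags) && wq->high)
		dest_wq = wq->high;
	else
		dest_wq = wq->normal;
	__btrfs_queue_work(dest_wq, work);
}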

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  None
v3->v4:
  - Implement the high priority workqueue independently.
Now the high priority wq is implemented as a normal btrfs_workqueue,
with its own independent ordering/thresholding mechanism.
This fixes the problem that the high priority wq and the normal wq shared
one ordered wq.
v4->v5:
  None
---
 fs/btrfs/async-thread.c | 91 ++---
 fs/btrfs/async-thread.h |  5 ++-
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 905de02..193c849 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -730,7 +730,7 @@ void btrfs_queue_worker(struct btrfs_workers *workers, 
struct btrfs_work *work)
spin_unlock_irqrestore(&worker->lock, flags);
 }
 
-struct btrfs_workqueue_struct {
+struct __btrfs_workqueue_struct {
struct workqueue_struct *normal_wq;
/* List head pointing to ordered work list */
struct list_head ordered_list;
@@ -739,6 +739,38 @@ struct btrfs_workqueue_struct {
spinlock_t list_lock;
 };
 
+struct btrfs_workqueue_struct {
+   struct __btrfs_workqueue_struct *normal;
+   struct __btrfs_workqueue_struct *high;
+};
+
+static inline struct __btrfs_workqueue_struct
+*__btrfs_alloc_workqueue(char *name, int flags, int max_active)
+{
+   struct __btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+
+   if (unlikely(!ret))
+   return NULL;
+
+   if (flags & WQ_HIGHPRI)
+   ret->normal_wq = alloc_workqueue("%s-%s-high", flags,
+max_active, "btrfs", name);
+   else
+   ret->normal_wq = alloc_workqueue("%s-%s", flags,
+max_active, "btrfs", name);
+   if (unlikely(!ret->normal_wq)) {
+   kfree(ret);
+   return NULL;
+   }
+
+   INIT_LIST_HEAD(&ret->ordered_list);
+   spin_lock_init(&ret->list_lock);
+   return ret;
+}
+
+static inline void
+__btrfs_destroy_workqueue(struct __btrfs_workqueue_struct *wq);
+
 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 int flags,
 int max_active)
@@ -748,19 +780,25 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char 
*name,
if (unlikely(!ret))
return NULL;
 
-   ret->normal_wq = alloc_workqueue("%s-%s", flags, max_active,
-"btrfs", name);
-   if (unlikely(!ret->normal_wq)) {
+   ret->normal = __btrfs_alloc_workqueue(name, flags & ~WQ_HIGHPRI,
+ max_active);
+   if (unlikely(!ret->normal)) {
kfree(ret);
return NULL;
}
 
-   INIT_LIST_HEAD(&ret->ordered_list);
-   spin_lock_init(&ret->list_lock);
+   if (flags & WQ_HIGHPRI) {
+   ret->high = __btrfs_alloc_workqueue(name, flags, max_active);
+   if (unlikely(!ret->high)) {
+   __btrfs_destroy_workqueue(ret->normal);
+   kfree(ret);
+   return NULL;
+   }
+   }
return ret;
 }
 
-static void run_ordered_work(struct btrfs_workqueue_struct *wq)
+static void run_ordered_work(struct __btrfs_workqueue_struct *wq)
 {
struct list_head *list = &wq->ordered_list;
struct btrfs_work_struct *work;
@@ -804,7 +842,7 @@ static void run_ordered_work(struct btrfs_workqueue_struct 
*wq)
 static void normal_work_helper(struct work_struct *arg)
 {
struct btrfs_work_struct *work;
-   struct btrfs_workqueue_struct *wq;
+   struct __btrfs_workqueue_struct *wq;
int need_order = 0;
 
work = container_of(arg, struct btrfs_work_struct, normal_work);
@@ -840,8 +878,8 @@ void btrfs_init_work(struct btrfs_work_struct *work,
work->flags = 0;
 }
 
-void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
- struct btrfs_work_struct *work)
+static inline void __btrfs_queue_work(struct __btrfs_workqueue_struct *wq,
+ struct btrfs_work_struct *work)
 {
unsigned long flags;
 
@@ -854,13 +892,42 @@ void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
queue_work(wq->normal_wq, &work->normal_work);
 }
 
-void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq)
+void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
+ struct btrfs_work_struct *work)
+{
+   struct __bt

[PATCH v5 16/18] btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->scrub_* with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h |  6 ++--
 fs/btrfs/scrub.c | 93 ++--
 fs/btrfs/super.c |  4 +--
 3 files changed, 55 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f8f62d0..9aece57 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1605,9 +1605,9 @@ struct btrfs_fs_info {
atomic_t scrub_cancel_req;
wait_queue_head_t scrub_pause_wait;
int scrub_workers_refcnt;
-   struct btrfs_workers scrub_workers;
-   struct btrfs_workers scrub_wr_completion_workers;
-   struct btrfs_workers scrub_nocow_workers;
+   struct btrfs_workqueue_struct *scrub_workers;
+   struct btrfs_workqueue_struct *scrub_wr_completion_workers;
+   struct btrfs_workqueue_struct *scrub_nocow_workers;
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
u32 check_integrity_print_mask;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 51c342b..9223b7b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -96,7 +96,8 @@ struct scrub_bio {
 #endif
int page_count;
int next_free;
-   struct btrfs_work   work;
+   struct btrfs_work_struct
+   work;
 };
 
 struct scrub_block {
@@ -154,7 +155,8 @@ struct scrub_fixup_nodatasum {
struct btrfs_device *dev;
u64 logical;
struct btrfs_root   *root;
-   struct btrfs_work   work;
+   struct btrfs_work_struct
+   work;
int mirror_num;
 };
 
@@ -172,7 +174,8 @@ struct scrub_copy_nocow_ctx {
int mirror_num;
u64 physical_for_dev_replace;
struct list_headinodes;
-   struct btrfs_work   work;
+   struct btrfs_work_struct
+   work;
 };
 
 struct scrub_warning {
@@ -231,7 +234,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, 
u64 len,
   u64 gen, int mirror_num, u8 *csum, int force,
   u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio, int err);
-static void scrub_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_bio_end_io_worker(struct btrfs_work_struct *work);
 static void scrub_block_complete(struct scrub_block *sblock);
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
   u64 extent_logical, u64 extent_len,
@@ -248,14 +251,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx 
*sctx,
struct scrub_page *spage);
 static void scrub_wr_submit(struct scrub_ctx *sctx);
 static void scrub_wr_bio_end_io(struct bio *bio, int err);
-static void scrub_wr_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_wr_bio_end_io_worker(struct btrfs_work_struct *work);
 static int write_page_nocow(struct scrub_ctx *sctx,
u64 physical_for_dev_replace, struct page *page);
 static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
  struct scrub_copy_nocow_ctx *ctx);
 static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
int mirror_num, u64 physical_for_dev_replace);
-static void copy_nocow_pages_worker(struct btrfs_work *work);
+static void copy_nocow_pages_worker(struct btrfs_work_struct *work);
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 
@@ -418,7 +421,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, 
int is_dev_replace)
sbio->index = i;
sbio->sctx = sctx;
sbio->page_count = 0;
-   sbio->work.func = scrub_bio_end_io_worker;
+   btrfs_init_work(&sbio->work, scrub_bio_end_io_worker,
+   NULL, NULL);
 
if (i != SCRUB_BIOS_PER_SCTX - 1)
sctx->bios[i]->next_free = i + 1;
@@ -723,7 +727,7 @@ out:
return -EIO;
 }
 
-static void scrub_fixup_nodatasum(struct btrfs_work *work)
+static void scrub_fixup_nodatasum(struct btrfs_work_struct *work)
 {
int ret;
struct scrub_fixup_nodatasum *fixup;
@@ -987,9 +991,10 @@ nodatasum_case:
fixup_nodatasum->root = fs_info->extent_root;
fixup_nodatasum->mirror_num = failed_mirror_index + 1;
scrub_pending_trans_workers_inc(sctx);
-   fixup_nodatasum->work.func = scrub_fixup_nodatasum;
-   btrfs_queue_worker(&fs_

[PATCH v5 15/18] btrfs: Replace fs_info->qgroup_rescan_worker workqueue with btrfs_workqueue.

2014-02-27 Thread Qu Wenruo
Replace the fs_info->qgroup_rescan_worker with the newly created
btrfs_workqueue.

Signed-off-by: Qu Wenruo 
Tested-by: David Sterba 
---
Changelog:
v1->v2:
  None
v2->v3:
  - Use the btrfs_workqueue_struct to replace submit_workers.
v3->v4:
  - Use the simplified btrfs_alloc_workqueue API.
v4->v5:
  None
---
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/disk-io.c | 10 +-
 fs/btrfs/qgroup.c  | 17 +
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 07b563d..f8f62d0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1648,9 +1648,9 @@ struct btrfs_fs_info {
/* qgroup rescan items */
struct mutex qgroup_rescan_lock; /* protects the progress item */
struct btrfs_key qgroup_rescan_progress;
-   struct btrfs_workers qgroup_rescan_workers;
+   struct btrfs_workqueue_struct *qgroup_rescan_workers;
struct completion qgroup_rescan_completion;
-   struct btrfs_work qgroup_rescan_work;
+   struct btrfs_work_struct qgroup_rescan_work;
 
/* filesystem state */
unsigned long fs_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ac8e9c2..e3507c5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2006,7 +2006,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
btrfs_destroy_workqueue(fs_info->caching_workers);
btrfs_destroy_workqueue(fs_info->readahead_workers);
btrfs_destroy_workqueue(fs_info->flush_workers);
-   btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
+   btrfs_destroy_workqueue(fs_info->qgroup_rescan_workers);
 }
 
 static void free_root_extent_buffers(struct btrfs_root *root)
@@ -2519,15 +2519,14 @@ int open_ctree(struct super_block *sb,
btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0);
fs_info->readahead_workers =
btrfs_alloc_workqueue("readahead", flags, max_active, 2);
-   btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
-  &fs_info->generic_worker);
+   fs_info->qgroup_rescan_workers =
+   btrfs_alloc_workqueue("qgroup-rescan", flags, 1, 0);
 
/*
 * btrfs_start_workers can really only fail because of ENOMEM so just
 * return -ENOMEM if any of these fail.
 */
ret = btrfs_start_workers(&fs_info->generic_worker);
-   ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
@@ -2539,7 +2538,8 @@ int open_ctree(struct super_block *sb,
  fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
  fs_info->endio_freespace_worker && fs_info->rmw_workers &&
  fs_info->caching_workers && fs_info->readahead_workers &&
- fs_info->fixup_workers && fs_info->delayed_workers)) {
+ fs_info->fixup_workers && fs_info->delayed_workers &&
+ fs_info->qgroup_rescan_workers)) {
err = -ENOMEM;
goto fail_sb_buffer;
}
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 472302a..38617cc 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1509,8 +1509,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
ret = qgroup_rescan_init(fs_info, 0, 1);
if (!ret) {
qgroup_rescan_zero_tracking(fs_info);
-   btrfs_queue_worker(&fs_info->qgroup_rescan_workers,
-  &fs_info->qgroup_rescan_work);
+   btrfs_queue_work(fs_info->qgroup_rescan_workers,
+&fs_info->qgroup_rescan_work);
}
ret = 0;
}
@@ -1984,7 +1984,7 @@ out:
return ret;
 }
 
-static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
+static void btrfs_qgroup_rescan_worker(struct btrfs_work_struct *work)
 {
struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info,
 qgroup_rescan_work);
@@ -2095,7 +2095,8 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 
progress_objectid,
 
memset(&fs_info->qgroup_rescan_work, 0,
   sizeof(fs_info->qgroup_rescan_work));
-   fs_info->qgroup_rescan_work.func = btrfs_qgroup_rescan_worker;
+   btrfs_init_work(&fs_info->qgroup_rescan_work,
+   btrfs_qgroup_rescan_worker, NULL, NULL);
 
if (ret) {
 err:
@@ -2158,8 +2159,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
qgroup_rescan_zero_tracking(fs_info);
 
-   btrfs_queue_worker(&fs_info->qgroup_rescan_workers,
-  &fs_info->qgroup_rescan_work);
+   btrfs_queue_work(fs_info->qgroup_rescan_workers,
+&fs_info->qgroup_rescan_work);
 
return 0;
 }
@@ -2190,6 +2191,6 @@ 

[PATCH v5 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue

2014-02-27 Thread Qu Wenruo
Add a new btrfs_workqueue_struct, which uses the kernel workqueue to
implement most of the features of the original btrfs_workers, and replace
btrfs_workers with it.

With this patchset, the redundant workqueue code is replaced with the kernel
workqueue infrastructure, which reduces not only the code size but also the
effort to maintain it.

The sysbench result (somewhat outdated though) shows a minor improvement on
the following server:
CPU: two-way Xeon X5660
RAM: 4G 
HDD: SAS HDD, 150G total, 100G partition for btrfs test

Test result on default mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdENjajJTWFg5d1BWbExnYWFpMTJxeUE&usp=sharing

Test result on "-o compress" mount option:
https://docs.google.com/spreadsheet/ccc?key=0AhpkL3ehzX3pdHdTTEJ6OW96SXJFaDR5enB1SzMzc0E&usp=sharing

Changelog:
v1->v2:
  - Fix some workqueue flags.
v2->v3:
  - Add the thresholding mechanism to simulate the old behavior
  - Convert all the btrfs_workers to btrfs_workqueue_struct.
  - Fix some potential deadlock when executed in IRQ handler.
v3->v4:
  - Change the ordered workqueue implementation to fix the performance drop
in 32K multi-thread random write.
  - Change the high priority workqueue implementation to get an independent
high priority workqueue without the starvation problem.
  - Simplify the btrfs_alloc_workqueue parameters.
  - Coding style cleanup.
  - Remove the redundant "_struct" suffix.
v4->v5:
  - Fix a multithread free-and-use bug reported by Josef and David.

Qu Wenruo (18):
  btrfs: Cleanup the unused struct async_sched.
  btrfs: Added btrfs_workqueue_struct implemented ordered execution
based on kernel workqueue
  btrfs: Add high priority workqueue support for btrfs_workqueue_struct
  btrfs: Add threshold workqueue based on kernel workqueue
  btrfs: Replace fs_info->workers with btrfs_workqueue.
  btrfs: Replace fs_info->delalloc_workers with btrfs_workqueue
  btrfs: Replace fs_info->submit_workers with btrfs_workqueue.
  btrfs: Replace fs_info->flush_workers with btrfs_workqueue.
  btrfs: Replace fs_info->endio_* workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->rmw_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->cache_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->readahead_workers workqueue with
btrfs_workqueue.
  btrfs: Replace fs_info->fixup_workers workqueue with btrfs_workqueue.
  btrfs: Replace fs_info->delayed_workers workqueue with
btrfs_workqueue.
  btrfs: Replace fs_info->qgroup_rescan_worker workqueue with
btrfs_workqueue.
  btrfs: Replace fs_info->scrub_* workqueue with btrfs_workqueue.
  btrfs: Cleanup the old btrfs_worker.
  btrfs: Cleanup the "_struct" suffix in btrfs_workequeue

 fs/btrfs/async-thread.c  | 830 ---
 fs/btrfs/async-thread.h  | 119 ++-
 fs/btrfs/ctree.h |  39 ++-
 fs/btrfs/delayed-inode.c |   6 +-
 fs/btrfs/disk-io.c   | 212 +---
 fs/btrfs/extent-tree.c   |   4 +-
 fs/btrfs/inode.c |  38 +--
 fs/btrfs/ordered-data.c  |  11 +-
 fs/btrfs/qgroup.c|  15 +-
 fs/btrfs/raid56.c|  21 +-
 fs/btrfs/reada.c |   4 +-
 fs/btrfs/scrub.c |  70 ++--
 fs/btrfs/super.c |  36 +-
 fs/btrfs/volumes.c   |  16 +-
 14 files changed, 446 insertions(+), 975 deletions(-)

-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep

2014-02-27 Thread Wang Shilong

Hi Marc,

On 02/28/2014 03:06 AM, Marc MERLIN wrote:

This does not happen consistently, but sometimes:

PM: Preparing system for mem sleep
Freezing user space processes ...
(...)
  Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, 
wq_busy=0):
  btrfs   D 88017639c800 0 12239  12224 0x0084
   880165ec1960 0086 880165ec1fd8 88017639c2d0
   000141c0 88017639c2d0 88007b874000 8804062fa480
    880175837ec0 88007b874220 880165ec1970
  Call Trace:
   [] schedule+0x73/0x75
   [] scrub_pages+0x27e/0x426
   [] ? finish_wait+0x65/0x65
   [] scrub_stripe+0xada/0xc9e
   [] scrub_chunk.isra.9+0xd6/0x10d
   [] scrub_enumerate_chunks+0x274/0x418
   [] ? finish_wait+0x3/0x65
   [] btrfs_scrub_dev+0x254/0x3cb
   [] ? __mnt_want_write+0x62/0x78
   [] btrfs_ioctl+0x1114/0x24b1
   [] ? cache_alloc+0x1c/0x29b
   [] ? kmem_cache_alloc_node+0xef/0x179
   [] ? _raw_spin_unlock+0x17/0x2a
   [] do_vfs_ioctl+0x3d2/0x41d
   [] ? __fget+0x6f/0x79
   [] SyS_ioctl+0x57/0x82
   [] system_call_fastpath+0x1a/0x1f

Could you run the following commands when scrub is blocked, so we can learn
more about why scrub is blocked here?

# echo w >  /proc/sysrq-trigger
# dmesg

Thanks,
Wang


And then I end up with a hot laptop and a mostly dead battery in my backpack.

As far as I know, this was not happening with 3.13, unless I'm doing
something differently without knowing.

My laptop went to sleep just fine while I was typing this Email, so I'm guessing
it's only btrfs scrub that causes the problem with sleep.

Marc


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 5:12 PM, Dave Chinner  wrote:

> On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
>> 
>> On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
>> 
>>> Yes it's an ancient 32 bit machine.  There must be a complex bug
>>> involved as the system, when originally mounted, claimed the
>>> correct free space and only as used over time did the
>>> discrepancy between used and free grow.  I'm afraid I chose
>>> btrfs because it appeared capable of breaking the 16 tera limit
>>> on a 32 bit system.  If this isn't the case then it's incredible
>>> that I've been using this file system for about a year without
>>> difficulty until now.
>> 
>> Yep, it's not a good bug. This happened some years ago on XFS too,
>> where people would use the file system for a long time and then at
>> 16TB+1byte written to the volume, kablewy! And then it wasn't
>> usable at all, until put on a 64-bit kernel.
>> 
>> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
> 
> Well, no, that's not what I said.

What are you thinking I said you said? I wasn't quoting or paraphrasing 
anything you've said above. I had done a google search on this earlier and found 
some rather old threads where some people had this experience of making a large 
file system on a 32-bit kernel, and only after filling it beyond 16TB did they 
run into the problem. Here is one of them:

http://lists.centos.org/pipermail/centos/2011-April/109142.html



> I said that it was limited on XFS,
> not that the limit was a result of a user making a filesystem too
> large and then finding out it didn't work. Indeed, you can't do that
> on XFS - mkfs will refuse to run on a block device it can't access the
> last block on, and the kernel has the same "can I access the last
> block of the filesystem" sanity checks that are run at mount and
> growfs time.

Nope. In what I reported on the XFS list, I had used mkfs.xfs while running a
32-bit kernel on a 20TB virtual disk. It did not fail to make the file system;
it only failed to mount it. It was the same booted virtual machine: I created
the file system and immediately tried to mount it. If you want the specifics,
I'll post on the XFS list with versions and reproduction steps.


> 
> IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux.

OK that's fine, I've only reported what other people said they experienced, and 
it comes as no surprise they might have been confused. Although not knowing the 
size of one's file system would seem to be rare.


Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread Dave Chinner
On Thu, Feb 27, 2014 at 02:11:19PM -0700, Chris Murphy wrote:
> 
> On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:
> 
> > Yes it's an ancient 32 bit machine.  There must be a complex bug
> > involved as the system, when originally mounted, claimed the
> > correct free space and only as used over time did the
> > discrepancy between used and free grow.  I'm afraid I chose
> > btrfs because it appeared capable of breaking the 16 tera limit
> > on a 32 bit system.  If this isn't the case then it's incredible
> > that I've been using this file system for about a year without
> > difficulty until now.
> 
> Yep, it's not a good bug. This happened some years ago on XFS too,
> where people would use the file system for a long time and then at
> 16TB+1byte written to the volume, kablewy! And then it wasn't
> usable at all, until put on a 64-bit kernel.
> 
> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

Well, no, that's not what I said. I said that it was limited on XFS,
not that the limit was a result of a user making a filesystem too
large and then finding out it didn't work. Indeed, you can't do that
on XFS - mkfs will refuse to run on a block device it can't access the
last block on, and the kernel has the same "can I access the last
block of the filesystem" sanity checks that are run at mount and
growfs time.

IOWs, XFS has *never* allowed >16TB on 32 bit systems on Linux. And,
historically speaking, it didn't even allow it on Irix. Irix on 32
bit systems was limited to 1TB (2^31 sectors of 2^9 bytes = 1TB),
and only as Linux gained sufficient capability on 32 bit systems
(e.g.  CONFIG_LBD) was the limit increased. The limit we are now at
is the address space index being 32 bits, so the size is limited by
2^32 * PAGE_SIZE = 2^44 = 16TB
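
(Spelling that arithmetic out, a trivial userspace sketch, assuming the usual
4KiB PAGE_SIZE; architectures with larger pages get a correspondingly larger
limit:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t page_size = 4096;              /* typical PAGE_SIZE, 2^12 */
	const uint64_t max_pages = UINT64_C(1) << 32; /* 32 bit page cache index */
	uint64_t limit = max_pages * page_size;       /* 2^44 bytes */

	printf("%llu bytes = %llu TiB\n",
	       (unsigned long long)limit,
	       (unsigned long long)(limit >> 40));    /* prints 16 */
	return 0;
}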

i.e Back when XFS was still being ported to Linux from Irix in 2000:

203 #if !XFS_BIG_FILESYSTEMS
204 if (sbp->sb_dblocks > INT_MAX || sbp->sb_rblocks > INT_MAX)  {
205 cmn_err(CE_WARN,
206 "XFS:  File systems greater than 1TB not supported on this system.\n");
207 return XFS_ERROR(E2BIG);
208 }
209 #endif

(http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=blob;f=fs/xfs/xfs_mount.c;hb=60a4726a60437654e2af369ccc8458376e1657b9)

So, good story, but is not true.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep

2014-02-27 Thread Marc MERLIN
On Thu, Feb 27, 2014 at 11:06:56AM -0800, Marc MERLIN wrote:
> This does not happen consistently, but sometimes:
> 
> PM: Preparing system for mem sleep
> Freezing user space processes ... 
> (...)
>  Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, 
> wq_busy=0):
>  btrfs   D 88017639c800 0 12239  12224 0x0084
>   880165ec1960 0086 880165ec1fd8 88017639c2d0
>   000141c0 88017639c2d0 88007b874000 8804062fa480
>    880175837ec0 88007b874220 880165ec1970
>  Call Trace:
>   [] schedule+0x73/0x75
>   [] scrub_pages+0x27e/0x426
>   [] ? finish_wait+0x65/0x65
>   [] scrub_stripe+0xada/0xc9e
>   [] scrub_chunk.isra.9+0xd6/0x10d
>   [] scrub_enumerate_chunks+0x274/0x418
>   [] ? finish_wait+0x3/0x65
>   [] btrfs_scrub_dev+0x254/0x3cb
>   [] ? __mnt_want_write+0x62/0x78
>   [] btrfs_ioctl+0x1114/0x24b1
>   [] ? cache_alloc+0x1c/0x29b
>   [] ? kmem_cache_alloc_node+0xef/0x179
>   [] ? _raw_spin_unlock+0x17/0x2a
>   [] do_vfs_ioctl+0x3d2/0x41d
>   [] ? __fget+0x6f/0x79
>   [] SyS_ioctl+0x57/0x82
>   [] system_call_fastpath+0x1a/0x1f
 
Some time later, I got this one; not sure if it's btrfs' fault or not:
usb 1-11: new full-speed USB device number 7 using xhci_hcd

Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, 
wq_busy=0):
laptop_mode D 8800048c4a80 0  6657  1 0x0084
 880037f2bde0 0086 880037f2bfd8 8800048c4550
 000141c0 8800048c4550 8804072280e8 8804072280ec
 8800048c4550 8804072280f0  880037f2bdf0
Call Trace:
 [] schedule+0x73/0x75
 [] schedule_preempt_disabled+0x18/0x24
 [] __mutex_lock_slowpath+0x158/0x1cf
 [] mutex_lock+0x17/0x27
 [] control_store+0x44/0xb1
 [] dev_attr_store+0x18/0x24
 [] sysfs_kf_write+0x3e/0x40
 [] kernfs_fop_write+0xc2/0xff
 [] vfs_write+0xab/0x107
 [] SyS_write+0x46/0x79
 [] system_call_fastpath+0x1a/0x1f

Restarting tasks ... done.


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 2:07 PM, Josef Bacik  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 02/27/2014 04:05 PM, Chris Murphy wrote:
>> User reports successfully formatting and using an ~18TB Btrfs
>> volume on hardware raid5 using i686 kernel for over a year, and
>> then suddenly the file system starts behaving weirdly:
>> 
>> http://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg31856.html
>> 
>> 
>> 
>> I think this is due to the kernel page cache address space being
>> 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in
>> this thread:
>> 
>> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
>> 
>> So it sounds like it shouldn't be possible to mount a Btrfs volume
>> larger than 16TB on 32-bit kernels. This is consistent with ext4
>> and XFS which refuse to mount large file systems.
>> 
>> 
> 
> Well that's not good, I'll fix this up.  Thanks,

Is it a valid or goofy workaround to partition this 21TB volume into two equal 
portions, and then:

mkfs.btrfs -d single -m raid1 /dev/sdb[12]

Maybe it's too much of an edge case to permit it even if it worked?


Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 1:49 PM, otakujunct...@gmail.com wrote:

> Yes, it's an ancient 32-bit machine.  There must be a complex bug involved, as 
> the system, when originally mounted, claimed the correct free space, and only 
> as it was used over time did the discrepancy between used and free grow.  I'm 
> afraid I chose btrfs because it appeared capable of breaking the 16 tera limit 
> on a 32-bit system.  If this isn't the case, then it's incredible that I've been 
> using this file system for about a year without difficulty until now.

Yep, it's not a good bug. This happened some years ago on XFS too, where people 
would use the file system for a long time and then, at 16TB+1 byte written to the 
volume, kablewy! And then it wasn't usable at all until it was put on a 64-bit kernel.

http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

I can't tell you if there's a workaround for this other than going to a 64-bit 
kernel. Maybe you could partition the raid5 into two 9TB block devices, and 
then format the two partitions with -d single -m raid1. That way it behaves as 
one volume and alternates 1GB chunks between the two partitions. This should 
perform decently for large files, but it's possible that the allocator will 
sometimes write to two data chunks on what it thinks are two separate drives 
at the same time, when it's actually writing to the same physical device 
(array) at the same time. Hardware raid should optimize some of this, but I 
don't know what the penalty will be, or whether it'll work for your use case.

And I definitely don't know if the kernel page cache limit applies to the block 
device (partition) or if it applies to the file system. It sounds like it 
applies to the block device, so this might be a way around it if you had to 
stick to a 32-bit system.
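
Concretely, after backing everything up it would be something along these 
lines -- an untested sketch, with the partition boundaries only approximate 
placeholders and the mount point taken from your earlier mail:

# split the ~21TB array into two partitions, each safely under 16TB
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 50%
parted -s /dev/sdb mkpart primary 50% 100%

# one btrfs spanning both partitions: data chunks alternate between the
# two "devices" (single), metadata is mirrored across them (raid1)
mkfs.btrfs -L ubfterra -d single -m raid1 /dev/sdb1 /dev/sdb2
mount /dev/sdb1 /var/lib/nobody/fs/ubfterra

Treat that as an idea to test, not a recommendation, given the open question 
below about what the page cache limit actually applies to.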


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: >16TB Btrfs volumes are mountable on 32 bit kernels

2014-02-27 Thread Josef Bacik
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/27/2014 04:05 PM, Chris Murphy wrote:
> User reports successfully formatting and using an ~18TB Btrfs
> volume on hardware raid5 using i686 kernel for over a year, and
> then suddenly the file system starts behaving weirdly:
> 
> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31856.html
>
> 
> 
> I think this is due to the kernel page cache address space being
> 16TB limited on 32-bit kernels, as mentioned by Dave Chinner in
> this thread:
> 
> http://oss.sgi.com/pipermail/xfs/2014-February/034588.html
>
>  So it sounds like it shouldn't be possible to mount a Btrfs volume
> larger than 16TB on 32-bit kernels. This is consistent with ext4
> and XFS which refuse to mount large file systems.
> 
> 

Well that's not good, I'll fix this up.  Thanks,

Josef

-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTD6j6AAoJEANb+wAKly3BhIEQAJheOf/NMEurHSlxnLWYuRog
thRJMk+je1Ae9Sz93B5/0OztrmzYhK+OoQhWuF79OxVPoMeZ2Ta5qqmeNw3U+dRn
T44SjlYRnerq0ksVt9xR9j2zMXWatgO5+20doZpeESco/IRWYkakQTyrWj9WUATN
7YQsxZB57nijrOvig0GPmMtH9PriscsPQhMVDuTDIHkvfWgk0M2oqu/0TZl9f5xA
Es1uK0rv6KsExVQix+4GjWc/RBpl2QzxGEq/Ct+vcL+HaiKIERXuEw5liP9dSamj
Wqbkkli+FDftBx/GXGTA38VYSxLExrlF891R4fOXWUcqDvlLwhdBpZExeYCV9MUz
lEtaZaKUa3eeRBwzuxeLT8mEvY3BqvePQg8Io7auuIHG4fuRlOWWRiDG7bpTTPlD
NFZACEDlGGdXNli7TqQ82La9kxFDvXCISnfxNbbu2vlXqL/HqQom1HiPwgMNIDQ7
0UIOLW5X+gg++kH7ArhOv19B7FR2i50wxuJSwj2/XSLELAPFAd9/BMI+3DXWfkE4
qZwnHEt8bVKR/yJ+srnRC2mZP41eHWHA6c9IXEGU/STy2uOdnwnXoS+KAdNWEt1d
QRlr79S8Mhf7U8Acx/LhgwkbB1npmm0xssZmK2WycSyU7A66rdk0Cc+gfVyZOW5C
k68LvCitzpU1W7MSMmPt
=9S2b
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BUG: >16TB Btrfs volumes are mountable on 32 bit kernels

2014-02-27 Thread Chris Murphy
User reports successfully formatting and using an ~18TB Btrfs volume on 
hardware raid5 using i686 kernel for over a year, and then suddenly the file 
system starts behaving weirdly:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg31856.html


I think this is due to the kernel page cache address space being 16TB limited 
on 32-bit kernels, as mentioned by Dave Chinner in this thread:

http://oss.sgi.com/pipermail/xfs/2014-February/034588.html

So it sounds like it shouldn't be possible to mount a Btrfs volume larger than 
16TB on 32-bit kernels. This is consistent with ext4 and XFS which refuse to 
mount large file systems.
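
For reference, as I understand it the 16TB figure is just the page cache index 
math: the page index is an unsigned long, so 32 bits on i686, and pages are 
4KiB on x86. A quick back-of-the-envelope check (the arithmetic was run on a 
64-bit shell, just to avoid overflow):

$ getconf PAGE_SIZE
4096
$ echo $(( (4096 * (1 << 32)) / (1 << 40) ))   # max pages * page size, in TiB
16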



Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread otakujunction
Yes, it's an ancient 32-bit machine.  There must be a complex bug involved, as 
the system, when originally mounted, claimed the correct free space, and only as 
it was used over time did the discrepancy between used and free grow.  I'm afraid 
I chose btrfs because it appeared capable of breaking the 16 tera limit on a 
32-bit system.  If this isn't the case, then it's incredible that I've been using 
this file system for about a year without difficulty until now.

-Justin

Sent from my iPad

> On Feb 27, 2014, at 1:51 PM, Chris Murphy  wrote:
> 
> 
>> On Feb 27, 2014, at 12:27 PM, Chris Murphy  wrote:
>> This is on i686?
>> 
>> The kernel page cache is limited to 16TB on i686, so effectively your block 
>> device is limited to 16TB. While the file system successfully creates, I 
>> think it's a bug that the mount -t btrfs command is probably a btrfs bug.
> 
> Yes Chris, circular logic day. It's probably a btrfs bug that the mount 
> command succeeds.
> 
> So let us know if this is i686 or x86_64, because if it's the former it's a 
> bug that should get fixed.
> 
> 
> Chris Murphy
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 12:27 PM, Chris Murphy  wrote:
> This is on i686?
> 
> The kernel page cache is limited to 16TB on i686, so effectively your block 
> device is limited to 16TB. While the file system successfully creates, I 
> think it's a bug that the mount -t btrfs command is probably a btrfs bug.

Yes Chris, circular logic day. It's probably a btrfs bug that the mount command 
succeeds.

So let us know if this is i686 or x86_64, because if it's the former it's a bug 
that should get fixed.


Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help with space

2014-02-27 Thread Chris Murphy

On Feb 27, 2014, at 11:19 AM, Justin Brown  wrote:

> I've an 18 tera hardware raid 5 (Areca ARC-1170 w/ 8 3 gig drives) in
> need of help.  Disk usage (du) shows 13 tera allocated, yet strangely
> enough df shows approx. 780 gigs free.  It seems that, somehow, btrfs
> has eaten roughly 4 tera internally.  I've run a scrub and a balance
> usage=5 with no success; in fact I lost about 20 gigs after the
> balance attempt.  Some numbers:
> 
> terra:/var/lib/nobody/fs/ubfterra # uname -a
> Linux terra 3.12.4-2.44-desktop #1 SMP PREEMPT Mon Dec 9 03:14:51 CST
> 2013 i686 i686 i386 GNU/Linux

This is on i686?

The kernel page cache is limited to 16TB on i686, so effectively your block 
device is limited to 16TB. While the file system successfully creates, I think 
it's a bug that the mount -t btrfs command is probably a btrfs bug.

The way this works for XFS and ext4 is that the mount fails:

EXT4-fs (sdc): filesystem too large to mount safely on this system
XFS (sdc): file system too large to be mounted on this system.

If you're on a 32-bit OS, the file system might be toast; I'm not really sure. 
But I'd immediately stop using it and only use a 64-bit OS for file systems of 
this size.
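
(If it turns out the hardware can't even run a 64-bit kernel, that settles it; 
a rough check for 64-bit capability on x86 is the 'lm' CPU flag:)

grep -qw lm /proc/cpuinfo && echo "CPU can run a 64-bit kernel" || echo "32-bit only CPU"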



Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


3.14.0-rc3 btrfs scrub is preventing my laptop from going to sleep

2014-02-27 Thread Marc MERLIN
This does not happen consistently, but sometimes:

PM: Preparing system for mem sleep
Freezing user space processes ... 
(...)
 Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, 
wq_busy=0):
 btrfs   D 88017639c800 0 12239  12224 0x0084
  880165ec1960 0086 880165ec1fd8 88017639c2d0
  000141c0 88017639c2d0 88007b874000 8804062fa480
   880175837ec0 88007b874220 880165ec1970
 Call Trace:
  [] schedule+0x73/0x75
  [] scrub_pages+0x27e/0x426
  [] ? finish_wait+0x65/0x65
  [] scrub_stripe+0xada/0xc9e
  [] scrub_chunk.isra.9+0xd6/0x10d
  [] scrub_enumerate_chunks+0x274/0x418
  [] ? finish_wait+0x3/0x65
  [] btrfs_scrub_dev+0x254/0x3cb
  [] ? __mnt_want_write+0x62/0x78
  [] btrfs_ioctl+0x1114/0x24b1
  [] ? cache_alloc+0x1c/0x29b
  [] ? kmem_cache_alloc_node+0xef/0x179
  [] ? _raw_spin_unlock+0x17/0x2a
  [] do_vfs_ioctl+0x3d2/0x41d
  [] ? __fget+0x6f/0x79
  [] SyS_ioctl+0x57/0x82
  [] system_call_fastpath+0x1a/0x1f


And then I end up with a hot laptop and a mostly dead battery in my backpack.

As far as I know, this was not happening with 3.13, unless I'm doing
something differently without knowing it.

My laptop went to sleep just fine while I was typing this email, so I'm guessing
it's only btrfs scrub that causes the problem with sleep.
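
In the meantime I'll probably just cancel any running scrub from a suspend hook
so the box can actually freeze; something like this untested sketch (the
mountpoints are placeholders, and a cancelled scrub can be picked up again
later with 'btrfs scrub resume'):

#!/bin/sh
# cancel in-progress scrubs before suspend; ignore errors if none are running
for fs in /mnt/btrfs_pool1 /mnt/btrfs_pool2; do
    btrfs scrub cancel "$fs" 2>/dev/null
done
exit 0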

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Help with space

2014-02-27 Thread Justin Brown
I've an 18 tera hardware raid 5 (Areca ARC-1170 w/ 8 3 gig drives) in
need of help.  Disk usage (du) shows 13 tera allocated, yet strangely
enough df shows approx. 780 gigs free.  It seems that, somehow, btrfs
has eaten roughly 4 tera internally.  I've run a scrub and a balance
usage=5 with no success; in fact I lost about 20 gigs after the
balance attempt.  Some numbers:

terra:/var/lib/nobody/fs/ubfterra # uname -a
Linux terra 3.12.4-2.44-desktop #1 SMP PREEMPT Mon Dec 9 03:14:51 CST
2013 i686 i686 i386 GNU/Linux

terra:/var/lib/nobody/fs/ubfterra # parted -l
Model: Areca ARC-1170-VOL#00 (scsi)
Disk /dev/sdb: 21.0TB
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt

Number  Start   End SizeFile system  Name  Flags
 1  1049kB  21.0TB  21.0TB   Linux filesystem

terra:/var/lib/nobody/fs/ubfterra # du -shc *
1.7M40588-4-1376856876.jpg
2.7M40588-4-1376856876b.jpg
1008G   Anime
180GDoctor Who (classic)
5.5TDownloads
28G Flash Rescue
1.9TJus
3.6TTornado
4.0Kdirsanime
4.0Kfilesanime
55G home videos
0   testsub
4.0Kunsharedanime
13T total

terra:/var/lib/nobody/fs/ubfterra # btrfs fi show /dev/sdb1
Label: ubfterra  uuid: 40f0f692-c68c-4af7-ade2-c15a127ceab5
Total devices 1 FS bytes used 17.61TiB
devid1 size 19.10TiB used 18.34TiB path /dev/sdb1

Btrfs v3.12

terra:/var/lib/nobody/fs/ubfterra # btrfs fi df .
Data, single: total=17.58TiB, used=17.57TiB
System, DUP: total=8.00MiB, used=1.93MiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=392.00GiB, used=33.50GiB
Metadata, single: total=8.00MiB, used=0.00


I use no subvolumes, nor are there any snapshots, as near as I
can tell.  Any suggestions as to how to recover the missing space,
assuming it's possible?  Any help is most appreciated.
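
For reference, the scrub and balance I mentioned were run roughly as follows
(from memory, so the exact filter value may be slightly off):

btrfs scrub start -B /var/lib/nobody/fs/ubfterra
btrfs balance start -dusage=5 /var/lib/nobody/fs/ubfterra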

-Justin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What are the linux kernel versions incompatibilities with btrfs?

2014-02-27 Thread Felix Blanke
Hi,

I can't give you a specific answer to your question. But because btrfs
is still under heavy development, you shouldn't use it with those old
kernels at all, in my opinion. You should never be more than one
version away from the current stable kernel.

Regards,
Felix

On Thu, Feb 27, 2014 at 5:31 PM, Brent Millare  wrote:
> I read that usage of a btrfs volume with a newer kernel can render it
> unreadable when that same volume is used with an older kernel. I have
> a mobile storage device that will be used by different linux
> distributions and kernels. What are the kernel version
> incompatibilities I might have to worry about? The machines I will use
> have kernel versions 3.2.0, 3.5.0, and higher.
>
> -Brent
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/9] Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root

2014-02-27 Thread David Sterba
On Wed, Feb 26, 2014 at 05:10:05PM +0800, Miao Xie wrote:
> On Sat, 22 Feb 2014 01:23:37 +0100, David Sterba wrote:
> > On Thu, Feb 20, 2014 at 06:08:54PM +0800, Miao Xie wrote:
> >> @@ -1352,13 +1347,15 @@ static struct btrfs_root *alloc_log_tree(struct 
> >> btrfs_trans_handle *trans,
> >>root->root_key.objectid = BTRFS_TREE_LOG_OBJECTID;
> >>root->root_key.type = BTRFS_ROOT_ITEM_KEY;
> >>root->root_key.offset = BTRFS_TREE_LOG_OBJECTID;
> >> +
> >>/*
> >> +   * DON'T set REF_COWS for log trees
> >> +   *
> >> * log trees do not get reference counted because they go away
> >> * before a real commit is actually done.  They do store pointers
> >> * to file data extents, and those reference counts still get
> >> * updated (along with back refs to the log tree).
> >> */
> >> -  root->ref_cows = 0;
> > 
> > This looks like a bugfix hidden in a cleanup patch. If it is standalone
> > and not related to changes in this patchset, it makes sense to send it
> > separately (and possibly CC stable).
> 
> It is a cleanup because we have set it to 0 before.
> 
> I added this comment just to remind other developers not to set this flag.
> (The old one is not so striking, I think.)

Ok, thanks for the explanation.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What are the linux kernel versions incompatibilities with btrfs?

2014-02-27 Thread Brent Millare
I read that usage of a btrfs volume with a newer kernel can render it
unreadable when that same volume is used with an older kernel. I have
a mobile storage device that will be used by different linux
distributions and kernels. What are the kernel version
incompatibilities I might have to worry about? The machines I will use
have kernel versions 3.2.0, 3.5.0, and higher.

-Brent
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-27 Thread Josef Bacik
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/27/2014 10:38 AM, 钱凯 wrote:
> I'm a little confused about what "avg_delayed_ref_runtime" means.
> 
> In __btrfs_run_delayed_refs(), "avg_delayed_ref_runtime" is set to
> the runtime of all delayed refs processed in the current transaction
> commit. However, in btrfs_should_throttle_delayed_refs(), we base
> the decision whether to throttle refs on the following condition:
> *
>    avg_runtime = fs_info->avg_delayed_ref_runtime;
>    if (num_entries * avg_runtime >= NSEC_PER_SEC)
>    return 1;
> *
> It looks like "avg_delayed_ref_runtime" is used here as the average
> runtime of each delayed ref processed. So what does it really
> mean?
> 

Yeah I screwed this up, I should have been dividing the total time by
the number of delayed refs I ran.  I have a patch locally to fix it
and I'll send it out after I finish my qgroup work.  Thanks,

Josef

-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTD2AlAAoJEANb+wAKly3BQkEP/0F/LGGDsO+x63SAFh/apRZo
ZVmzi1yJGiArFImFs8IwZHKgr/HpP9yYYFqyDCTSYrErI32bjpPbSDKlFDiIKYBq
6mTptPlC6AJQcMJf3oV2SqUoQxI6Ea+04QaTtZwE5pDaTZsjD47QYfSyw/i+YwOr
Ds11ayDeU3FSj8JVYDKFg5ZBifv/mIHbh1fb8xc4R5XCWsbRzIL9LiQa9c56EEOq
vzXp57TIetbJdliK0cYQtPkA7R40us8TqVBH5MfcZPgITyBun3e0zrGxWmW6caTs
viejEbqDhyHLHCing+mMI6GX7w16duq5oG+w4nnjjyuMzWAyNN2pxloqQsWwOyv8
7+33JZCtVG/txRMIXkvc3bqzetrUyPAruo+M3pstN7B2dph6TDV0QJSFnxee6mKf
4/zseNOJtQqjHe5QJNcVJtkDaxgGBkSONHLm5Gz8rFU3XKcNZQcocV+0EtIjE7Zs
D5oDYCAyrxG1VKoFWhdaS883PDokRr75jcnFui4GhhFr5OAOdS3OOTLKVizWUag1
O11d9XsjnzLWiVTsZH+f4K0ONQcUwJFV0zADgYsXtU2LDHHNIPZX9+qSAa+L66hT
Ki6hocoZ4cXyGWcTZPtlGHxAmV2kEh8/Tr1ePfwy7FzTrg9hWUGLXY0DliQDPmIB
w3TdOa+Ghjl8dcaGc2rX
=kSsY
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-27 Thread 钱凯
I'm a little confused about what "avg_delayed_ref_runtime" means.

In __btrfs_run_delayed_refs(), "avg_delayed_ref_runtime" is set to the
runtime of all delayed refs processed in the current transaction commit.
However, in btrfs_should_throttle_delayed_refs(), we base the decision
whether to throttle refs on the following condition:
*
   avg_runtime = fs_info->avg_delayed_ref_runtime;
   if (num_entries * avg_runtime >= NSEC_PER_SEC)
   return 1;
*
It looks like "avg_delayed_ref_runtime" is used as runtime of each
delayed ref processed in average here. So what does it really means?

Thanks,
Kai

2014-01-24 2:07 GMT+08:00 Josef Bacik :
> On one of our gluster clusters we noticed some pretty big lag spikes.  This
> turned out to be because our transaction commit was taking like 3 minutes to
> complete.  This is because we have like 30 gigs of metadata, so our global
> reserve would end up being the max which is like 512 mb.  So our throttling 
> code
> would allow a ridiculous amount of delayed refs to build up and then they'd 
> all
> get run at transaction commit time, and for a cold mounted file system that
> could take up to 3 minutes to run.  So fix the throttling to be based on both
> the size of the global reserve and how long it takes us to run delayed refs.
> This patch tracks the time it takes to run delayed refs and then only allows 1
> seconds worth of outstanding delayed refs at a time.  This way it will 
> auto-tune
> itself from cold cache up to when everything is in memory and it no longer has
> to go to disk.  This makes our transaction commits take much less time to run.
> Thanks,
>
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/ctree.h   |  3 +++
>  fs/btrfs/disk-io.c |  2 +-
>  fs/btrfs/extent-tree.c | 41 -
>  fs/btrfs/transaction.c |  4 ++--
>  4 files changed, 46 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3cebb4a..ca6bcc3 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1360,6 +1360,7 @@ struct btrfs_fs_info {
>
> u64 generation;
> u64 last_trans_committed;
> +   u64 avg_delayed_ref_runtime;
>
> /*
>  * this is updated to the current trans every time a full commit
> @@ -3172,6 +3173,8 @@ static inline u64 btrfs_calc_trunc_metadata_size(struct 
> btrfs_root *root,
>
>  int btrfs_should_throttle_delayed_refs(struct btrfs_trans_handle *trans,
>struct btrfs_root *root);
> +int btrfs_check_space_for_delayed_refs(struct btrfs_trans_handle *trans,
> +  struct btrfs_root *root);
>  void btrfs_put_block_group(struct btrfs_block_group_cache *cache);
>  int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>struct btrfs_root *root, unsigned long count);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ed23127..f0e7bbe 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2185,7 +2185,7 @@ int open_ctree(struct super_block *sb,
> fs_info->free_chunk_space = 0;
> fs_info->tree_mod_log = RB_ROOT;
> fs_info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
> -
> +   fs_info->avg_delayed_ref_runtime = div64_u64(NSEC_PER_SEC, 64);
> /* readahead state */
> INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
> spin_lock_init(&fs_info->reada_lock);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index c77156c..b532259 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2322,8 +2322,10 @@ static noinline int __btrfs_run_delayed_refs(struct 
> btrfs_trans_handle *trans,
> struct btrfs_delayed_ref_head *locked_ref = NULL;
> struct btrfs_delayed_extent_op *extent_op;
> struct btrfs_fs_info *fs_info = root->fs_info;
> +   ktime_t start = ktime_get();
> int ret;
> unsigned long count = 0;
> +   unsigned long actual_count = 0;
> int must_insert_reserved = 0;
>
> delayed_refs = &trans->transaction->delayed_refs;
> @@ -2452,6 +2454,7 @@ static noinline int __btrfs_run_delayed_refs(struct 
> btrfs_trans_handle *trans,
>  &delayed_refs->href_root);
> spin_unlock(&delayed_refs->lock);
> } else {
> +   actual_count++;
> ref->in_tree = 0;
> rb_erase(&ref->rb_node, &locked_ref->ref_root);
> }
> @@ -2502,6 +2505,26 @@ static noinline int __btrfs_run_delayed_refs(struct 
> btrfs_trans_handle *trans,
> count++;
> cond_resched();
> }
> +
> +   /*
> +* We don't want to include ref heads since we can have empty ref 
> heads
> +* and those will drastically skew our runtime down 

[PATCH] Btrfs-progs: make sure to save mirror_num only if it is set

2014-02-27 Thread Josef Bacik
If we are cycling through all of the mirrors trying to find the best one we need
to make sure we set best_mirror to an actual mirror number and not 0.  Otherwise
we could end up reading a mirror that wasn't the best and make everybody sad.
Thanks,

Signed-off-by: Josef Bacik 
---
 disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disk-io.c b/disk-io.c
index e840177..0bd1bb0 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -297,7 +297,7 @@ struct extent_buffer *read_tree_block(struct btrfs_root 
*root, u64 bytenr,
ignore = 1;
continue;
}
-   if (btrfs_header_generation(eb) > best_transid) {
+   if (btrfs_header_generation(eb) > best_transid && mirror_num) {
best_transid = btrfs_header_generation(eb);
good_mirror = mirror_num;
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs-progs: record generation for tree blocks in fsck

2014-02-27 Thread Josef Bacik
When working with a user who had a broken file system I noticed that we were
reading a bad copy of a block when the other copy was perfectly fine.  This is
because we don't keep track of the parent generation for tree blocks, so we just
read whichever copy we damned well please with no regard for which is best.
This fixes the problem by recording the parent generation of the tree block so
we can be sure to read the most correct copy before we check it, which will give
us a better chance of fixing really broken filesystems.  Thanks,

Signed-off-by: Josef Bacik 
---
 cmds-check.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 2911af0..2fc5253 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -98,6 +98,7 @@ struct extent_record {
u64 refs;
u64 extent_item_refs;
u64 generation;
+   u64 parent_generation;
u64 info_objectid;
u64 num_duplicates;
u8 info_level;
@@ -2643,7 +2644,7 @@ static struct data_backref *alloc_data_backref(struct 
extent_record *rec,
 }
 
 static int add_extent_rec(struct cache_tree *extent_cache,
- struct btrfs_key *parent_key,
+ struct btrfs_key *parent_key, u64 parent_gen,
  u64 start, u64 nr, u64 extent_item_refs,
  int is_root, int inc_ref, int set_checked,
  int metadata, int extent_rec, u64 max_size)
@@ -2719,6 +2720,8 @@ static int add_extent_rec(struct cache_tree *extent_cache,
 
if (parent_key)
btrfs_cpu_key_to_disk(&rec->parent_key, parent_key);
+   if (parent_gen)
+   rec->parent_generation = parent_gen;
 
if (rec->max_size < max_size)
rec->max_size = max_size;
@@ -2759,6 +2762,11 @@ static int add_extent_rec(struct cache_tree 
*extent_cache,
else
memset(&rec->parent_key, 0, sizeof(*parent_key));
 
+   if (parent_gen)
+   rec->parent_generation = parent_gen;
+   else
+   rec->parent_generation = 0;
+
rec->cache.start = start;
rec->cache.size = nr;
ret = insert_cache_extent(extent_cache, &rec->cache);
@@ -2780,7 +2788,7 @@ static int add_tree_backref(struct cache_tree 
*extent_cache, u64 bytenr,
 
cache = lookup_cache_extent(extent_cache, bytenr, 1);
if (!cache) {
-   add_extent_rec(extent_cache, NULL, bytenr,
+   add_extent_rec(extent_cache, NULL, 0, bytenr,
   1, 0, 0, 0, 0, 1, 0, 0);
cache = lookup_cache_extent(extent_cache, bytenr, 1);
if (!cache)
@@ -2828,7 +2836,7 @@ static int add_data_backref(struct cache_tree 
*extent_cache, u64 bytenr,
 
cache = lookup_cache_extent(extent_cache, bytenr, 1);
if (!cache) {
-   add_extent_rec(extent_cache, NULL, bytenr, 1, 0, 0, 0, 0,
+   add_extent_rec(extent_cache, NULL, 0, bytenr, 1, 0, 0, 0, 0,
   0, 0, max_size);
cache = lookup_cache_extent(extent_cache, bytenr, 1);
if (!cache)
@@ -3315,7 +3323,7 @@ static int process_extent_item(struct btrfs_root *root,
 #else
BUG();
 #endif
-   return add_extent_rec(extent_cache, NULL, key.objectid,
+   return add_extent_rec(extent_cache, NULL, 0, key.objectid,
  num_bytes, refs, 0, 0, 0, metadata, 1,
  num_bytes);
}
@@ -3323,7 +3331,7 @@ static int process_extent_item(struct btrfs_root *root,
ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item);
refs = btrfs_extent_refs(eb, ei);
 
-   add_extent_rec(extent_cache, NULL, key.objectid, num_bytes,
+   add_extent_rec(extent_cache, NULL, 0, key.objectid, num_bytes,
   refs, 0, 0, 0, metadata, 1, num_bytes);
 
ptr = (unsigned long)(ei + 1);
@@ -3836,6 +3844,7 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
u64 owner;
u64 flags;
u64 ptr;
+   u64 gen = 0;
int ret = 0;
int i;
int nritems;
@@ -3885,8 +3894,16 @@ static int run_next_block(struct btrfs_trans_handle 
*trans,
free(cache);
}
 
+   cache = lookup_cache_extent(extent_cache, bytenr, size);
+   if (cache) {
+   struct extent_record *rec;
+
+   rec = container_of(cache, struct extent_record, cache);
+   gen = rec->parent_generation;
+   }
+
/* fixme, get the real parent transid */
-   buf = read_tree_block(root, bytenr, size, 0);
+   buf = read_tree_block(root, bytenr, size, gen);
if (!extent_buffer_uptodate(buf)) {
record_bad_block_io(root->fs_info,
extent_cache, bytenr, size);

Re: Incremental backup over writable snapshot

2014-02-27 Thread GEO
@Kai, Thank you very much for your reply. Sorry, I just saw it now. 
I will take care of the mailing issue now, so that it does not happen again in 
the future.

Sorry for the inconveniences!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Incremental backup over writable snapshot

2014-02-27 Thread GEO
Does anyone have technical info regarding the reliability of the incremental 
backup process using the said method (apart from all the recommendations not 
to do it that way)?
So the question I am interested in is: should it work or not?
I did some testing myself and it seemed to work; however, I cannot tell whether 
it backs up unnecessary blocks and thus makes the incremental step 
space-inefficient.
That information would help me very much!
Thank you very much!
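
For concreteness, the cycle I have been testing looks roughly like this
(the /mnt paths are just placeholders; @home, @home-w and @home-r are as
described in my original mail quoted below):

# initial full backup
btrfs subvolume snapshot /mnt/pool/@home /mnt/pool/@home-w
# ... delete the files/folders I don't want backed up from @home-w ...
btrfs subvolume snapshot -r /mnt/pool/@home-w /mnt/pool/@home-r
btrfs send /mnt/pool/@home-r | btrfs receive /mnt/backup

# each incremental run
btrfs subvolume snapshot /mnt/pool/@home /mnt/pool/@home-w-1
# ... delete the same files/folders from @home-w-1 ...
btrfs subvolume snapshot -r /mnt/pool/@home-w-1 /mnt/pool/@home-r-1
btrfs send -p /mnt/pool/@home-r /mnt/pool/@home-r-1 | btrfs receive /mnt/backup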

On Wednesday 19 February 2014 14:45:57 GEO wrote:
> Hi,
> 
> As suggested in another thread, I would like to know the reliability of the
> following backup scheme:
> 
> Suppose I have a subvolume of my homedirectory  called @home.
> 
> Now I am interested in making incremental backups of data in home I am
> interested in, but not everything, so I create a normal snapshot of @home
> called @home-w and delete the files/folders I am not interested in backing
> up. After that I create a readonly snapshot of @home-w called @home-r, that
> I sent to my target volume with btrfs send.
> 
> After that is done, I do regular backups, by always going over the writeable
> snapshot where I remove always the same directories I am not interested and
> send the difference to the target volume with  btrfs send -p @home-r
> @home-r-1| btrfs receive /path/of/target/volume.
> 
> I do not like the idea of making subvolumes of all directories I am not
> interested in backing up.
> 
> So what I would like to know now is the following: Could there be drawbacks
> of doing this resp. could I further optimize my backup strategy, as I
> experienced it takes a while for deleting large files in the writeable
> snapshot (What does it write there?)
> 
> Could my method somehow lead to inefficiency in terms of the disk space used
> at the target volume (I mean, could the deleting cause a change, so that
> more is actually transferred as change, than in reality is?)?
> 
> One last question would be: Is there a quick way I could verify the local
> read only snapshot used last time is the same as the one synced to the
> target volume last time?
> 
> 
> Thank you for your support and the great work!


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use btrfs_crc32c everywhere instead of libcrc32c

2014-02-27 Thread Philipp Klein
Hi,

I am the Arch user who initially reported this problem to the AUR
(https://aur.archlinux.org/packages/linux-mainline/).


2014-02-27 13:43 GMT+01:00 Filipe David Manana :

> On Wed, Feb 26, 2014 at 11:26 PM, WorMzy Tykashi
>  wrote:
> > On 29 January 2014 21:06, Filipe David Borba Manana 
> wrote:
> >> After the commit titled "Btrfs: fix btrfs boot when compiled as
> built-in",
> >> LIBCRC32C requirement was removed from btrfs' Kconfig. This made it not
> >> possible to build a kernel with btrfs enabled (either as module or
> built-in)
> >> if libcrc32c is not enabled as well. So just replace all uses of
> libcrc32c
> >> with the equivalent function in btrfs hash.h - btrfs_crc32c.
> >>
> >> Signed-off-by: Filipe David Borba Manana 
> >> ---
> >>  fs/btrfs/check-integrity.c |4 ++--
> >>  fs/btrfs/disk-io.c |4 ++--
> >>  fs/btrfs/send.c|4 ++--
> >>  3 files changed, 6 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> >> index 160fb50..39bfd56 100644
> >> --- a/fs/btrfs/check-integrity.c
> >> +++ b/fs/btrfs/check-integrity.c
> >> @@ -92,11 +92,11 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> -#include 
> >>  #include 
> >>  #include 
> >>  #include "ctree.h"
> >>  #include "disk-io.h"
> >> +#include "hash.h"
> >>  #include "transaction.h"
> >>  #include "extent_io.h"
> >>  #include "volumes.h"
> >> @@ -1823,7 +1823,7 @@ static int btrfsic_test_for_metadata(struct
> btrfsic_state *state,
> >> size_t sublen = i ? PAGE_CACHE_SIZE :
> >> (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
> >>
> >> -   crc = crc32c(crc, data, sublen);
> >> +   crc = btrfs_crc32c(crc, data, sublen);
> >> }
> >> btrfs_csum_final(crc, csum);
> >> if (memcmp(csum, h->csum, state->csum_size))
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 7619147..3903bd3 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -26,7 +26,6 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> -#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >> @@ -35,6 +34,7 @@
> >>  #include 
> >>  #include "ctree.h"
> >>  #include "disk-io.h"
> >> +#include "hash.h"
> >>  #include "transaction.h"
> >>  #include "btrfs_inode.h"
> >>  #include "volumes.h"
> >> @@ -244,7 +244,7 @@ out:
> >>
> >>  u32 btrfs_csum_data(char *data, u32 seed, size_t len)
> >>  {
> >> -   return crc32c(seed, data, len);
> >> +   return btrfs_crc32c(seed, data, len);
> >>  }
> >>
> >>  void btrfs_csum_final(u32 crc, char *result)
> >> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> >> index 04c07ed..31b76d0 100644
> >> --- a/fs/btrfs/send.c
> >> +++ b/fs/btrfs/send.c
> >> @@ -24,12 +24,12 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> -#include 
> >>  #include 
> >>  #include 
> >>
> >>  #include "send.h"
> >>  #include "backref.h"
> >> +#include "hash.h"
> >>  #include "locking.h"
> >>  #include "disk-io.h"
> >>  #include "btrfs_inode.h"
> >> @@ -620,7 +620,7 @@ static int send_cmd(struct send_ctx *sctx)
> >> hdr->len = cpu_to_le32(sctx->send_size - sizeof(*hdr));
> >> hdr->crc = 0;
> >>
> >> -   crc = crc32c(0, (unsigned char *)sctx->send_buf,
> sctx->send_size);
> >> +   crc = btrfs_crc32c(0, (unsigned char *)sctx->send_buf,
> sctx->send_size);
> >> hdr->crc = cpu_to_le32(crc);
> >>
> >> ret = write_buf(sctx->send_filp, sctx->send_buf,
> sctx->send_size,
> >> --
> >> 1.7.9.5
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
> in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> > Hi,
>
> Hi
>
> >
> > Ever since this patch was committed (git ref
> > 0b947aff1599afbbd2ec07ada87b05af0f94cf10), the btrfs module
> > (presumably intentionally) no longer depends on the crc32c module.
>
> To be more clear, it no longer depends on LIBCRC32C (which is just a
> convenience library to access crypto's crc32c).
> It still depends on CRYPTO and CRYPTO_CRC32C (which is what LIBCRC32C
> uses).
>
> > However, this means that this module is not pulled in during initrd
> > creation (at least using mkinitcpio on Arch Linux), and as a result,
> > the btrfs module cannot be loaded. Instead modprobe complains with:
> > "Unknown symbol in module, or unknown parameter (see dmesg)".
>
> That is weird. On debian creating the initrd via kernel's makefile
> (make modules_install && make install) works for me (don't know if it
> uses mkinitcpio or something else).
>
> >
> > Unfortunately there is no accompanying message in dmesg, so I can't
> > provide much more information. However, I have bisected the commit to
> > confirm that this problem was introduced by this patch. The following
> > is a grep of btrfs module's dependencies before and after this was
> > committed:
> >
> > $ grep btrfs pkg/lib/

Re: [PATCH] Btrfs: use btrfs_crc32c everywhere instead of libcrc32c

2014-02-27 Thread Filipe David Manana
On Wed, Feb 26, 2014 at 11:26 PM, WorMzy Tykashi
 wrote:
> On 29 January 2014 21:06, Filipe David Borba Manana  
> wrote:
>> After the commit titled "Btrfs: fix btrfs boot when compiled as built-in",
>> LIBCRC32C requirement was removed from btrfs' Kconfig. This made it not
>> possible to build a kernel with btrfs enabled (either as module or built-in)
>> if libcrc32c is not enabled as well. So just replace all uses of libcrc32c
>> with the equivalent function in btrfs hash.h - btrfs_crc32c.
>>
>> Signed-off-by: Filipe David Borba Manana 
>> ---
>>  fs/btrfs/check-integrity.c |4 ++--
>>  fs/btrfs/disk-io.c |4 ++--
>>  fs/btrfs/send.c|4 ++--
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
>> index 160fb50..39bfd56 100644
>> --- a/fs/btrfs/check-integrity.c
>> +++ b/fs/btrfs/check-integrity.c
>> @@ -92,11 +92,11 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  #include "ctree.h"
>>  #include "disk-io.h"
>> +#include "hash.h"
>>  #include "transaction.h"
>>  #include "extent_io.h"
>>  #include "volumes.h"
>> @@ -1823,7 +1823,7 @@ static int btrfsic_test_for_metadata(struct 
>> btrfsic_state *state,
>> size_t sublen = i ? PAGE_CACHE_SIZE :
>> (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
>>
>> -   crc = crc32c(crc, data, sublen);
>> +   crc = btrfs_crc32c(crc, data, sublen);
>> }
>> btrfs_csum_final(crc, csum);
>> if (memcmp(csum, h->csum, state->csum_size))
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 7619147..3903bd3 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -26,7 +26,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -35,6 +34,7 @@
>>  #include 
>>  #include "ctree.h"
>>  #include "disk-io.h"
>> +#include "hash.h"
>>  #include "transaction.h"
>>  #include "btrfs_inode.h"
>>  #include "volumes.h"
>> @@ -244,7 +244,7 @@ out:
>>
>>  u32 btrfs_csum_data(char *data, u32 seed, size_t len)
>>  {
>> -   return crc32c(seed, data, len);
>> +   return btrfs_crc32c(seed, data, len);
>>  }
>>
>>  void btrfs_csum_final(u32 crc, char *result)
>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>> index 04c07ed..31b76d0 100644
>> --- a/fs/btrfs/send.c
>> +++ b/fs/btrfs/send.c
>> @@ -24,12 +24,12 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>
>>  #include "send.h"
>>  #include "backref.h"
>> +#include "hash.h"
>>  #include "locking.h"
>>  #include "disk-io.h"
>>  #include "btrfs_inode.h"
>> @@ -620,7 +620,7 @@ static int send_cmd(struct send_ctx *sctx)
>> hdr->len = cpu_to_le32(sctx->send_size - sizeof(*hdr));
>> hdr->crc = 0;
>>
>> -   crc = crc32c(0, (unsigned char *)sctx->send_buf, sctx->send_size);
>> +   crc = btrfs_crc32c(0, (unsigned char *)sctx->send_buf, 
>> sctx->send_size);
>> hdr->crc = cpu_to_le32(crc);
>>
>> ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size,
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> Hi,

Hi

>
> Ever since this patch was committed (git ref
> 0b947aff1599afbbd2ec07ada87b05af0f94cf10), the btrfs module
> (presumably intentionally) no longer depends on the crc32c module.

To be more clear, it no longer depends on LIBCRC32C (which is just a
convenience library to access crypto's crc32c).
It still depends on CRYPTO and CRYPTO_CRC32C (which is what LIBCRC32C uses).

> However, this means that this module is not pulled in during initrd
> creation (at least using mkinitcpio on Arch Linux), and as a result,
> the btrfs module cannot be loaded. Instead modprobe complains with:
> "Unknown symbol in module, or unknown parameter (see dmesg)".

That is weird. On Debian, creating the initrd via the kernel's makefile
(make modules_install && make install) works for me (I don't know if it
uses mkinitcpio or something else).
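
I don't use Arch, but until the initramfs tooling picks up the new dependency,
explicitly listing the module should work as a workaround; something along
these lines (the mkinitcpio details are an assumption on my part, not tested):

# /etc/mkinitcpio.conf
MODULES="crc32c"

# then regenerate the initramfs
mkinitcpio -p linux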

>
> Unfortunately there is no accompanying message in dmesg, so I can't
> provide much more information. However, I have bisected the commit to
> confirm that this problem was introduced by this patch. The following
> is a grep of btrfs module's dependencies before and after this was
> committed:
>
> $ grep btrfs pkg/lib/modules/3.13.0-ARCH-00150-g8101c8d/modules.dep
> kernel/fs/btrfs/btrfs.ko: kernel/lib/raid6/raid6_pq.ko
> kernel/lib/libcrc32c.ko kernel/crypto/xor.ko
>
> $ grep btrfs pkg/lib/modules/3.13.0-ARCH-00151-g0b947af/modules.dep
> kernel/fs/btrfs/btrfs.ko: kernel/lib/raid6/raid6_pq.ko kernel/crypto/xor.ko
>
> As you can see, the dependency on kernel/lib/libcrc32c.ko was removed.

Yep, it is intentional.

>
> However, if crc32c.ko is manually added to the 

[PATCH 2/2 v3] Btrfs: check if directory has already been created smarter

2014-02-27 Thread Liu Bo
Currently, to check whether a directory has been created, we search its
DIR_INDEX items one by one to check whether its children have been processed.

Try to picture such a scenario:
   .
   |-- dir(ino X)
 |-- foo_1(ino X+1)
 |-- ...
 |-- foo_k(ino X+k)

With the current way, we have to check all k DIR_INDEX items
to find out that it is a fresh new one.

So this introduces an rbtree to store those directories which are
created out of order, and in the above case we then need only an
O(log n) search instead of an O(n) search.

Signed-off-by: Liu Bo 
---
v3: fix typo, s/O(1)/O(n)/g, thanks Wang Shilong.
v2: fix wrong patch name.

 fs/btrfs/send.c | 87 -
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 33063d1..fcad93c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -175,6 +175,9 @@ struct send_ctx {
 * own move/rename can be performed.
 */
struct rb_root waiting_dir_moves;
+
+   /* directories which are created out of order, check did_create_dir() */
+   struct rb_root out_of_order;
 };
 
 struct pending_dir_move {
@@ -2494,56 +2497,40 @@ out:
  */
 static int did_create_dir(struct send_ctx *sctx, u64 dir)
 {
-   int ret = 0;
-   struct btrfs_path *path = NULL;
-   struct btrfs_key key;
-   struct btrfs_key found_key;
-   struct btrfs_key di_key;
-   struct extent_buffer *eb;
-   struct btrfs_dir_item *di;
-   int slot;
+   struct rb_node **p = &sctx->out_of_order.rb_node;
+   struct rb_node *parent = NULL;
+   struct send_dir_node *entry = NULL;
+   int cur_is_dir = !!(dir == sctx->cur_ino);
 
-   path = alloc_path_for_send();
-   if (!path) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   verbose_printk("dir=%llu cur_ino=%llu send_progress=%llu\n",
+dir, sctx->cur_ino, sctx->send_progress);
 
-   key.objectid = dir;
-   key.type = BTRFS_DIR_INDEX_KEY;
-   key.offset = 0;
-   while (1) {
-   ret = btrfs_search_slot_for_read(sctx->send_root, &key, path,
-   1, 0);
-   if (ret < 0)
-   goto out;
-   if (!ret) {
-   eb = path->nodes[0];
-   slot = path->slots[0];
-   btrfs_item_key_to_cpu(eb, &found_key, slot);
-   }
-   if (ret || found_key.objectid != key.objectid ||
-   found_key.type != key.type) {
-   ret = 0;
-   goto out;
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct send_dir_node, node);
+   if (dir < entry->ino) {
+   p = &(*p)->rb_left;
+   } else if (dir > entry->ino) {
+   p = &(*p)->rb_right;
+   } else {
+   if (cur_is_dir) {
+   rb_erase(&entry->node, &sctx->out_of_order);
+   kfree(entry);
+   }
+   return 1;
}
+   }
 
-   di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);
-   btrfs_dir_item_key_to_cpu(eb, di, &di_key);
-
-   if (di_key.type != BTRFS_ROOT_ITEM_KEY &&
-   di_key.objectid < sctx->send_progress) {
-   ret = 1;
-   goto out;
-   }
+   if (!cur_is_dir) {
+   entry = kmalloc(sizeof(*entry), GFP_NOFS);
+   if (!entry)
+   return -ENOMEM;
+   entry->ino = dir;
 
-   key.offset = found_key.offset + 1;
-   btrfs_release_path(path);
+   rb_link_node(&entry->node, parent, p);
+   rb_insert_color(&entry->node, &sctx->out_of_order);
}
-
-out:
-   btrfs_free_path(path);
-   return ret;
+   return 0;
 }
 
 /*
@@ -5340,6 +5327,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user 
*arg_)
 
sctx->pending_dir_moves = RB_ROOT;
sctx->waiting_dir_moves = RB_ROOT;
+   sctx->out_of_order = RB_ROOT;
 
sctx->clone_roots = vzalloc(sizeof(struct clone_root) *
(arg->clone_sources_count + 1));
@@ -5477,6 +5465,17 @@ out:
kfree(dm);
}
 
+   WARN_ON(sctx && !ret && !RB_EMPTY_ROOT(&sctx->out_of_order));
+   while (sctx && !RB_EMPTY_ROOT(&sctx->out_of_order)) {
+   struct rb_node *n;
+   struct send_dir_node *entry;
+
+   n = rb_first(&sctx->out_of_order);
+   entry = rb_entry(n, struct send_dir_node, node);
+   rb_erase(&entry->node, &sctx->out_of_order);
+   kfree(entry);
+   }
+
if (sort_clone_roots) {
for (i = 0; i < sctx->clone_roo

[PATCH] Btrfs: skip search tree for REG files

2014-02-27 Thread Liu Bo
It is really unnecessary to search the tree again for @gen, @mode and @rdev
when creating REG inodes, as we've already got the btrfs_inode_item in sctx,
and @gen, @mode and @rdev can easily be fetched from it.

Signed-off-by: Liu Bo 
---
 fs/btrfs/send.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3609685..5b493e8 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -109,6 +109,7 @@ struct send_ctx {
int cur_inode_deleted;
u64 cur_inode_size;
u64 cur_inode_mode;
+   u64 cur_inode_rdev;
u64 cur_inode_last_extent;
 
u64 send_progress;
@@ -2432,10 +2433,16 @@ verbose_printk("btrfs: send_create_inode %llu\n", ino);
if (!p)
return -ENOMEM;
 
-   ret = get_inode_info(sctx->send_root, ino, NULL, &gen, &mode, NULL,
-   NULL, &rdev);
-   if (ret < 0)
-   goto out;
+   if (ino != sctx->cur_ino) {
+   ret = get_inode_info(sctx->send_root, ino, NULL, &gen, &mode,
+NULL, NULL, &rdev);
+   if (ret < 0)
+   goto out;
+   } else {
+   gen = sctx->cur_inode_gen;
+   mode = sctx->cur_inode_mode;
+   rdev = sctx->cur_inode_rdev;
+   }
 
if (S_ISREG(mode)) {
cmd = BTRFS_SEND_C_MKFILE;
@@ -4827,6 +4834,8 @@ static int changed_inode(struct send_ctx *sctx,
sctx->left_path->nodes[0], left_ii);
sctx->cur_inode_mode = btrfs_inode_mode(
sctx->left_path->nodes[0], left_ii);
+   sctx->cur_inode_rdev = btrfs_inode_rdev(
+   sctx->left_path->nodes[0], left_ii);
if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
ret = send_create_inode_if_needed(sctx);
} else if (result == BTRFS_COMPARE_TREE_DELETED) {
@@ -4871,6 +4880,8 @@ static int changed_inode(struct send_ctx *sctx,
sctx->left_path->nodes[0], left_ii);
sctx->cur_inode_mode = btrfs_inode_mode(
sctx->left_path->nodes[0], left_ii);
+   sctx->cur_inode_rdev = btrfs_inode_rdev(
+   sctx->left_path->nodes[0], left_ii);
ret = send_create_inode_if_needed(sctx);
if (ret < 0)
goto out;
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2 v2] Btrfs: check if directory has already been created smarter

2014-02-27 Thread Liu Bo
On Thu, Feb 27, 2014 at 04:01:23PM +0800, Wang Shilong wrote:
> On 02/27/2014 03:47 PM, Liu Bo wrote:
> >Currently to check whether a directory has been created, we search
> >DIR_INDEX items one by one to check if children has been processed.
> >
> >Try to picture such a scenario:
> >.
> >|-- dir(ino X)
> >  |-- foo_1(ino X+1)
> >  |-- ...
> >  |-- foo_k(ino X+k)
> >
> >With the current way, we have to check all the k DIR_INDEX items
> >to find it is a fresh new one.
> >
> >So this introduced a rbtree to store those directories which are
> >created out of order, and in the above case, we just need an O(logn)
> >search instead of O(1) search.
> Just a reminder, we usually call this O(n) rather than O(1) here.
> If we fell from O(1) to O(log n), things would be getting worse~~

Good catch, my bad.

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2 v2] Btrfs: check if directory has already been created smarter

2014-02-27 Thread Wang Shilong

On 02/27/2014 03:47 PM, Liu Bo wrote:

Currently to check whether a directory has been created, we search
DIR_INDEX items one by one to check if children has been processed.

Try to picture such a scenario:
.
|-- dir(ino X)
  |-- foo_1(ino X+1)
  |-- ...
  |-- foo_k(ino X+k)

With the current way, we have to check all the k DIR_INDEX items
to find it is a fresh new one.

So this introduced a rbtree to store those directories which are
created out of order, and in the above case, we just need an O(logn)
search instead of O(1) search.

Just a reminder, we usually call this O(n) rather than O(1) here.
If we fell from O(1) to O(log n), things would be getting worse~~


Thanks,
Wang


Signed-off-by: Liu Bo 
---
v2: fix wrong patch name.

  fs/btrfs/send.c | 87 -
  1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 33063d1..fcad93c 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -175,6 +175,9 @@ struct send_ctx {
 * own move/rename can be performed.
 */
struct rb_root waiting_dir_moves;
+
+   /* directories which are created out of order, check did_create_dir() */
+   struct rb_root out_of_order;
  };
  
  struct pending_dir_move {

@@ -2494,56 +2497,40 @@ out:
   */
  static int did_create_dir(struct send_ctx *sctx, u64 dir)
  {
-   int ret = 0;
-   struct btrfs_path *path = NULL;
-   struct btrfs_key key;
-   struct btrfs_key found_key;
-   struct btrfs_key di_key;
-   struct extent_buffer *eb;
-   struct btrfs_dir_item *di;
-   int slot;
+   struct rb_node **p = &sctx->out_of_order.rb_node;
+   struct rb_node *parent = NULL;
+   struct send_dir_node *entry = NULL;
+   int cur_is_dir = !!(dir == sctx->cur_ino);
  
-	path = alloc_path_for_send();

-   if (!path) {
-   ret = -ENOMEM;
-   goto out;
-   }
+   verbose_printk("dir=%llu cur_ino=%llu send_progress=%llu\n",
+dir, sctx->cur_ino, sctx->send_progress);
  
-	key.objectid = dir;

-   key.type = BTRFS_DIR_INDEX_KEY;
-   key.offset = 0;
-   while (1) {
-   ret = btrfs_search_slot_for_read(sctx->send_root, &key, path,
-   1, 0);
-   if (ret < 0)
-   goto out;
-   if (!ret) {
-   eb = path->nodes[0];
-   slot = path->slots[0];
-   btrfs_item_key_to_cpu(eb, &found_key, slot);
-   }
-   if (ret || found_key.objectid != key.objectid ||
-   found_key.type != key.type) {
-   ret = 0;
-   goto out;
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct send_dir_node, node);
+   if (dir < entry->ino) {
+   p = &(*p)->rb_left;
+   } else if (dir > entry->ino) {
+   p = &(*p)->rb_right;
+   } else {
+   if (cur_is_dir) {
+   rb_erase(&entry->node, &sctx->out_of_order);
+   kfree(entry);
+   }
+   return 1;
}
+   }
  
-		di = btrfs_item_ptr(eb, slot, struct btrfs_dir_item);

-   btrfs_dir_item_key_to_cpu(eb, di, &di_key);
-
-   if (di_key.type != BTRFS_ROOT_ITEM_KEY &&
-   di_key.objectid < sctx->send_progress) {
-   ret = 1;
-   goto out;
-   }
+   if (!cur_is_dir) {
+   entry = kmalloc(sizeof(*entry), GFP_NOFS);
+   if (!entry)
+   return -ENOMEM;
+   entry->ino = dir;
  
-		key.offset = found_key.offset + 1;

-   btrfs_release_path(path);
+   rb_link_node(&entry->node, parent, p);
+   rb_insert_color(&entry->node, &sctx->out_of_order);
}
-
-out:
-   btrfs_free_path(path);
-   return ret;
+   return 0;
  }
  
  /*

@@ -5340,6 +5327,7 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user 
*arg_)
  
  	sctx->pending_dir_moves = RB_ROOT;

sctx->waiting_dir_moves = RB_ROOT;
+   sctx->out_of_order = RB_ROOT;
  
  	sctx->clone_roots = vzalloc(sizeof(struct clone_root) *

(arg->clone_sources_count + 1));
@@ -5477,6 +5465,17 @@ out:
kfree(dm);
}
  
+	WARN_ON(sctx && !ret && !RB_EMPTY_ROOT(&sctx->out_of_order));

+   while (sctx && !RB_EMPTY_ROOT(&sctx->out_of_order)) {
+   struct rb_node *n;
+   struct send_dir_node *entry;
+
+   n = rb_first(&sctx->out_of_order);
+   entry = rb_entry(n, struct send_dir_node, node);
+   rb_erase(&entry->node, &sctx->out_of_order);
+   kfree(e