Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Qu Wenruo


On 2018年07月13日 13:32, Anand Jain wrote:
> 
> 
> 
> 
> 
>>> But if you are planning to
>>>    record and start at transaction [14] then its an overkill because
>>>    transaction [19 and [20] are already in the disk.
>>
>> Yes, I'm doing it overkilled.
> 
>  Ah. Ok.
> 
>> But it's already much better than scrub all block groups (my original
>> plan).
> 
>  That's true. Which can be optimized later, but how? and scrub can't
>  fix RAID1.

How could scrub not fix RAID1?

For metadata or data with csum, just goes normal scrub.
For data without csum, we know which device is resilvering, just use the
other copy.

Thanks,
Qu

> 
> Thanks, Anand
> 
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thanks, Anand
>>>
>>>
 Thanks,
 Qu

>  [3] https://patchwork.kernel.org/patch/10403311/
>
>    Further, as we do a self adapting chunk allocation in RAID1, it
> needs
>    balance-convert to fix. IMO at some point we have to provide
> degraded
>    raid1 chunk allocation and also modify the scrub to be chunk
> granular.
>
> Thanks, Anand
>
>> Any idea on this?
>>
>> Thanks,
>> Qu
>>
>>> Unlock: btrfs_fs_info::chunk_mutex
>>> Unlock: btrfs_fs_devices::device_list_mutex
>>>
>>> ---
>>>
>>>
>>>
>>> Thanks, Anand
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe
> linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>



signature.asc
Description: OpenPGP digital signature


Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Anand Jain








But if you are planning to
   record and start at transaction [14] then its an overkill because
   transaction [19 and [20] are already in the disk.


Yes, I'm doing it overkilled.


 Ah. Ok.


But it's already much better than scrub all block groups (my original plan).


 That's true. Which can be optimized later, but how? and scrub can't
 fix RAID1.

Thanks, Anand



Thanks,
Qu



Thanks, Anand



Thanks,
Qu


     [3] https://patchwork.kernel.org/patch/10403311/

   Further, as we do a self adapting chunk allocation in RAID1, it needs
   balance-convert to fix. IMO at some point we have to provide degraded
   raid1 chunk allocation and also modify the scrub to be chunk granular.

Thanks, Anand


Any idea on this?

Thanks,
Qu


Unlock: btrfs_fs_info::chunk_mutex
Unlock: btrfs_fs_devices::device_list_mutex

---


Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe
linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: use customized batch size for total_bytes_pinned

2018-07-12 Thread ethanlien

Omar Sandoval 於 2018-07-13 06:19 寫到:

On Wed, Jul 11, 2018 at 11:59:36PM +0800, Ethan Lien wrote:
In commit b150a4f10d878 ("Btrfs: use a percpu to keep track of 
possibly
pinned bytes") we use total_bytes_pinned to track how many bytes we 
are
going to free in this transaction. When we are close to ENOSPC, we 
check it
and know if we can make the allocation by commit the current 
transaction.

For every data/metadata extent we are going to free, we add
total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), 
and
release it in unpin_extent_range() when we finish the transaction. So 
this

is a variable we frequently update but rarely read - just the suitable
use of percpu_counter. But in previous commit we update 
total_bytes_pinned

by default 32 batch size, making every update essentially a spin lock
protected update. Since every spin lock/unlock operation involves 
syncing
a globally used variable and some kind of barrier in a SMP system, 
this is
more expensive than using total_bytes_pinned as a simple atomic64_t. 
So

fix this by using a customized batch size. Since we only read
total_bytes_pinned when we are close to ENOSPC and fail to alloc new 
chunk,
we can use a really large batch size and have nearly no penalty in 
most

cases.


[Test]
We test the patch on a 4-cores x86 machine:
1. falloate a 16GiB size test file.
2. take snapshot (so all following writes will be cow write).
3. run a 180 sec, 4 jobs, 4K random write fio on test file.

We also add a temporary lockdep class on percpu_counter's spin lock 
used

by total_bytes_pinned to track lock_stat.


[Results]
unpatched:
lock_stat version 0.4
---
  class namecon-bouncescontentions
waittime-min   waittime-max waittime-total   waittime-avg
acq-bounces
acquisitions   holdtime-min   holdtime-max holdtime-total   
holdtime-avg


   total_bytes_pinned_percpu:82 82
0.21   0.61  29.46   0.36 
298340
  635973   0.09  11.01  173476.25   
0.27



patched:
lock_stat version 0.4
---
  class namecon-bouncescontentions
waittime-min   waittime-max waittime-total   waittime-avg
acq-bounces
acquisitions   holdtime-min   holdtime-max holdtime-total   
holdtime-avg


   total_bytes_pinned_percpu: 1  1
0.62   0.62   0.62   0.62  
13601
   31542   0.14   9.61   11016.90   
0.35



[Analysis]
Since the spin lock only protect a single in-memory variable, the
contentions (number of lock acquisitions that had to wait) in both
unpatched and patched version are low. But when we see acquisitions 
and
acq-bounces, we get much lower counts in patched version. Here the 
most

important metric is acq-bounces. It means how many times the lock get
transferred between different cpus, so the patch can really recude
cacheline bouncing of spin lock (also the global counter of 
percpu_counter)

in a SMP system.

Fixes: b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
pinned bytes")

Signed-off-by: Ethan Lien 
---

V2:
Rewrite commit comments.
Add lock_stat test.
Pull dirty_metadata_bytes out to a separate patch.

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/extent-tree.c | 46 
--

 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 118346aceea9..df682a521635 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -422,6 +422,7 @@ struct btrfs_space_info {
 * time the transaction commits.
 */
struct percpu_counter total_bytes_pinned;
+   s32 total_bytes_pinned_batch;


Can this just be a constant instead of adding it to space_info?


Yes constant is better here, I'll resend it, thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" 
in

the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Qu Wenruo


On 2018年07月13日 08:20, Qu Wenruo wrote:
> 
> 
> [snip]
>>> In this case, it depends on when and how we mark the device resilvering.
>>> If we record the generation of write error happens, then just initial a
>>> scrub for generation greater than that generation.
>>
>>  If we record all the degraded transactions then yes. Not just the last
>>  failed transaction.
> 
> The last successful generation won't be upgraded until the scrub success.
> 
>>
>>> In the list, some guys mentioned that for LVM/mdraid they will record
>>> the generation when some device(s) get write error or missing, and do
>>> self cure.
>>>

   I have been scratching on fix for this [3] for some time now. Thanks
   for the participation. In my understanding we are missing across-tree
   parent transid verification at the lowest possible granular OR
>>>
>>> Maybe the newly added first_key and level check could help detect such
>>> mismatch?
>>>
   other approach is to modify Liubo approach to provide a list of
   degraded chunks but without a journal disk.
>>>
>>> Currently, DEV_ITEM::generation is seldom used. (only for seed sprout
>>> case)
>>> Maybe we could reuse that member to record the last successful written
>>> transaction to that device and do above purposed LVM/mdraid style self
>>> cure?
>>
>>  Record of just the last successful transaction won't help. OR its an
>>  overkill to fix a write hole.
>>
>>  Transactions: 10 11 [12] [13] [14] < write hole > [19] [20]
>>  In the above example
>>   disk disappeared at transaction 11 and when it reappeared at
>>   the transaction 19, there were new writes as well as the resilver
>>   writes,
> 
> Then the last good generation will be 11 and we will commit current
> transaction as soon as we find a device disappear, and won't upgrade the
> last good generation until the scrub finishes.
> 
>> so we were able to write 12 13 14 and 19 20 and then
>>   the disk disappears again leaving a write hole.
> 
> Only if in above transactions, the auto scrub finishes, the device will
> has generation updated, or it will stay generation 11.
> 
>> Now next time when
>>   disk reappears the last transaction indicates 20 on both-disks
>>   but leaving a write hole in one of disk.
> 
> That will only happens if auto-scrub finishes in transaction 20, or its
> last successful generation will stay 11.
> 
>> But if you are planning to
>>   record and start at transaction [14] then its an overkill because
>>   transaction [19 and [20] are already in the disk.
> 
> Yes, I'm doing it overkilled.
> But it's already much better than scrub all block groups (my original plan).

Well, my idea has a major problem, that's we don't have generation for
block group item, that's to say either we use free space cache
generation or add new BLOCK_GROUP_ITEM member for generation detection.

Thanks,
Qu

> 
> Thanks,
> Qu
> 
>>
>> Thanks, Anand
>>
>>
>>> Thanks,
>>> Qu
>>>
     [3] https://patchwork.kernel.org/patch/10403311/

   Further, as we do a self adapting chunk allocation in RAID1, it needs
   balance-convert to fix. IMO at some point we have to provide degraded
   raid1 chunk allocation and also modify the scrub to be chunk granular.

 Thanks, Anand

> Any idea on this?
>
> Thanks,
> Qu
>
>> Unlock: btrfs_fs_info::chunk_mutex
>> Unlock: btrfs_fs_devices::device_list_mutex
>>
>> ---
>>
>>
>> Thanks, Anand
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
 -- 
 To unsubscribe from this list: send the line "unsubscribe
 linux-btrfs" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [LKP] [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression

2018-07-12 Thread Huang, Ying
Hi, Chris,

Chris Mason  writes:

> On 19 Jun 2018, at 23:51, Huang, Ying wrote:
 "Huang, Ying"  writes:

> Hi, Josef,
>
> Do you have time to take a look at the regression?
>
> kernel test robot  writes:
>
>> Greeting,
>>
>> FYI, we noticed a -12.3% regression of blogbench.write_score and
>> a +9.6% improvement
>> of blogbench.read_score due to commit:
>>
>>
>> commit: 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use
>> sc->priority for slab shrink targets")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
>> master
>>
>> in testcase: blogbench
>> on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @
>> 2.10GHz with 8G memory
>> with following parameters:
>>
>>  disk: 1SSD
>>  fs: btrfs
>>  cpufreq_governor: performance
>>
>> test-description: Blogbench is a portable filesystem benchmark
>> that tries to reproduce the load of a real-world busy file
>> server.
>> test-url:
>
> I'm surprised, this patch is a big win in production here at FB.  I'll
> have to reproduce these results to better understand what is going on.
> My first guess is that since we have fewer inodes in slab, we're
> reading more inodes from disk in order to do the writes.
>
> But that should also make our read scores lower.

Any update on this?

Best Regards,
Huang, Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)

2018-07-12 Thread Qu Wenruo



On 2018年07月13日 07:14, Marc MERLIN wrote:
> On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018年07月12日 01:09, Chris Murphy wrote:
>>> On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
 Thanks to Su and Qu, I was able to get my filesystem to a point that
 it's mountable.
 I then deleted loads of snapshots and I'm down to 26.

 IT now looks like this:
 gargamel:~# btrfs fi show /mnt/mnt
 Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
 Total devices 1 FS bytes used 12.30TiB
 devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

 gargamel:~# btrfs fi df /mnt/mnt
 Data, single: total=13.57TiB, used=12.19TiB
 System, DUP: total=32.00MiB, used=1.55MiB
 Metadata, DUP: total=124.50GiB, used=115.62GiB
 Metadata, single: total=216.00MiB, used=0.00B
 GlobalReserve, single: total=512.00MiB, used=0.00B


 Problems
 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
 server, despite my deleting lots of snapshots.
 Is it because I have too many files then?
>>>
>>> I think originally needs most of metdata in memory.
>>>
>>> I'm not understanding why btrfs check won't use swap like at least
>>> xfs_repair and pretty sure e2fsck will as well.
>>
>> I don't understand either.
>>
>> Isn't memory from malloc() swappable?
> 
> I never looked at the code and why/how it crashes, but my guess was
> that it somehow causes the kernel to grab a lot of memory in the btrfs
> driver and that is what is what is crashing the system.

Btrfs check is done completely at user space, so it should not be
related to kernel btrfs module.

> If it were just malloc() the btrfs user space tool, it should be both
> swappable like you said, and should also get OOM'ed.

That's the case, but then why xfs/ext check tool could take up tons of
swap without get killed by OOM?

> 
> I suppose I can still be completely wrong, but I can't find another
> logical explanation.
> 
> I just tried running it again to trigger the problem, but because I
> freed a lot of snapshots, btrfs check --repair goes back to only using
> 10GB instead of 32GB, so I wasn't able to replicate OOM for you.

At least it's a good news for you.

> 
> Incidently, it died with:
> gargamel:~# btrfs check --repair /dev/mapper/dshelf2
> enabling repair mode
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> root 18446744073709551607 has a root item with a more recent gen (143376) 
> compared to the found
>  root node (139061)
> ERROR: failed to repair root items: Invalid argument
> 
> That said, when it was using a fair amount of RAM, I captured this:
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> root  1376  1.4 25.2 8256368 8240392 pts/18 R+  14:52   1:07 btrfs check 
> --repair /dev/mapper/dshelf2
> 
> I don't know how to read /proc/meminfo, but that's what it said:
> MemTotal:   32643792 kB
> MemFree: 1367516 kB
> MemAvailable:   15554836 kB
> Buffers: 3491672 kB
> Cached: 15900320 kB
> SwapCached: 2092 kB
> Active: 14577228 kB
> Inactive:   15028608 kB
> Active(anon):   12122180 kB
> Inactive(anon):  2643176 kB
> Active(file):2455048 kB
> Inactive(file): 12385432 kB
> Unevictable:8068 kB
> Mlocked:8068 kB
> SwapTotal:  15616764 kB   < swap was totally unused and stays unused when 
> I get the system to crash 
> SwapFree:   15578020 kB
> Dirty: 71956 kB
> Writeback:64 kB
> AnonPages:  10219976 kB
> Mapped:  4033568 kB
> Shmem:   4545552 kB
> Slab: 713300 kB
> SReclaimable: 395508 kB
> SUnreclaim:   317792 kB
> KernelStack:   11788 kB
> PageTables:52592 kB
> NFS_Unstable:  0 kB
> Bounce:0 kB
> WritebackTmp:  0 kB
> CommitLimit:31938660 kB
> Committed_AS:   20070736 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:   0 kB
> VmallocChunk:  0 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 0 kB
> ShmemHugePages:0 kB
> ShmemPmdMapped:0 kB
> CmaTotal:  16384 kB
> CmaFree:   0 kB
> HugePages_Total:   0
> HugePages_Free:0
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> Hugetlb:   0 kB
> DirectMap4k: 1207572 kB
> DirectMap2M:32045056 kB
> 
> Does it help figure out where the memory was going and wehther kernel
> memory was being used?

Not really, much similar to what I observed.

I also tried to over-commit my memory usage on my system, however it
just freeze for several seconds and then get killed by OOM, failed to
capture any useful info during that freeze.

Thanks,
Qu

> 
> Marc
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo 

Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Qu Wenruo


[snip]
>> In this case, it depends on when and how we mark the device resilvering.
>> If we record the generation of write error happens, then just initial a
>> scrub for generation greater than that generation.
> 
>  If we record all the degraded transactions then yes. Not just the last
>  failed transaction.

The last successful generation won't be upgraded until the scrub success.

> 
>> In the list, some guys mentioned that for LVM/mdraid they will record
>> the generation when some device(s) get write error or missing, and do
>> self cure.
>>
>>>
>>>   I have been scratching on fix for this [3] for some time now. Thanks
>>>   for the participation. In my understanding we are missing across-tree
>>>   parent transid verification at the lowest possible granular OR
>>
>> Maybe the newly added first_key and level check could help detect such
>> mismatch?
>>
>>>   other approach is to modify Liubo approach to provide a list of
>>>   degraded chunks but without a journal disk.
>>
>> Currently, DEV_ITEM::generation is seldom used. (only for seed sprout
>> case)
>> Maybe we could reuse that member to record the last successful written
>> transaction to that device and do above purposed LVM/mdraid style self
>> cure?
> 
>  Record of just the last successful transaction won't help. OR its an
>  overkill to fix a write hole.
> 
>  Transactions: 10 11 [12] [13] [14] < write hole > [19] [20]
>  In the above example
>   disk disappeared at transaction 11 and when it reappeared at
>   the transaction 19, there were new writes as well as the resilver
>   writes,

Then the last good generation will be 11 and we will commit current
transaction as soon as we find a device disappear, and won't upgrade the
last good generation until the scrub finishes.

> so we were able to write 12 13 14 and 19 20 and then
>   the disk disappears again leaving a write hole.

Only if in above transactions, the auto scrub finishes, the device will
has generation updated, or it will stay generation 11.

> Now next time when
>   disk reappears the last transaction indicates 20 on both-disks
>   but leaving a write hole in one of disk.

That will only happens if auto-scrub finishes in transaction 20, or its
last successful generation will stay 11.

> But if you are planning to
>   record and start at transaction [14] then its an overkill because
>   transaction [19 and [20] are already in the disk.

Yes, I'm doing it overkilled.
But it's already much better than scrub all block groups (my original plan).

Thanks,
Qu

> 
> Thanks, Anand
> 
> 
>> Thanks,
>> Qu
>>
>>>     [3] https://patchwork.kernel.org/patch/10403311/
>>>
>>>   Further, as we do a self adapting chunk allocation in RAID1, it needs
>>>   balance-convert to fix. IMO at some point we have to provide degraded
>>>   raid1 chunk allocation and also modify the scrub to be chunk granular.
>>>
>>> Thanks, Anand
>>>
 Any idea on this?

 Thanks,
 Qu

> Unlock: btrfs_fs_info::chunk_mutex
> Unlock: btrfs_fs_devices::device_list_mutex
>
> ---
>
>
> Thanks, Anand
> -- 
> To unsubscribe from this list: send the line "unsubscribe
> linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 1/2] btrfs-progs: Rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER

2018-07-12 Thread Christoph Anton Mitterer
Hey.

Better late than never ;-)

Just to confirm: At least since 4.16.1, I could btrfs-restore from the
broken fs image again (that I've described in "spurious full btrfs
corruption" from around mid March).

So the regression in btrfsprogs has in fact been fixed by these
patches, it seems.


Thanks,
Chris.

On Wed, 2018-04-11 at 15:29 +0800, Qu Wenruo wrote:
> The old flag OPEN_CTREE_FS_PARTIAL is in fact quite easy to be
> confused
> with OPEN_CTREE_PARTIAL, which allow btrfs-progs to open damaged
> filesystem (like corrupted extent/csum tree).
> 
> However OPEN_CTREE_FS_PARTIAL, unlike its name, is just allowing
> btrfs-progs to open fs with temporary superblocks (which only has 6
> basic trees on SINGLE meta/sys chunks).
> 
> The usage of FS_PARTIAL is really confusing here.
> 
> So rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER, and
> add
> extra comment for its behavior.
> Also rename BTRFS_MAGIC_PARTIAL to BTRFS_MAGIC_TEMPORARY to keep the
> naming consistent.
> 
> And with above comment, the usage of FS_PARTIAL in dump-tree is
> obviously incorrect, fix it.
> 
> Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show
> specified tree block even some tree roots are corrupted")
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   New patch
> ---
>  cmds-inspect-dump-tree.c |  2 +-
>  convert/main.c   |  4 ++--
>  ctree.h  |  8 +---
>  disk-io.c| 12 ++--
>  disk-io.h| 10 +++---
>  mkfs/main.c  |  2 +-
>  6 files changed, 22 insertions(+), 16 deletions(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index 0802b31e9596..e6510851e8f4 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -220,7 +220,7 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   int uuid_tree_only = 0;
>   int roots_only = 0;
>   int root_backups = 0;
> - unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
> + unsigned open_ctree_flags = OPEN_CTREE_PARTIAL;
>   u64 block_only = 0;
>   struct btrfs_root *tree_root_scan;
>   u64 tree_id = 0;
> diff --git a/convert/main.c b/convert/main.c
> index 6bdfab40d0b0..80f3bed84c84 100644
> --- a/convert/main.c
> +++ b/convert/main.c
> @@ -1140,7 +1140,7 @@ static int do_convert(const char *devname, u32
> convert_flags, u32 nodesize,
>   }
>  
>   root = open_ctree_fd(fd, devname, mkfs_cfg.super_bytenr,
> -  OPEN_CTREE_WRITES |
> OPEN_CTREE_FS_PARTIAL);
> +  OPEN_CTREE_WRITES |
> OPEN_CTREE_TEMPORARY_SUPER);
>   if (!root) {
>   error("unable to open ctree");
>   goto fail;
> @@ -1230,7 +1230,7 @@ static int do_convert(const char *devname, u32
> convert_flags, u32 nodesize,
>   }
>  
>   root = open_ctree_fd(fd, devname, 0,
> - OPEN_CTREE_WRITES | OPEN_CTREE_FS_PARTIAL);
> +  OPEN_CTREE_WRITES |
> OPEN_CTREE_TEMPORARY_SUPER);
>   if (!root) {
>   error("unable to open ctree for finalization");
>   goto fail;
> diff --git a/ctree.h b/ctree.h
> index fa861ba0b4c3..80d4e59a66ce 100644
> --- a/ctree.h
> +++ b/ctree.h
> @@ -45,10 +45,12 @@ struct btrfs_free_space_ctl;
>  #define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null
> */
>  
>  /*
> - * Fake signature for an unfinalized filesystem, structures might be
> partially
> - * created or missing.
> + * Fake signature for an unfinalized filesystem, which only has
> barebone tree
> + * structures (normally 6 near empty trees, on SINGLE meta/sys
> temporary chunks)
> + *
> + * ascii !BHRfS_M, no null
>   */
> -#define BTRFS_MAGIC_PARTIAL 0x4D5F536652484221ULL /* ascii !BHRfS_M,
> no null */
> +#define BTRFS_MAGIC_TEMPORARY 0x4D5F536652484221ULL
>  
>  #define BTRFS_MAX_MIRRORS 3
>  
> diff --git a/disk-io.c b/disk-io.c
> index 58eae709e0e8..9e8b1e9d295c 100644
> --- a/disk-io.c
> +++ b/disk-io.c
> @@ -1117,14 +1117,14 @@ static struct btrfs_fs_info
> *__open_ctree_fd(int fp, const char *path,
>   fs_info->ignore_chunk_tree_error = 1;
>  
>   if ((flags & OPEN_CTREE_RECOVER_SUPER)
> -  && (flags & OPEN_CTREE_FS_PARTIAL)) {
> +  && (flags & OPEN_CTREE_TEMPORARY_SUPER)) {
>   fprintf(stderr,
> - "cannot open a partially created filesystem for
> recovery");
> + "cannot open a filesystem with temporary super block for
> recovery");
>   goto out;
>   }
>  
> - if (flags & OPEN_CTREE_FS_PARTIAL)
> - sbflags = SBREAD_PARTIAL;
> + if (flags & OPEN_CTREE_TEMPORARY_SUPER)
> + sbflags = SBREAD_TEMPORARY;
>  
>   ret = btrfs_scan_fs_devices(fp, path, _devices,
> sb_bytenr, sbflags,
>   (flags & OPEN_CTREE_NO_DEVICES));
> @@ -1285,8 +1285,8 @@ static int check_super(struct btrfs_super_block
> *sb, unsigned sbflags)
>   int csum_size;
>  
>   if 

Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)

2018-07-12 Thread Marc MERLIN
On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月12日 01:09, Chris Murphy wrote:
> > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
> >> Thanks to Su and Qu, I was able to get my filesystem to a point that
> >> it's mountable.
> >> I then deleted loads of snapshots and I'm down to 26.
> >>
> >> IT now looks like this:
> >> gargamel:~# btrfs fi show /mnt/mnt
> >> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> >> Total devices 1 FS bytes used 12.30TiB
> >> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >>
> >> gargamel:~# btrfs fi df /mnt/mnt
> >> Data, single: total=13.57TiB, used=12.19TiB
> >> System, DUP: total=32.00MiB, used=1.55MiB
> >> Metadata, DUP: total=124.50GiB, used=115.62GiB
> >> Metadata, single: total=216.00MiB, used=0.00B
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>
> >>
> >> Problems
> >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> >> server, despite my deleting lots of snapshots.
> >> Is it because I have too many files then?
> > 
> > I think originally needs most of metdata in memory.
> > 
> > I'm not understanding why btrfs check won't use swap like at least
> > xfs_repair and pretty sure e2fsck will as well.
> 
> I don't understand either.
> 
> Isn't memory from malloc() swappable?

I never looked at the code and why/how it crashes, but my guess was
that it somehow causes the kernel to grab a lot of memory in the btrfs
driver and that is what is what is crashing the system.
If it were just malloc() the btrfs user space tool, it should be both
swappable like you said, and should also get OOM'ed.

I suppose I can still be completely wrong, but I can't find another
logical explanation.

I just tried running it again to trigger the problem, but because I
freed a lot of snapshots, btrfs check --repair goes back to only using
10GB instead of 32GB, so I wasn't able to replicate OOM for you.

Incidently, it died with:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) 
compared to the found
 root node (139061)
ERROR: failed to repair root items: Invalid argument

That said, when it was using a fair amount of RAM, I captured this:
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root  1376  1.4 25.2 8256368 8240392 pts/18 R+  14:52   1:07 btrfs check 
--repair /dev/mapper/dshelf2

I don't know how to read /proc/meminfo, but that's what it said:
MemTotal:   32643792 kB
MemFree: 1367516 kB
MemAvailable:   15554836 kB
Buffers: 3491672 kB
Cached: 15900320 kB
SwapCached: 2092 kB
Active: 14577228 kB
Inactive:   15028608 kB
Active(anon):   12122180 kB
Inactive(anon):  2643176 kB
Active(file):2455048 kB
Inactive(file): 12385432 kB
Unevictable:8068 kB
Mlocked:8068 kB
SwapTotal:  15616764 kB   < swap was totally unused and stays unused when I 
get the system to crash 
SwapFree:   15578020 kB
Dirty: 71956 kB
Writeback:64 kB
AnonPages:  10219976 kB
Mapped:  4033568 kB
Shmem:   4545552 kB
Slab: 713300 kB
SReclaimable: 395508 kB
SUnreclaim:   317792 kB
KernelStack:   11788 kB
PageTables:52592 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938660 kB
Committed_AS:   20070736 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k: 1207572 kB
DirectMap2M:32045056 kB

Does it help figure out where the memory was going and wehther kernel
memory was being used?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND 2/2] btrfs-progs: make all programs and libraries optional

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

We have a build system internally which only needs to build the
libraries out of a repository, not any binaries. I looked at how this
works with other projects, and the best example was util-linux, which
makes it possible to enable or disable everything individually. This is
nice and really flexible, so let's do the same. This way, if you only
want to build and install libbtrfsutil, you can simply do

  ./configure --disable-documentation --disable-all-programs 
--enable-libbtrfsutil
  make
  make install

Signed-off-by: Omar Sandoval 
---
 Makefile| 129 +++-
 Makefile.inc.in |  16 +-
 configure.ac| 138 +---
 3 files changed, 226 insertions(+), 57 deletions(-)

diff --git a/Makefile b/Makefile
index 62102baf..fe71b694 100644
--- a/Makefile
+++ b/Makefile
@@ -206,22 +206,40 @@ endif
 
 MAKEOPTS = --no-print-directory Q=$(Q)
 
-# build all by default
-progs = $(progs_install) btrfsck btrfs-corrupt-block
-
-# install only selected
-progs_install = btrfs mkfs.btrfs btrfs-map-logical btrfs-image \
-   btrfs-find-root btrfstune \
-   btrfs-select-super
-
-# other tools, not built by default
-progs_extra = btrfs-fragments
-
-progs_static = $(foreach p,$(progs),$(p).static)
-
-ifneq ($(DISABLE_BTRFSCONVERT),1)
+ifeq ($(BUILD_BTRFS),1)
+progs_install += btrfs
+progs += btrfsck
+endif
+ifeq ($(BUILD_CONVERT),1)
 progs_install += btrfs-convert
 endif
+ifeq ($(BUILD_CORRUPT_BLOCK),1)
+progs += btrfs-corrupt-block
+endif
+ifeq ($(BUILD_FIND_ROOT),1)
+progs_install += btrfs-find-root
+endif
+ifeq ($(BUILD_FRAGMENTS),1)
+progs += btrfs-fragments
+endif
+ifeq ($(BUILD_IMAGE),1)
+progs_install += btrfs-image
+endif
+ifeq ($(BUILD_MAP_LOGICAL),1)
+progs_install += btrfs-map-logical
+endif
+ifeq ($(BUILD_MKFS),1)
+progs_install += mkfs.btrfs
+endif
+ifeq ($(BUILD_SELECT_SUPER),1)
+progs_install += btrfs-select-super
+endif
+ifeq ($(BUILD_TUNE),1)
+progs_install += btrfstune
+endif
+
+progs += $(progs_install)
+progs_static = $(foreach p,$(progs),$(p).static)
 
 # external libs required by various binaries; for btrfs-foo,
 # specify btrfs_foo_libs = ; see $($(subst...)) rules below
@@ -233,7 +251,7 @@ cmds_restore_cflags = 
-DBTRFSRESTORE_ZSTD=$(BTRFSRESTORE_ZSTD)
 CHECKER_FLAGS += $(btrfs_convert_cflags)
 
 # collect values of the variables above
-standalone_deps = $(foreach dep,$(patsubst %,%_objects,$(subst -,_,$(filter 
btrfs-%, $(progs) $(progs_extra,$($(dep)))
+standalone_deps = $(foreach dep,$(patsubst %,%_objects,$(subst -,_,$(filter 
btrfs-%, $(progs,$($(dep)))
 
 SUBDIRS =
 BUILDDIRS = $(patsubst %,build-%,$(SUBDIRS))
@@ -262,10 +280,21 @@ static_convert_objects = $(patsubst %.o, %.static.o, 
$(convert_objects))
 static_mkfs_objects = $(patsubst %.o, %.static.o, $(mkfs_objects))
 static_image_objects = $(patsubst %.o, %.static.o, $(image_objects))
 
-libs_shared = libbtrfs.so.0.1 libbtrfsutil.so.$(libbtrfsutil_version)
-libs_static = libbtrfs.a libbtrfsutil.a
+ifeq ($(BUILD_LIBBTRFS),1)
+ifeq ($(BUILD_SHARED),1)
+libs_shared += libbtrfs.so.0.1
+lib_links += libbtrfs.so.0 libbtrfs.so
+endif
+libs_static += libbtrfs.a
+endif
+ifeq ($(BUILD_LIBBTRFSUTIL),1)
+ifeq ($(BUILD_SHARED),1)
+libs_shared += libbtrfsutil.so.$(libbtrfsutil_version)
+lib_links += libbtrfsutil.so.$(libbtrfsutil_major) libbtrfsutil.so
+endif
+libs_static += libbtrfsutil.a
+endif
 libs = $(libs_shared) $(libs_static)
-lib_links = libbtrfs.so.0 libbtrfs.so libbtrfsutil.so.$(libbtrfsutil_major) 
libbtrfsutil.so
 
 # make C=1 to enable sparse
 ifdef C
@@ -303,7 +332,7 @@ endif
$($(subst -,_,btrfs-$(@:%/$(notdir $@)=%)-cflags))
 
 all: $(progs) $(libs) $(lib_links) $(BUILDDIRS)
-ifeq ($(PYTHON_BINDINGS),1)
+ifeq ($(BUILD_PYTHON),1)
 all: libbtrfsutil_python
 endif
 $(SUBDIRS): $(BUILDDIRS)
@@ -353,7 +382,7 @@ testsuite: btrfs-corrupt-block fssum
@echo "Export tests as a package"
$(Q)cd tests && ./export-testsuite.sh
 
-ifeq ($(PYTHON_BINDINGS),1)
+ifeq ($(BUILD_PYTHON),1)
 test-libbtrfsutil: libbtrfsutil_python mkfs.btrfs
$(Q)cd libbtrfsutil/python; \
LD_LIBRARY_PATH=../.. $(PYTHON) -m unittest discover -v tests
@@ -413,7 +442,7 @@ libbtrfsutil.so.$(libbtrfsutil_major) libbtrfsutil.so: 
libbtrfsutil.so.$(libbtrf
@echo "[LN] $@"
$(Q)$(LN_S) -f $< $@
 
-ifeq ($(PYTHON_BINDINGS),1)
+ifeq ($(BUILD_PYTHON),1)
 libbtrfsutil_python: libbtrfsutil.so.$(libbtrfsutil_major) libbtrfsutil.so 
libbtrfsutil/btrfsutil.h
@echo "[PY] libbtrfsutil"
$(Q)cd libbtrfsutil/python; \
@@ -439,14 +468,14 @@ btrfs-%.static: btrfs-%.static.o $(static_objects) 
$(patsubst %.o,%.static.o,$(s
$(static_libbtrfs_objects) $(STATIC_LDFLAGS) \
$($(subst -,_,$(subst .static,,$@)-libs)) $(STATIC_LIBS)
 
-btrfs-%: btrfs-%.o $(objects) $(standalone_deps) $(libs_static)
+btrfs-%: btrfs-%.o $(objects) $(standalone_deps) 

[PATCH RESEND 0/2] btrfs-progs: build improvements

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

Hi, Dave,

This is a resend of a couple of patches I sent back in April, rebased on
the current devel branch. Patch 1 cleans up some stale build targets,
and patch 2 makes the btrfs-progs build more flexible, so that it's
possible to pick and choose what gets built. Please consider these for
the next progs release.

Thanks!

Omar Sandoval (2):
  btrfs-progs: remove stale dir-test and quick-test
  btrfs-progs: make all programs and libraries optional

 Makefile| 139 -
 Makefile.inc.in |  16 +-
 configure.ac| 138 +++--
 dir-test.c  | 518 
 quick-test.c| 226 -
 5 files changed, 227 insertions(+), 810 deletions(-)
 delete mode 100644 dir-test.c
 delete mode 100644 quick-test.c

-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND 1/2] btrfs-progs: remove stale dir-test and quick-test

2018-07-12 Thread Omar Sandoval
From: Omar Sandoval 

These don't build anymore and don't appear to be used for anything.

Signed-off-by: Omar Sandoval 
---
 Makefile |  10 +-
 dir-test.c   | 518 ---
 quick-test.c | 226 --
 3 files changed, 1 insertion(+), 753 deletions(-)
 delete mode 100644 dir-test.c
 delete mode 100644 quick-test.c

diff --git a/Makefile b/Makefile
index 544410e6..62102baf 100644
--- a/Makefile
+++ b/Makefile
@@ -495,14 +495,6 @@ btrfs-convert.static: $(static_convert_objects) 
$(static_objects) $(static_libbt
@echo "[LD] $@"
$(Q)$(CC) -o $@ $^ $(STATIC_LDFLAGS) $(btrfs_convert_libs) 
$(STATIC_LIBS)
 
-dir-test: dir-test.o $(objects) $(libs)
-   @echo "[LD] $@"
-   $(Q)$(CC) -o $@ $^ $(LDFLAGS) $(LIBS)
-
-quick-test: quick-test.o $(objects) $(libs)
-   @echo "[LD] $@"
-   $(Q)$(CC) -o $@ $^ $(LDFLAGS) $(LIBS)
-
 ioctl-test.o: ioctl-test.c ioctl.h kerncompat.h ctree.h
@echo "[CC]   $@"
$(Q)$(CC) $(CFLAGS) -c $< -o $@
@@ -603,7 +595,7 @@ clean: $(CLEANDIRS)
image/*.o image/*.o.d \
convert/*.o convert/*.o.d \
mkfs/*.o mkfs/*.o.d check/*.o check/*.o.d \
- dir-test ioctl-test quick-test library-test library-test-static \
+ ioctl-test library-test library-test-static \
   mktables btrfs.static mkfs.btrfs.static fssum \
  $(check_defs) \
  $(libs) $(lib_links) \
diff --git a/dir-test.c b/dir-test.c
deleted file mode 100644
index cfb77f2a..
--- a/dir-test.c
+++ /dev/null
@@ -1,518 +0,0 @@
-/*
- * Copyright (C) 2007 Oracle.  All rights reserved.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License v2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this program; if not, write to the
- * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
- * Boston, MA 021110-1307, USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "kerncompat.h"
-#include "radix-tree.h"
-#include "ctree.h"
-#include "disk-io.h"
-#include "print-tree.h"
-#include "hash.h"
-#include "transaction.h"
-
-int keep_running = 1;
-struct btrfs_super_block super;
-static u64 dir_oid = 0;
-static u64 file_oid = 33778;
-
-static int find_num(struct radix_tree_root *root, unsigned long *num_ret,
-int exists)
-{
-   unsigned long num = rand();
-   unsigned long res[2];
-   int ret;
-
-again:
-   ret = radix_tree_gang_lookup(root, (void **)res, num, 2);
-   if (exists) {
-   if (ret == 0)
-   return -1;
-   num = res[0];
-   } else if (ret != 0 && num == res[0]) {
-   num++;
-   if (ret > 1 && num == res[1]) {
-   num++;
-   goto again;
-   }
-   }
-   *num_ret = num;
-   return 0;
-}
-
-static void initial_inode_init(struct btrfs_root *root,
-  struct btrfs_inode_item *inode_item)
-{
-   memset(inode_item, 0, sizeof(*inode_item));
-   btrfs_set_inode_generation(inode_item, root->fs_info->generation);
-   btrfs_set_inode_mode(inode_item, S_IFREG | 0700);
-}
-
-static int ins_one(struct btrfs_trans_handle *trans, struct btrfs_root *root,
-  struct radix_tree_root *radix)
-{
-   int ret;
-   char buf[128];
-   unsigned long oid;
-   u64 objectid;
-   struct btrfs_path path;
-   struct btrfs_key inode_map;
-   struct btrfs_inode_item inode_item;
-
-   find_num(radix, , 0);
-   sprintf(buf, "str-%lu", oid);
-
-   ret = btrfs_find_free_objectid(trans, root, dir_oid + 1, );
-   if (ret)
-   goto error;
-
-   inode_map.objectid = objectid;
-   inode_map.flags = 0;
-   inode_map.type = BTRFS_INODE_ITEM_KEY;
-   inode_map.offset = 0;
-
-   initial_inode_init(root, _item);
-   ret = btrfs_insert_inode(trans, root, objectid, _item);
-   if (ret)
-   goto error;
-   ret = btrfs_insert_dir_item(trans, root, buf, strlen(buf), dir_oid,
-   _map, BTRFS_FT_UNKNOWN);
-   if (ret)
-   goto error;
-
-   radix_tree_preload(GFP_KERNEL);
-   ret = radix_tree_insert(radix, oid, (void *)oid);
-   radix_tree_preload_end();
-   if (ret)
-   goto error;
-   return ret;
-error:
-   if (ret != -EEXIST)
-   goto fatal;
-
-   /*
-* if we got an 

Re: [PATCH v2] btrfs: use customized batch size for total_bytes_pinned

2018-07-12 Thread Omar Sandoval
On Wed, Jul 11, 2018 at 11:59:36PM +0800, Ethan Lien wrote:
> In commit b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
> pinned bytes") we use total_bytes_pinned to track how many bytes we are
> going to free in this transaction. When we are close to ENOSPC, we check it
> and know if we can make the allocation by commit the current transaction.
> For every data/metadata extent we are going to free, we add
> total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), and
> release it in unpin_extent_range() when we finish the transaction. So this
> is a variable we frequently update but rarely read - just the suitable
> use of percpu_counter. But in previous commit we update total_bytes_pinned
> by default 32 batch size, making every update essentially a spin lock
> protected update. Since every spin lock/unlock operation involves syncing
> a globally used variable and some kind of barrier in a SMP system, this is
> more expensive than using total_bytes_pinned as a simple atomic64_t. So
> fix this by using a customized batch size. Since we only read
> total_bytes_pinned when we are close to ENOSPC and fail to alloc new chunk,
> we can use a really large batch size and have nearly no penalty in most
> cases.
> 
> 
> [Test]
> We test the patch on a 4-cores x86 machine:
> 1. falloate a 16GiB size test file.
> 2. take snapshot (so all following writes will be cow write).
> 3. run a 180 sec, 4 jobs, 4K random write fio on test file.
> 
> We also add a temporary lockdep class on percpu_counter's spin lock used
> by total_bytes_pinned to track lock_stat.
> 
> 
> [Results]
> unpatched:
> lock_stat version 0.4
> ---
>   class namecon-bouncescontentions
> waittime-min   waittime-max waittime-total   waittime-avgacq-bounces
> acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> 
>total_bytes_pinned_percpu:82 82
> 0.21   0.61  29.46   0.36 298340
>   635973   0.09  11.01  173476.25   0.27
> 
> 
> patched:
> lock_stat version 0.4
> ---
>   class namecon-bouncescontentions
> waittime-min   waittime-max waittime-total   waittime-avgacq-bounces
> acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> 
>total_bytes_pinned_percpu: 1  1
> 0.62   0.62   0.62   0.62  13601
>31542   0.14   9.61   11016.90   0.35
> 
> 
> [Analysis]
> Since the spin lock only protect a single in-memory variable, the
> contentions (number of lock acquisitions that had to wait) in both
> unpatched and patched version are low. But when we see acquisitions and
> acq-bounces, we get much lower counts in patched version. Here the most
> important metric is acq-bounces. It means how many times the lock get
> transferred between different cpus, so the patch can really recude
> cacheline bouncing of spin lock (also the global counter of percpu_counter)
> in a SMP system.
> 
> Fixes: b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
> pinned bytes")
> 
> Signed-off-by: Ethan Lien 
> ---
> 
> V2:
>   Rewrite commit comments.
>   Add lock_stat test.
>   Pull dirty_metadata_bytes out to a separate patch.
> 
>  fs/btrfs/ctree.h   |  1 +
>  fs/btrfs/extent-tree.c | 46 --
>  2 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 118346aceea9..df682a521635 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -422,6 +422,7 @@ struct btrfs_space_info {
>* time the transaction commits.
>*/
>   struct percpu_counter total_bytes_pinned;
> + s32 total_bytes_pinned_batch;

Can this just be a constant instead of adding it to space_info?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Pete
On 07/12/2018 07:07 PM, Pete wrote:
> On 07/12/2018 08:11 AM, Nikolay Borisov wrote:
>>
>>
>> This one shouldn't have gone RO since it has plenty of unallocated and
>> free space. What was the workload at the time it went RO? Hard to say,
>> it's best if you can provide output with the debug patch applied when
>> this issue re-appears.
>>
> 
> I'll respond in a separate email to list, as the above is the more
> important issue and I need to trawl the logs.

Nothing seen, though I recently had the disks go read-only.  I'll wait
and see what happens.

Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Pete
On 07/12/2018 08:11 AM, Nikolay Borisov wrote:
> 
> 
> This one shouldn't have gone RO since it has plenty of unallocated and
> free space. What was the workload at the time it went RO? Hard to say,
> it's best if you can provide output with the debug patch applied when
> this issue re-appears.
> 

I'll respond in a separate email to list, as the above is the more
important issue and I need to trawl the logs.

>>
>>
>> Incidental I've been running out of space on my ssd which contains / and
>> /home - which I am sorting.
>>
>> root@phoenix:~# btrfs fi usage /
>>
>> Overall:
>>
>> Device size: 350.00GiB
>>
>> Device allocated:350.00GiB
>>
>> Device unallocated:1.00MiB
>>
>> Device missing:  0.00B
>>
>> Used:324.74GiB
>>
>> Free (estimated): 23.28GiB  (min: 23.28GiB)
>>
>> Data ratio:   1.00
>>
>> Metadata ratio:   1.00
>>
>> Global reserve:  512.00MiB  (used: 0.00B)
>>
> SO this doesn't look healthy, essentially you don't have any unallocated
> space on your device. I will suggest you run balance to try and compact
> the space in your data groups and hopefully free up some space. As a
> first step you can try and run :
> 
> btrfs balance start -dusage=0 -musage=0 /
> 
> This will try and reclaim any non-used block groups i.e it could bring
> some unallocated space back. Then you can run 'btrfs fi us / ' to see if
> this is the case. Then I'd suggest you run something like:
> 
> 'btrfs balance start -dusage=60 -musage=60 /' this will try and compact
> all data/metadata chunks which are less than 60% full.

Thank you, appreciated.  I have run the above and listed the usage.

root@phoenix:~# btrfs balance start -dusage=0 -musage=0 /
Done, had to relocate 0 out of 351 chunks
root@phoenix:~# btrfs fi usage /
Overall:

Device size: 350.00GiB

Device allocated:349.03GiB

Device unallocated:  992.00MiB

Device missing:  0.00B

Used:324.77GiB
Free (estimated): 24.22GiB  (min: 24.22GiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,single: Size:343.00GiB, Used:319.75GiB
   /dev/disk/by-label/desk-system343.00GiB

Metadata,single: Size:6.00GiB, Used:5.02GiB
   /dev/disk/by-label/desk-system  6.00GiB

System,single: Size:32.00MiB, Used:64.00KiB
   /dev/disk/by-label/desk-system 32.00MiB

Unallocated:
   /dev/disk/by-label/desk-system992.00MiB
root@phoenix:~#


root@phoenix:~# btrfs balance start -dusage=60 -musage=60 /
Done, had to relocate 15 out of 351 chunks
root@phoenix:~# btrfs fi us /
Overall:
Device size: 350.00GiB
Device allocated:335.06GiB
Device unallocated:   14.94GiB
Device missing:  0.00B
Used:324.77GiB
Free (estimated): 24.22GiB  (min: 24.22GiB)
Data ratio:   1.00
Metadata ratio:   1.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,single: Size:329.03GiB, Used:319.75GiB
   /dev/disk/by-label/desk-system329.03GiB

Metadata,single: Size:6.00GiB, Used:5.02GiB
   /dev/disk/by-label/desk-system  6.00GiB

System,single: Size:32.00MiB, Used:64.00KiB
   /dev/disk/by-label/desk-system 32.00MiB

Unallocated:
   /dev/disk/by-label/desk-system 14.94GiB
root@phoenix:~#

I believe this is simply coincidental to the error I was getting.  I've
got to do some housekeeping.  I think running out of disk space is a
combination of having root on the disk, installing another distro in
another subvolume, having losts of old kernels and modules around and
snapshotting directores that are likely churning files (.mozilla?).




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: use customized batch size for total_bytes_pinned

2018-07-12 Thread ethanlien

Nikolay Borisov 於 2018-07-12 15:07 寫到:

On 11.07.2018 18:59, Ethan Lien wrote:
In commit b150a4f10d878 ("Btrfs: use a percpu to keep track of 
possibly
pinned bytes") we use total_bytes_pinned to track how many bytes we 
are
going to free in this transaction. When we are close to ENOSPC, we 
check it
and know if we can make the allocation by commit the current 
transaction.

For every data/metadata extent we are going to free, we add
total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), 
and
release it in unpin_extent_range() when we finish the transaction. So 
this

is a variable we frequently update but rarely read - just the suitable
use of percpu_counter. But in previous commit we update 
total_bytes_pinned

by default 32 batch size, making every update essentially a spin lock
protected update. Since every spin lock/unlock operation involves 
syncing
a globally used variable and some kind of barrier in a SMP system, 
this is
more expensive than using total_bytes_pinned as a simple atomic64_t. 
So

fix this by using a customized batch size. Since we only read
total_bytes_pinned when we are close to ENOSPC and fail to alloc new 
chunk,
we can use a really large batch size and have nearly no penalty in 
most

cases.


[Test]
We test the patch on a 4-cores x86 machine:
1. falloate a 16GiB size test file.
2. take snapshot (so all following writes will be cow write).
3. run a 180 sec, 4 jobs, 4K random write fio on test file.

We also add a temporary lockdep class on percpu_counter's spin lock 
used

by total_bytes_pinned to track lock_stat.


[Results]
unpatched:
lock_stat version 0.4
---
  class namecon-bouncescontentions
waittime-min   waittime-max waittime-total   waittime-avg
acq-bounces
acquisitions   holdtime-min   holdtime-max holdtime-total   
holdtime-avg


   total_bytes_pinned_percpu:82 82
0.21   0.61  29.46   0.36 
298340
  635973   0.09  11.01  173476.25   
0.27



patched:
lock_stat version 0.4
---
  class namecon-bouncescontentions
waittime-min   waittime-max waittime-total   waittime-avg
acq-bounces
acquisitions   holdtime-min   holdtime-max holdtime-total   
holdtime-avg


   total_bytes_pinned_percpu: 1  1
0.62   0.62   0.62   0.62  
13601
   31542   0.14   9.61   11016.90   
0.35




According to these numbers though, the best-case (waittime-min and the
average waittime-avg) have actually regressed. So what really saves you


In patched version we only got one contentions count so the min/max/avg
waittime all come from that count. I think the min/max/avg waittime here
are inconclusive since we have only one sample count so it could be 
biased.




is the fact that the number of time we had to go to the best/average
case is reduced, due to the larger batch. I guess in that regard you 
are

in the clear. Another pertinent question is did you observe any
significant impact on run times of actual workloads or, say, 
transaction

commit times or anything like that? I think this case will only be hit
when the filesystem is struggling to satisfy a metadata allocation  has
to cycle through all stages .


The frequently update of total_bytes_pinned can happen in normal cow 
write path.
In our test, any re-write to a existing 4K data block will trigger cow 
and we release
old data block by btrfs_free_extent() and in that time we add 
total_bytes_pinned.
As the test file is written out, eventually almost every 4K write will 
trigger a +4K and a
-4K update to total_bytes_pinned. But so far we haven't seen significant 
performance

improvement from the test.





[Analysis]
Since the spin lock only protect a single in-memory variable, the
contentions (number of lock acquisitions that had to wait) in both
unpatched and patched version are low. But when we see acquisitions 
and
acq-bounces, we get much lower counts in patched version. Here the 
most

important metric is acq-bounces. It means how many times the lock get
transferred between different cpus, so the patch can really recude


nit: s/recude/reduce - no need to resend i guess david will fix it on
the way into the tree

cacheline bouncing of spin lock (also the global counter of 
percpu_counter)

in a SMP system.

Fixes: b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
pinned bytes")

Signed-off-by: Ethan Lien 


 Reviewed-by: Nikolay Borisov 


---

V2:
Rewrite commit comments.
Add lock_stat test.
Pull dirty_metadata_bytes out to a separate patch.

 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/extent-tree.c | 46 
--

 2 files 

Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Anand Jain




On 07/12/2018 08:59 PM, Qu Wenruo wrote:



On 2018年07月12日 20:33, Anand Jain wrote:



On 07/12/2018 01:43 PM, Qu Wenruo wrote:



On 2018年07月11日 15:50, Anand Jain wrote:



BTRFS Volume operations, Device Lists and Locks all in one page:

Devices are managed in two contexts, the scan context and the mounted
context. In scan context the threads originate from the btrfs_control
ioctl and in the mounted context the threads originates from the mount
point ioctl.
Apart from these two context, there also can be two transient state
where device state are transitioning from the scan to the mount context
or from the mount to the scan context.

Device List and Locks:-

   Count: btrfs_fs_devices::num_devices
   List : btrfs_fs_devices::devices -> btrfs_devices::dev_list
   Lock : btrfs_fs_devices::device_list_mutex

   Count: btrfs_fs_devices::rw_devices


So btrfs_fs_devices::num_devices = btrfs_fs_devices::rw_devices + RO
devices.
How seed and ro devices are different in this case?


  Given:
  btrfs_fs_devices::total_devices = btrfs_super_num_devices(disk_super);

  Consider no missing devices, no replace target, no seeding. Then,
    btrfs_fs_devices::total_devices == btrfs_fs_devices::num_devices

  And in case of seeding.
    btrfs_fs_devices::total_devices  == (btrfs_fs_devices::num_devices +
     btrfs_fs_devices::seed::total_devices

    All devices in the list [1] are RW/Sprout
  [1] fs_info::btrfs_fs_devices::devices
    All devices in the list [2] are RO/Seed


 to avoid confusion I shall remove RO here
   All devices in the list [2] are Seed


  [2] fs_info::btrfs_fs_devices::seed::devices


  Thanks for asking will add this part to the doc.


Another question is, what if a device is RO but not seed?

E.g. loopback device set to RO.
IMHO it won't be mounted RW for single device case, but not sure for
multi device case.


 RO devices are different from the seed devices. If any one device is
 RO then FS is mounted in RO.

 And the btrfs_fs_devices::seed will still be NULL.






   List : btrfs_fs_devices::alloc_list -> btrfs_devices::dev_alloc_list
   Lock : btrfs_fs_info::chunk_mutex


At least the chunk_mutex is also shared with chunk allocator,


  Right.


or we
should have some mutex in btrfs_fs_devices other than fs_info.
Right?


  More locks? no. But some of the locks-and-flags are wrongly
  belong to fs_info instead it should have been in fs_devices.
  When the dust settles planning to propose to migrate them
  to fs_devices.


OK, migrating to fs_devices looks good to me then.




   Lock: set_bit btrfs_fs_info::flags::BTRFS_FS_EXCL_OP

FSID List and Lock:-

   Count : None
   HEAD  : Global::fs_uuids -> btrfs_fs_devices::fs_list
   Lock  : Global::uuid_mutex


After the fs_devices is mounted, the btrfs_fs_devices::opened > 0.


fs_devices::opended should be btrfs_fs_devices::num_devices if no device
is missing and -1 or -2 for degraded case, right?


  No. I think you are getting confused with
     btrfs_fs_devices::open_devices

  btrfs_fs_devices::opened
   indicate how many times the volume is opened. And in reality it would
  stay at 1 always. (except for a short duration of time during
  subsequent subvol mount).


Thanks, this makes sense.





In the scan context we have the following device operations..

Device SCAN:-  which creates the btrfs_fs_devices and its corresponding
btrfs_device entries, also checks and frees the duplicate device
entries.
Lock: uuid_mutex
    SCAN
    if (found_duplicate && btrfs_fs_devices::opened == 0)
   Free_duplicate
Unlock: uuid_mutex

Device READY:- check if the volume is ready. Also does an implicit scan
and duplicate device free as in Device SCAN.
Lock: uuid_mutex
    SCAN
    if (found_duplicate && btrfs_fs_devices::opened == 0)
   Free_duplicate
    Check READY
Unlock: uuid_mutex

Device FORGET:- (planned) free a given or all unmounted devices and
empty fs_devices if any.
Lock: uuid_mutex
    if (found_duplicate && btrfs_fs_devices::opened == 0)
  Free duplicate
Unlock: uuid_mutex

Device mount operation -> A Transient state leading to the mounted
context
Lock: uuid_mutex
   Find, SCAN, btrfs_fs_devices::opened++
Unlock: uuid_mutex

Device umount operation -> A transient state leading to the unmounted
context or scan context
Lock: uuid_mutex
    btrfs_fs_devices::opened--
Unlock: uuid_mutex


In the mounted context we have the following device operations..

Device Rename through SCAN:- This is a special case where the device
path gets renamed after its been mounted. (Ubuntu changes the boot path
during boot up so we need this feature). Currently, this is part of
Device SCAN as above. And we need the locks as below, because the
dynamic disappearing device might cleanup the btrfs_device::name
Lock: btrfs_fs_devices::device_list_mutex
     Rename
Unlock: btrfs_fs_devices::device_list_mutex

Commit Transaction:- Write All supers.
Lock: btrfs_fs_devices::device_list_mutex
    Write all super of 

Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Qu Wenruo


On 2018年07月12日 20:33, Anand Jain wrote:
> 
> 
> On 07/12/2018 01:43 PM, Qu Wenruo wrote:
>>
>>
>> On 2018年07月11日 15:50, Anand Jain wrote:
>>>
>>>
>>> BTRFS Volume operations, Device Lists and Locks all in one page:
>>>
>>> Devices are managed in two contexts, the scan context and the mounted
>>> context. In scan context the threads originate from the btrfs_control
>>> ioctl and in the mounted context the threads originates from the mount
>>> point ioctl.
>>> Apart from these two context, there also can be two transient state
>>> where device state are transitioning from the scan to the mount context
>>> or from the mount to the scan context.
>>>
>>> Device List and Locks:-
>>>
>>>   Count: btrfs_fs_devices::num_devices
>>>   List : btrfs_fs_devices::devices -> btrfs_devices::dev_list
>>>   Lock : btrfs_fs_devices::device_list_mutex
>>>
>>>   Count: btrfs_fs_devices::rw_devices
>>
>> So btrfs_fs_devices::num_devices = btrfs_fs_devices::rw_devices + RO
>> devices.
>> How seed and ro devices are different in this case?
> 
>  Given:
>  btrfs_fs_devices::total_devices = btrfs_super_num_devices(disk_super);
> 
>  Consider no missing devices, no replace target, no seeding. Then,
>    btrfs_fs_devices::total_devices == btrfs_fs_devices::num_devices
> 
>  And in case of seeding.
>    btrfs_fs_devices::total_devices  == (btrfs_fs_devices::num_devices +
>     btrfs_fs_devices::seed::total_devices
> 
>    All devices in the list [1] are RW/Sprout
>  [1] fs_info::btrfs_fs_devices::devices
>    All devices in the list [2] are RO/Seed
>  [2] fs_info::btrfs_fs_devices::seed::devices
> 
> 
>  Thanks for asking will add this part to the doc.

Another question is, what if a device is RO but not seed?

E.g. loopback device set to RO.
IMHO it won't be mounted RW for single device case, but not sure for
multi device case.

> 
> 
>>
>>>   List : btrfs_fs_devices::alloc_list -> btrfs_devices::dev_alloc_list
>>>   Lock : btrfs_fs_info::chunk_mutex
>>
>> At least the chunk_mutex is also shared with chunk allocator,
> 
>  Right.
> 
>> or we
>> should have some mutex in btrfs_fs_devices other than fs_info.
>> Right?
> 
>  More locks? no. But some of the locks-and-flags are wrongly
>  belong to fs_info instead it should have been in fs_devices.
>  When the dust settles planning to propose to migrate them
>  to fs_devices.

OK, migrating to fs_devices looks good to me then.

> 
>>>   Lock: set_bit btrfs_fs_info::flags::BTRFS_FS_EXCL_OP
>>>
>>> FSID List and Lock:-
>>>
>>>   Count : None
>>>   HEAD  : Global::fs_uuids -> btrfs_fs_devices::fs_list
>>>   Lock  : Global::uuid_mutex
>>>
>>>
>>> After the fs_devices is mounted, the btrfs_fs_devices::opened > 0.
>>
>> fs_devices::opended should be btrfs_fs_devices::num_devices if no device
>> is missing and -1 or -2 for degraded case, right?
> 
>  No. I think you are getting confused with
>     btrfs_fs_devices::open_devices
> 
>  btrfs_fs_devices::opened
>   indicate how many times the volume is opened. And in reality it would
>  stay at 1 always. (except for a short duration of time during
>  subsequent subvol mount).

Thanks, this makes sense.

> 
> 
>>> In the scan context we have the following device operations..
>>>
>>> Device SCAN:-  which creates the btrfs_fs_devices and its corresponding
>>> btrfs_device entries, also checks and frees the duplicate device
>>> entries.
>>> Lock: uuid_mutex
>>>    SCAN
>>>    if (found_duplicate && btrfs_fs_devices::opened == 0)
>>>   Free_duplicate
>>> Unlock: uuid_mutex
>>>
>>> Device READY:- check if the volume is ready. Also does an implicit scan
>>> and duplicate device free as in Device SCAN.
>>> Lock: uuid_mutex
>>>    SCAN
>>>    if (found_duplicate && btrfs_fs_devices::opened == 0)
>>>   Free_duplicate
>>>    Check READY
>>> Unlock: uuid_mutex
>>>
>>> Device FORGET:- (planned) free a given or all unmounted devices and
>>> empty fs_devices if any.
>>> Lock: uuid_mutex
>>>    if (found_duplicate && btrfs_fs_devices::opened == 0)
>>>  Free duplicate
>>> Unlock: uuid_mutex
>>>
>>> Device mount operation -> A Transient state leading to the mounted
>>> context
>>> Lock: uuid_mutex
>>>   Find, SCAN, btrfs_fs_devices::opened++
>>> Unlock: uuid_mutex
>>>
>>> Device umount operation -> A transient state leading to the unmounted
>>> context or scan context
>>> Lock: uuid_mutex
>>>    btrfs_fs_devices::opened--
>>> Unlock: uuid_mutex
>>>
>>>
>>> In the mounted context we have the following device operations..
>>>
>>> Device Rename through SCAN:- This is a special case where the device
>>> path gets renamed after its been mounted. (Ubuntu changes the boot path
>>> during boot up so we need this feature). Currently, this is part of
>>> Device SCAN as above. And we need the locks as below, because the
>>> dynamic disappearing device might cleanup the btrfs_device::name
>>> Lock: btrfs_fs_devices::device_list_mutex
>>>     Rename
>>> Unlock: btrfs_fs_devices::device_list_mutex
>>>
>>> Commit Transaction:- 

Re: About hung task on generic/041

2018-07-12 Thread Lu Fengqi
On Thu, Jul 12, 2018 at 11:40:54AM +0100, Filipe Manana wrote:
>On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi  wrote:
>> Hi,
>>
>> When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
>> detection), btrfs-transaction kthread will trigger the hung task timeout
>> (stall at wait_event in btrfs_commit_transaction). At the same time, you
>> can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
>> whether this is a problem. Any suggestion?
>
>Well, something at 100% cpu and that seems hang forever is definitely
>a problem, specially a workload as simple as the one in generic/041

To clarify, the hung task will end within 500s. Without KASAN, it will
end within 80s, so it won't trigger hung task timeout 120s. I'm not sure
if this is just slow, or have some problem?

>(never happened to me, even on vanilla 4.18-rc4).
>Do you have the stack trace for the fsync task? What you pasted below

I will send the stack trace tomorrow.

-- 
Thanks,
Lu

>is only for the transaction kthread and that alone doesn't help.
>
>>
>> [Wed Jul 11 15:50:08 2018] INFO: task btrfs-transacti:1053 blocked for more 
>> than 120 seconds.
>> [Wed Jul 11 15:50:08 2018]   Not tainted 4.18.0-rc3-custom #14
>> [Wed Jul 11 15:50:08 2018] "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Wed Jul 11 15:50:08 2018] btrfs-transacti D0  1053  2 0x8000
>> [Wed Jul 11 15:50:08 2018] Call Trace:
>> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x5b2/0x1380
>> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
>> [Wed Jul 11 15:50:08 2018]  ? firmware_map_remove+0x187/0x187
>> [Wed Jul 11 15:50:08 2018]  ? ___preempt_schedule+0x16/0x18
>> [Wed Jul 11 15:50:08 2018]  ? mark_held_locks+0x6e/0x90
>> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x59/0x70
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x46/0x70
>> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_event+0x191/0x410
>> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_exclusive+0x210/0x210
>> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
>> [Wed Jul 11 15:50:08 2018]  schedule+0xca/0x260
>> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
>> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x1380/0x1380
>> [Wed Jul 11 15:50:08 2018]  ? ___might_sleep+0x126/0x370
>> [Wed Jul 11 15:50:08 2018]  ? init_wait_entry+0xc7/0x100
>> [Wed Jul 11 15:50:08 2018]  ? __wake_up_locked_key_bookmark+0x20/0x20
>> [Wed Jul 11 15:50:08 2018]  ? __btrfs_run_delayed_items+0x1e5/0x280 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? __might_sleep+0x31/0xd0
>> [Wed Jul 11 15:50:08 2018]  btrfs_commit_transaction+0x122a/0x1640 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? wait_woken+0x150/0x150
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? deref_stack_reg+0xe0/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? __module_text_address+0x63/0xa0
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x161/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? is_module_text_address+0x2b/0x50
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? kernel_text_address+0x5a/0x100
>> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
>> [Wed Jul 11 15:50:08 2018]  ? __save_stack_trace+0x82/0x100
>> [Wed Jul 11 15:50:08 2018]  ? kasan_kmalloc+0x142/0x170
>> [Wed Jul 11 15:50:08 2018]  ? kmem_cache_alloc+0xfc/0x2e0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? kthread+0x1b9/0x1e0
>> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
>> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
>> [Wed Jul 11 15:50:08 2018]  ? mark_lock+0x149/0xa80
>> [Wed Jul 11 15:50:08 2018]  ? init_object+0x6b/0x80
>> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
>> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
>> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
>> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
>> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
>> [Wed Jul 11 15:50:08 2018]  ? rcu_oom_callback+0x40/0x40
>> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
>> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
>> [Wed Jul 11 15:50:08 2018]  ? rcu_read_lock_sched_held+0x8f/0xa0
>> [Wed Jul 11 

Re: [DOC] BTRFS Volume operations, Device Lists and Locks all in one page

2018-07-12 Thread Anand Jain




On 07/12/2018 01:43 PM, Qu Wenruo wrote:



On 2018年07月11日 15:50, Anand Jain wrote:



BTRFS Volume operations, Device Lists and Locks all in one page:

Devices are managed in two contexts, the scan context and the mounted
context. In scan context the threads originate from the btrfs_control
ioctl and in the mounted context the threads originates from the mount
point ioctl.
Apart from these two context, there also can be two transient state
where device state are transitioning from the scan to the mount context
or from the mount to the scan context.

Device List and Locks:-

  Count: btrfs_fs_devices::num_devices
  List : btrfs_fs_devices::devices -> btrfs_devices::dev_list
  Lock : btrfs_fs_devices::device_list_mutex

  Count: btrfs_fs_devices::rw_devices


So btrfs_fs_devices::num_devices = btrfs_fs_devices::rw_devices + RO
devices.
How seed and ro devices are different in this case?


 Given:
 btrfs_fs_devices::total_devices = btrfs_super_num_devices(disk_super);

 Consider no missing devices, no replace target, no seeding. Then,
   btrfs_fs_devices::total_devices == btrfs_fs_devices::num_devices

 And in case of seeding.
   btrfs_fs_devices::total_devices  == (btrfs_fs_devices::num_devices +
btrfs_fs_devices::seed::total_devices

   All devices in the list [1] are RW/Sprout
 [1] fs_info::btrfs_fs_devices::devices
   All devices in the list [2] are RO/Seed
 [2] fs_info::btrfs_fs_devices::seed::devices


 Thanks for asking will add this part to the doc.





  List : btrfs_fs_devices::alloc_list -> btrfs_devices::dev_alloc_list
  Lock : btrfs_fs_info::chunk_mutex


At least the chunk_mutex is also shared with chunk allocator,


 Right.


or we
should have some mutex in btrfs_fs_devices other than fs_info.
Right?


 More locks? no. But some of the locks-and-flags are wrongly
 belong to fs_info instead it should have been in fs_devices.
 When the dust settles planning to propose to migrate them
 to fs_devices.


  Lock: set_bit btrfs_fs_info::flags::BTRFS_FS_EXCL_OP

FSID List and Lock:-

  Count : None
  HEAD  : Global::fs_uuids -> btrfs_fs_devices::fs_list
  Lock  : Global::uuid_mutex


After the fs_devices is mounted, the btrfs_fs_devices::opened > 0.


fs_devices::opended should be btrfs_fs_devices::num_devices if no device
is missing and -1 or -2 for degraded case, right?


 No. I think you are getting confused with
btrfs_fs_devices::open_devices

 btrfs_fs_devices::opened
  indicate how many times the volume is opened. And in reality it would
 stay at 1 always. (except for a short duration of time during
 subsequent subvol mount).



In the scan context we have the following device operations..

Device SCAN:-  which creates the btrfs_fs_devices and its corresponding
btrfs_device entries, also checks and frees the duplicate device entries.
Lock: uuid_mutex
   SCAN
   if (found_duplicate && btrfs_fs_devices::opened == 0)
  Free_duplicate
Unlock: uuid_mutex

Device READY:- check if the volume is ready. Also does an implicit scan
and duplicate device free as in Device SCAN.
Lock: uuid_mutex
   SCAN
   if (found_duplicate && btrfs_fs_devices::opened == 0)
  Free_duplicate
   Check READY
Unlock: uuid_mutex

Device FORGET:- (planned) free a given or all unmounted devices and
empty fs_devices if any.
Lock: uuid_mutex
   if (found_duplicate && btrfs_fs_devices::opened == 0)
     Free duplicate
Unlock: uuid_mutex

Device mount operation -> A Transient state leading to the mounted context
Lock: uuid_mutex
  Find, SCAN, btrfs_fs_devices::opened++
Unlock: uuid_mutex

Device umount operation -> A transient state leading to the unmounted
context or scan context
Lock: uuid_mutex
   btrfs_fs_devices::opened--
Unlock: uuid_mutex


In the mounted context we have the following device operations..

Device Rename through SCAN:- This is a special case where the device
path gets renamed after its been mounted. (Ubuntu changes the boot path
during boot up so we need this feature). Currently, this is part of
Device SCAN as above. And we need the locks as below, because the
dynamic disappearing device might cleanup the btrfs_device::name
Lock: btrfs_fs_devices::device_list_mutex
    Rename
Unlock: btrfs_fs_devices::device_list_mutex

Commit Transaction:- Write All supers.
Lock: btrfs_fs_devices::device_list_mutex
   Write all super of btrfs_devices::dev_list
Unlock: btrfs_fs_devices::device_list_mutex

Device add:- Add a new device to the existing mounted volume.
set_bit: btrfs_fs_info::flags::BTRFS_FS_EXCL_OP
Lock: btrfs_fs_devices::device_list_mutex
Lock: btrfs_fs_info::chunk_mutex
    List_add btrfs_devices::dev_list
    List_add btrfs_devices::dev_alloc_list
Unlock: btrfs_fs_info::chunk_mutex
Unlock: btrfs_fs_devices::device_list_mutex

Device remove:- Remove a device from the mounted volume.
set_bit: btrfs_fs_info::flags::BTRFS_FS_EXCL_OP
Lock: btrfs_fs_devices::device_list_mutex
Lock: btrfs_fs_info::chunk_mutex
    List_del btrfs_devices::dev_list
    List_del 

[v10.4 2/5] btrfs-progs: dedupe: Add enable command for dedupe command group

2018-07-12 Thread Lu Fengqi
From: Qu Wenruo 

Add enable subcommand for dedupe commmand group.

Signed-off-by: Qu Wenruo 
Signed-off-by: Lu Fengqi 
---
v10.4:
1. s/btrfs-dedupe/btrfs-dedupe-inband
2. replace strerror(errno) with %m

 Documentation/btrfs-dedupe-inband.asciidoc | 114 +-
 btrfs-completion   |   6 +-
 cmds-dedupe-ib.c   | 238 +
 ioctl.h|   2 +
 4 files changed, 358 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-dedupe-inband.asciidoc 
b/Documentation/btrfs-dedupe-inband.asciidoc
index 83113f5487e2..d895aafbcf45 100644
--- a/Documentation/btrfs-dedupe-inband.asciidoc
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -22,7 +22,119 @@ use with caution.
 
 SUBCOMMAND
 --
-Nothing yet
+*enable* [options] ::
+Enable in-band de-duplication for a filesystem.
++
+`Options`
++
+-f|--force
+Force 'enable' command to be exected.
+Will skip memory limit check and allow 'enable' to be executed even in-band
+de-duplication is already enabled.
++
+NOTE: If re-enable dedupe with '-f' option, any unspecified parameter will be
+reset to its default value.
+
+-s|--storage-backend 
+Specify de-duplication hash storage backend.
+Only 'inmemory' backend is supported yet.
+If not specified, default value is 'inmemory'.
++
+Refer to *BACKENDS* sector for more information.
+
+-b|--blocksize 
+Specify dedupe block size.
+Supported values are power of 2 from '16K' to '8M'.
+Default value is '128K'.
++
+Refer to *BLOCKSIZE* sector for more information.
+
+-a|--hash-algorithm 
+Specify hash algorithm.
+Only 'sha256' is supported yet.
+
+-l|--limit-hash 
+Specify maximum number of hashes stored in memory.
+Only works for 'inmemory' backend.
+Conflicts with '-m' option.
++
+Only positive values are valid.
+Default value is '32K'.
+
+-m|--limit-memory 
+Specify maximum memory used for hashes.
+Only works for 'inmemory' backend.
+Conflicts with '-l' option.
++
+Only value larger than or equal to '1024' is valid.
+No default value.
++
+NOTE: Memory limit will be rounded down to kernel internal hash size,
+so the memory limit shown in 'btrfs dedupe-inband status' may be different
+from the .
+
+WARNING: Too large value for '-l' or '-m' will easily trigger OOM.
+Please use with caution according to system memory.
+
+NOTE: In-band de-duplication is not compactible with compression yet.
+And compression has higher priority than in-band de-duplication, means if
+compression and de-duplication is enabled at the same time, only compression
+will work.
+
+BACKENDS
+
+Btrfs in-band de-duplication will support different storage backends, with
+different use case and features.
+
+In-memory backend::
+This backend provides backward-compatibility, and more fine-tuning options.
+But hash pool is non-persistent and may exhaust kernel memory if not setup
+properly.
++
+This backend can be used on old btrfs(without '-O dedupe' mkfs option).
+When used on old btrfs, this backend needs to be enabled manually after mount.
++
+Designed for fast hash search speed, in-memory backend will keep all dedupe
+hashes in memory. (Although overall performance is still much the same with
+'ondisk' backend if all 'ondisk' hash can be cached in memory)
++
+And only keeps limited number of hash in memory to avoid exhausting memory.
+Hashes over the limit will be dropped following Last-Recent-Use behavior.
+So this backend has a consistent overhead for given limit but can\'t ensure
+all duplicated blocks will be de-duplicated.
++
+After umount and mount, in-memory backend need to refill its hash pool.
+
+On-disk backend::
+This backend provides persistent hash pool, with more smart memory management
+for hash pool.
+But it\'s not backward-compatible, meaning it must be used with '-O dedupe' 
mkfs
+option and older kernel can\'t mount it read-write.
++
+Designed for de-duplication rate, hash pool is stored as btrfs B+ tree on disk.
+This behavior may cause extra disk IO for hash search under high memory
+pressure.
++
+After umount and mount, on-disk backend still has its hash on disk, no need to
+refill its dedupe hash pool.
+
+Currently, only 'inmemory' backend is supported in btrfs-progs.
+
+DEDUPE BLOCK SIZE
+
+In-band de-duplication is done at dedupe block size.
+Any data smaller than dedupe block size won\'t go through in-band
+de-duplication.
+
+And dedupe block size affects dedupe rate and fragmentation heavily.
+
+Smaller block size will cause more fragments, but higher dedupe rate.
+
+Larger block size will cause less fragments, but lower dedupe rate.
+
+In-band de-duplication rate is highly related to the workload pattern.
+So it\'s highly recommended to align dedupe block size to the workload
+block size to make full use of de-duplication.
 
 EXIT STATUS
 ---
diff --git a/btrfs-completion b/btrfs-completion
index ae683f4ecf61..cfdf70966e47 100644
--- a/btrfs-completion
+++ 

[v10.4 1/5] btrfs-progs: Basic framework for dedupe-inband command group

2018-07-12 Thread Lu Fengqi
From: Qu Wenruo 

Add basic ioctl header and command group framework for later use.
Alone with basic man page doc.

Signed-off-by: Qu Wenruo 
Signed-off-by: Lu Fengqi 
---
v10.4:
1. s/btrfs-dedupe/btrfs-dedupe-inband in man page
2. use SZ_* instead of the intermediate number

 Documentation/Makefile.in  |  1 +
 Documentation/btrfs-dedupe-inband.asciidoc | 40 ++
 Documentation/btrfs.asciidoc   |  4 +++
 Makefile   |  3 +-
 btrfs.c|  2 ++
 cmds-dedupe-ib.c   | 35 +++
 commands.h |  2 ++
 dedupe-ib.h| 28 +++
 ioctl.h| 36 +++
 9 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/btrfs-dedupe-inband.asciidoc
 create mode 100644 cmds-dedupe-ib.c
 create mode 100644 dedupe-ib.h

diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index 184647c41940..402155fae001 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -28,6 +28,7 @@ MAN8_TXT += btrfs-qgroup.asciidoc
 MAN8_TXT += btrfs-replace.asciidoc
 MAN8_TXT += btrfs-restore.asciidoc
 MAN8_TXT += btrfs-property.asciidoc
+MAN8_TXT += btrfs-dedupe-inband.asciidoc
 
 # Category 5 manual page
 MAN5_TXT += btrfs-man5.asciidoc
diff --git a/Documentation/btrfs-dedupe-inband.asciidoc 
b/Documentation/btrfs-dedupe-inband.asciidoc
new file mode 100644
index ..83113f5487e2
--- /dev/null
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -0,0 +1,40 @@
+btrfs-dedupe-inband(8)
+==
+
+NAME
+
+btrfs-dedupe-inband - manage in-band (write time) de-duplication of a btrfs
+filesystem
+
+SYNOPSIS
+
+*btrfs dedupe-inband*  
+
+DESCRIPTION
+---
+*btrfs dedupe-inband* is used to enable/disable or show current in-band 
de-duplication
+status of a btrfs filesystem.
+
+Kernel support for in-band de-duplication starts from 4.19.
+
+WARNING: In-band de-duplication is still an experimental feautre of btrfs,
+use with caution.
+
+SUBCOMMAND
+--
+Nothing yet
+
+EXIT STATUS
+---
+*btrfs dedupe-inband* returns a zero exit status if it succeeds. Non zero is
+returned in case of failure.
+
+AVAILABILITY
+
+*btrfs* is part of btrfs-progs.
+Please refer to the btrfs wiki http://btrfs.wiki.kernel.org for
+further details.
+
+SEE ALSO
+
+`mkfs.btrfs`(8),
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index 7316ac094413..1cf5bddec335 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -50,6 +50,10 @@ COMMANDS
Do off-line check on a btrfs filesystem. +
See `btrfs-check`(8) for details.
 
+*dedupe-inband*::
+   Control btrfs in-band(write time) de-duplication. +
+   See `btrfs-dedupe-inband`(8) for details.
+
 *device*::
Manage devices managed by btrfs, including add/delete/scan and so
on. +
diff --git a/Makefile b/Makefile
index 544410e6440c..1ebed7135714 100644
--- a/Makefile
+++ b/Makefile
@@ -123,7 +123,8 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o 
cmds-device.o cmds-scrub.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
   cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
\
-  mkfs/common.o check/mode-common.o check/mode-lowmem.o
+  mkfs/common.o check/mode-common.o check/mode-lowmem.o \
+  cmds-dedupe-ib.o
 libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o 
\
   kernel-lib/crc32c.o messages.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
diff --git a/btrfs.c b/btrfs.c
index 2d39f2ced3e8..2168f5a8bc7f 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -255,6 +255,8 @@ static const struct cmd_group btrfs_cmd_group = {
{ "quota", cmd_quota, NULL, _cmd_group, 0 },
{ "qgroup", cmd_qgroup, NULL, _cmd_group, 0 },
{ "replace", cmd_replace, NULL, _cmd_group, 0 },
+   { "dedupe-inband", cmd_dedupe_ib, NULL, _ib_cmd_group,
+   0 },
{ "help", cmd_help, cmd_help_usage, NULL, 0 },
{ "version", cmd_version, cmd_version_usage, NULL, 0 },
NULL_CMD_STRUCT
diff --git a/cmds-dedupe-ib.c b/cmds-dedupe-ib.c
new file mode 100644
index ..73c923a797da
--- /dev/null
+++ b/cmds-dedupe-ib.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017 Fujitsu.  All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ctree.h"
+#include "ioctl.h"
+
+#include "commands.h"
+#include "utils.h"
+#include "kerncompat.h"
+#include "dedupe-ib.h"
+
+static const char * const dedupe_ib_cmd_group_usage[] = {
+   

[v10.4 0/5] In-band de-duplication for btrfs-progs

2018-07-12 Thread Lu Fengqi
Patchset can be fetched from github:
https://github.com/littleroad/btrfs-progs.git dedupe_latest

Inband dedupe(in-memory backend only) ioctl support for btrfs-progs.

v7 changes:
   Update ctree.h to follow kernel structure change
   Update print-tree to follow kernel structure change
V8 changes:
   Move dedup props and on-disk backend support out of the patchset
   Change command group name to "dedupe-inband", to avoid confusion with
   possible out-of-band dedupe. Suggested by Mark.
   Rebase to latest devel branch.
V9 changes:
   Follow kernels ioctl change to support FORCE flag, new reconf ioctl,
   and more precious error reporting.
v10 changes:
   Rebase to v4.10.
   Add BUILD_ASSERT for btrfs_ioctl_dedupe_args
v10.1 changes:
   Rebase to v4.14.
v10.2 changes:
   Rebase to v4.16.1.
v10.3 changes:
   Rebase to v4.17.
v10.4 changes:
   Deal with offline reviews from Misono Tomohiro.
   1. s/btrfs-dedupe/btrfs-dedupe-inband
   2. Replace strerror(errno) with %m
   3. Use SZ_* instead of intermedia number
   4. update btrfs-completion for reconfigure subcommand

Qu Wenruo (5):
  btrfs-progs: Basic framework for dedupe-inband command group
  btrfs-progs: dedupe: Add enable command for dedupe command group
  btrfs-progs: dedupe: Add disable support for inband dedupelication
  btrfs-progs: dedupe: Add status subcommand
  btrfs-progs: dedupe: introduce reconfigure subcommand

 Documentation/Makefile.in  |   1 +
 Documentation/btrfs-dedupe-inband.asciidoc | 167 
 Documentation/btrfs.asciidoc   |   4 +
 Makefile   |   3 +-
 btrfs-completion   |   6 +-
 btrfs.c|   2 +
 cmds-dedupe-ib.c   | 437 +
 commands.h |   2 +
 dedupe-ib.h|  28 ++
 ioctl.h|  38 ++
 10 files changed, 686 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/btrfs-dedupe-inband.asciidoc
 create mode 100644 cmds-dedupe-ib.c
 create mode 100644 dedupe-ib.h

-- 
2.18.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[v10.4 4/5] btrfs-progs: dedupe: Add status subcommand

2018-07-12 Thread Lu Fengqi
From: Qu Wenruo 

Add status subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
Signed-off-by: Lu Fengqi 
---
v10.4:
1. s/btrfs-dedupe/btrfs-dedupe-inband
2. replace strerror(errno) with %m

 Documentation/btrfs-dedupe-inband.asciidoc |  3 +
 btrfs-completion   |  2 +-
 cmds-dedupe-ib.c   | 80 ++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-dedupe-inband.asciidoc 
b/Documentation/btrfs-dedupe-inband.asciidoc
index 3452f690e3e5..6096389cb0b4 100644
--- a/Documentation/btrfs-dedupe-inband.asciidoc
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -86,6 +86,9 @@ And compression has higher priority than in-band 
de-duplication, means if
 compression and de-duplication is enabled at the same time, only compression
 will work.
 
+*status* ::
+Show current in-band de-duplication status of a filesystem.
+
 BACKENDS
 
 Btrfs in-band de-duplication will support different storage backends, with
diff --git a/btrfs-completion b/btrfs-completion
index a74a23f42022..62a7bdd4d0d5 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -41,7 +41,7 @@ _btrfs()
commands_quota='enable disable rescan'
commands_qgroup='assign remove create destroy show limit'
commands_replace='start status cancel'
-   commands_dedupe_inband='enable disable'
+   commands_dedupe_inband='enable disable status'
 
if [[ "$cur" == -* && $cword -le 3 && "$cmd" != "help" ]]; then
COMPREPLY=( $( compgen -W '--help' -- "$cur" ) )
diff --git a/cmds-dedupe-ib.c b/cmds-dedupe-ib.c
index 91b6fe234043..e778457e25a8 100644
--- a/cmds-dedupe-ib.c
+++ b/cmds-dedupe-ib.c
@@ -298,12 +298,92 @@ out:
return 0;
 }
 
+static const char * const cmd_dedupe_ib_status_usage[] = {
+   "btrfs dedupe-inband status ",
+   "Show current in-band(write time) de-duplication status of a btrfs.",
+   NULL
+};
+
+static int cmd_dedupe_ib_status(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedupe_args dargs;
+   DIR *dirstream;
+   char *path;
+   int fd;
+   int ret;
+   int print_limit = 1;
+
+   if (check_argc_exact(argc, 2))
+   usage(cmd_dedupe_ib_status_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, );
+   if (fd < 0) {
+   error("failed to open file or directory: %s", path);
+   ret = 1;
+   goto out;
+   }
+   memset(, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUPE_CTL_STATUS;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUPE_CTL, );
+   if (ret < 0) {
+   error("failed to get inband deduplication status: %m");
+   ret = 1;
+   goto out;
+   }
+   ret = 0;
+   if (dargs.status == 0) {
+   printf("Status: \t\t\tDisabled\n");
+   goto out;
+   }
+   printf("Status:\t\t\tEnabled\n");
+
+   if (dargs.hash_algo == BTRFS_DEDUPE_HASH_SHA256)
+   printf("Hash algorithm:\t\tSHA-256\n");
+   else
+   printf("Hash algorithm:\t\tUnrecognized(%x)\n",
+   dargs.hash_algo);
+
+   if (dargs.backend == BTRFS_DEDUPE_BACKEND_INMEMORY) {
+   printf("Backend:\t\tIn-memory\n");
+   print_limit = 1;
+   } else  {
+   printf("Backend:\t\tUnrecognized(%x)\n",
+   dargs.backend);
+   }
+
+   printf("Dedup Blocksize:\t%llu\n", dargs.blocksize);
+
+   if (print_limit) {
+   u64 cur_mem;
+
+   /* Limit nr may be 0 */
+   if (dargs.limit_nr)
+   cur_mem = dargs.current_nr * (dargs.limit_mem /
+   dargs.limit_nr);
+   else
+   cur_mem = 0;
+
+   printf("Number of hash: \t[%llu/%llu]\n", dargs.current_nr,
+   dargs.limit_nr);
+   printf("Memory usage: \t\t[%s/%s]\n",
+   pretty_size(cur_mem),
+   pretty_size(dargs.limit_mem));
+   }
+out:
+   close_file_or_dir(fd, dirstream);
+   return ret;
+}
+
 const struct cmd_group dedupe_ib_cmd_group = {
dedupe_ib_cmd_group_usage, dedupe_ib_cmd_group_info, {
{ "enable", cmd_dedupe_ib_enable, cmd_dedupe_ib_enable_usage,
  NULL, 0},
{ "disable", cmd_dedupe_ib_disable, cmd_dedupe_ib_disable_usage,
  NULL, 0},
+   { "status", cmd_dedupe_ib_status, cmd_dedupe_ib_status_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.18.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[v10.4 3/5] btrfs-progs: dedupe: Add disable support for inband dedupelication

2018-07-12 Thread Lu Fengqi
From: Qu Wenruo 

Add disable subcommand for dedupe command group.

Signed-off-by: Qu Wenruo 
Signed-off-by: Lu Fengqi 
---
v10.4:
1. s/btrfs-dedupe/btrfs-dedupe-inband
2. replace strerror(errno) with %m

 Documentation/btrfs-dedupe-inband.asciidoc |  5 +++
 btrfs-completion   |  2 +-
 cmds-dedupe-ib.c   | 41 ++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-dedupe-inband.asciidoc 
b/Documentation/btrfs-dedupe-inband.asciidoc
index d895aafbcf45..3452f690e3e5 100644
--- a/Documentation/btrfs-dedupe-inband.asciidoc
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -22,6 +22,11 @@ use with caution.
 
 SUBCOMMAND
 --
+*disable* ::
+Disable in-band de-duplication for a filesystem.
++
+This will trash all stored dedupe hash.
++
 *enable* [options] ::
 Enable in-band de-duplication for a filesystem.
 +
diff --git a/btrfs-completion b/btrfs-completion
index cfdf70966e47..a74a23f42022 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -41,7 +41,7 @@ _btrfs()
commands_quota='enable disable rescan'
commands_qgroup='assign remove create destroy show limit'
commands_replace='start status cancel'
-   commands_dedupe_inband='enable'
+   commands_dedupe_inband='enable disable'
 
if [[ "$cur" == -* && $cword -le 3 && "$cmd" != "help" ]]; then
COMPREPLY=( $( compgen -W '--help' -- "$cur" ) )
diff --git a/cmds-dedupe-ib.c b/cmds-dedupe-ib.c
index 4d499677d9ae..91b6fe234043 100644
--- a/cmds-dedupe-ib.c
+++ b/cmds-dedupe-ib.c
@@ -259,10 +259,51 @@ out:
return ret;
 }
 
+static const char * const cmd_dedupe_ib_disable_usage[] = {
+   "btrfs dedupe-inband disable ",
+   "Disable in-band(write time) de-duplication of a btrfs.",
+   NULL
+};
+
+static int cmd_dedupe_ib_disable(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedupe_args dargs;
+   DIR *dirstream;
+   char *path;
+   int fd;
+   int ret;
+
+   if (check_argc_exact(argc, 2))
+   usage(cmd_dedupe_ib_disable_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, );
+   if (fd < 0) {
+   error("failed to open file or directory: %s", path);
+   return 1;
+   }
+   memset(, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUPE_CTL_DISABLE;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUPE_CTL, );
+   if (ret < 0) {
+   error("failed to disable inband deduplication: %m");
+   ret = 1;
+   goto out;
+   }
+   ret = 0;
+
+out:
+   close_file_or_dir(fd, dirstream);
+   return 0;
+}
+
 const struct cmd_group dedupe_ib_cmd_group = {
dedupe_ib_cmd_group_usage, dedupe_ib_cmd_group_info, {
{ "enable", cmd_dedupe_ib_enable, cmd_dedupe_ib_enable_usage,
  NULL, 0},
+   { "disable", cmd_dedupe_ib_disable, cmd_dedupe_ib_disable_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.18.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[v10.4 5/5] btrfs-progs: dedupe: introduce reconfigure subcommand

2018-07-12 Thread Lu Fengqi
From: Qu Wenruo 

Introduce reconfigure subcommand to co-operate with new kernel ioctl
modification.

Signed-off-by: Qu Wenruo 
Signed-off-by: Lu Fengqi 
---
v10.4:
1. update btrfs-completion for reconfigure
2. replace strerror(errno) with %m

 Documentation/btrfs-dedupe-inband.asciidoc |  7 +++
 btrfs-completion   |  2 +-
 cmds-dedupe-ib.c   | 73 +-
 3 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/Documentation/btrfs-dedupe-inband.asciidoc 
b/Documentation/btrfs-dedupe-inband.asciidoc
index 6096389cb0b4..78c806f772d6 100644
--- a/Documentation/btrfs-dedupe-inband.asciidoc
+++ b/Documentation/btrfs-dedupe-inband.asciidoc
@@ -86,6 +86,13 @@ And compression has higher priority than in-band 
de-duplication, means if
 compression and de-duplication is enabled at the same time, only compression
 will work.
 
+*reconfigure* [options] ::
+Re-configure in-band de-duplication parameters of a filesystem.
++
+In-band de-duplication must be enbaled first before re-configuration.
++
+[Options] are the same with 'btrfs dedupe-inband enable'.
+
 *status* ::
 Show current in-band de-duplication status of a filesystem.
 
diff --git a/btrfs-completion b/btrfs-completion
index 62a7bdd4d0d5..6ff48e4c2f6a 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -41,7 +41,7 @@ _btrfs()
commands_quota='enable disable rescan'
commands_qgroup='assign remove create destroy show limit'
commands_replace='start status cancel'
-   commands_dedupe_inband='enable disable status'
+   commands_dedupe_inband='enable disable status reconfigure'
 
if [[ "$cur" == -* && $cword -le 3 && "$cmd" != "help" ]]; then
COMPREPLY=( $( compgen -W '--help' -- "$cur" ) )
diff --git a/cmds-dedupe-ib.c b/cmds-dedupe-ib.c
index e778457e25a8..e52f939c9ced 100644
--- a/cmds-dedupe-ib.c
+++ b/cmds-dedupe-ib.c
@@ -56,7 +56,6 @@ static const char * const cmd_dedupe_ib_enable_usage[] = {
NULL
 };
 
-
 #define report_fatal_parameter(dargs, old, member, type, err_val, fmt) \
 ({ \
if (dargs->member != old->member && \
@@ -88,6 +87,12 @@ static void report_parameter_error(struct 
btrfs_ioctl_dedupe_args *dargs,
}
report_option_parameter(dargs, old, flags, u8, -1, x);
}
+
+   if (dargs->status == 0 && old->cmd == BTRFS_DEDUPE_CTL_RECONF) {
+   error("must enable dedupe before reconfiguration");
+   return;
+   }
+
if (report_fatal_parameter(dargs, old, cmd, u16, -1, u) ||
report_fatal_parameter(dargs, old, blocksize, u64, -1, llu) ||
report_fatal_parameter(dargs, old, backend, u16, -1, u) ||
@@ -100,14 +105,17 @@ static void report_parameter_error(struct 
btrfs_ioctl_dedupe_args *dargs,
old->limit_nr, old->limit_mem);
 }
 
-static int cmd_dedupe_ib_enable(int argc, char **argv)
+static int enable_reconfig_dedupe(int argc, char **argv, int reconf)
 {
int ret;
int fd = -1;
char *path;
u64 blocksize = BTRFS_DEDUPE_BLOCKSIZE_DEFAULT;
+   int blocksize_set = 0;
u16 hash_algo = BTRFS_DEDUPE_HASH_SHA256;
+   int hash_algo_set = 0;
u16 backend = BTRFS_DEDUPE_BACKEND_INMEMORY;
+   int backend_set = 0;
u64 limit_nr = 0;
u64 limit_mem = 0;
u64 sys_mem = 0;
@@ -134,15 +142,17 @@ static int cmd_dedupe_ib_enable(int argc, char **argv)
break;
switch (c) {
case 's':
-   if (!strcasecmp("inmemory", optarg))
+   if (!strcasecmp("inmemory", optarg)) {
backend = BTRFS_DEDUPE_BACKEND_INMEMORY;
-   else {
+   backend_set = 1;
+   } else {
error("unsupported dedupe backend: %s", optarg);
exit(1);
}
break;
case 'b':
blocksize = parse_size(optarg);
+   blocksize_set = 1;
break;
case 'a':
if (strcmp("sha256", optarg)) {
@@ -224,26 +234,40 @@ static int cmd_dedupe_ib_enable(int argc, char **argv)
return 1;
}
memset(, -1, sizeof(dargs));
-   dargs.cmd = BTRFS_DEDUPE_CTL_ENABLE;
-   dargs.blocksize = blocksize;
-   dargs.hash_algo = hash_algo;
-   dargs.limit_nr = limit_nr;
-   dargs.limit_mem = limit_mem;
-   dargs.backend = backend;
-   if (force)
-   dargs.flags |= BTRFS_DEDUPE_FLAG_FORCE;
-   else
-   dargs.flags = 0;
+   if (reconf) {
+   dargs.cmd = BTRFS_DEDUPE_CTL_RECONF;
+   if (blocksize_set)
+

Re: About hung task on generic/041

2018-07-12 Thread Filipe Manana
On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi  wrote:
> Hi,
>
> When I run generic/041 with v4.18-rc3 (turn on kasan and hung task
> detection), btrfs-transaction kthread will trigger the hung task timeout
> (stall at wait_event in btrfs_commit_transaction). At the same time, you
> can see that xfs_io -c fsync will occupy 100% of the CPU. I am not sure
> whether this is a problem. Any suggestion?

Well, something at 100% cpu and that seems hang forever is definitely
a problem, specially a workload as simple as the one in generic/041
(never happened to me, even on vanilla 4.18-rc4).
Do you have the stack trace for the fsync task? What you pasted below
is only for the transaction kthread and that alone doesn't help.

>
> [Wed Jul 11 15:50:08 2018] INFO: task btrfs-transacti:1053 blocked for more 
> than 120 seconds.
> [Wed Jul 11 15:50:08 2018]   Not tainted 4.18.0-rc3-custom #14
> [Wed Jul 11 15:50:08 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Wed Jul 11 15:50:08 2018] btrfs-transacti D0  1053  2 0x8000
> [Wed Jul 11 15:50:08 2018] Call Trace:
> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x5b2/0x1380
> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
> [Wed Jul 11 15:50:08 2018]  ? firmware_map_remove+0x187/0x187
> [Wed Jul 11 15:50:08 2018]  ? ___preempt_schedule+0x16/0x18
> [Wed Jul 11 15:50:08 2018]  ? mark_held_locks+0x6e/0x90
> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? _raw_spin_unlock_irqrestore+0x46/0x70
> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_event+0x191/0x410
> [Wed Jul 11 15:50:08 2018]  ? prepare_to_wait_exclusive+0x210/0x210
> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_unlock+0x10f/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? do_raw_spin_trylock+0x120/0x120
> [Wed Jul 11 15:50:08 2018]  schedule+0xca/0x260
> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
> [Wed Jul 11 15:50:08 2018]  ? __schedule+0x1380/0x1380
> [Wed Jul 11 15:50:08 2018]  ? ___might_sleep+0x126/0x370
> [Wed Jul 11 15:50:08 2018]  ? init_wait_entry+0xc7/0x100
> [Wed Jul 11 15:50:08 2018]  ? __wake_up_locked_key_bookmark+0x20/0x20
> [Wed Jul 11 15:50:08 2018]  ? __btrfs_run_delayed_items+0x1e5/0x280 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? __might_sleep+0x31/0xd0
> [Wed Jul 11 15:50:08 2018]  btrfs_commit_transaction+0x122a/0x1640 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? btrfs_apply_pending_changes+0x90/0x90 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? wait_woken+0x150/0x150
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? deref_stack_reg+0xe0/0xe0
> [Wed Jul 11 15:50:08 2018]  ? __module_text_address+0x63/0xa0
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x161/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? is_module_text_address+0x2b/0x50
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? kernel_text_address+0x5a/0x100
> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
> [Wed Jul 11 15:50:08 2018]  ? __save_stack_trace+0x82/0x100
> [Wed Jul 11 15:50:08 2018]  ? kasan_kmalloc+0x142/0x170
> [Wed Jul 11 15:50:08 2018]  ? kmem_cache_alloc+0xfc/0x2e0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? transaction_kthread+0x1d9/0x240 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? kthread+0x1b9/0x1e0
> [Wed Jul 11 15:50:08 2018]  ? ret_from_fork+0x27/0x50
> [Wed Jul 11 15:50:08 2018]  ? deactivate_slab.isra.27+0x64f/0x7a0
> [Wed Jul 11 15:50:08 2018]  ? mark_lock+0x149/0xa80
> [Wed Jul 11 15:50:08 2018]  ? init_object+0x6b/0x80
> [Wed Jul 11 15:50:08 2018]  ? print_usage_bug+0x3a0/0x3a0
> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
> [Wed Jul 11 15:50:08 2018]  ? ___slab_alloc+0x62a/0x690
> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? preempt_count_sub+0x14/0xc0
> [Wed Jul 11 15:50:08 2018]  ? rcu_lockdep_current_cpu_online+0x12b/0x160
> [Wed Jul 11 15:50:08 2018]  ? rcu_oom_callback+0x40/0x40
> [Wed Jul 11 15:50:08 2018]  ? __lock_is_held+0x8c/0xe0
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x596/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? rcu_read_lock_sched_held+0x8f/0xa0
> [Wed Jul 11 15:50:08 2018]  ? btrfs_record_root_in_trans+0x1f/0xa0 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? start_transaction+0x26b/0x930 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? btrfs_commit_transaction+0x1640/0x1640 [btrfs]
> [Wed Jul 11 15:50:08 2018]  ? check_flags.part.23+0x240/0x240
> [Wed Jul 11 15:50:08 2018]  ? lock_downgrade+0x380/0x380
> [Wed Jul 11 15:50:08 2018]  ? 

[PATCH v2] generic: add test for fsync after cloning file range

2018-07-12 Thread fdmanana
From: Filipe Manana 

Test that if we do a buffered write to a file, fsync it, clone a range
from another file into our file that overlaps the previously written
range, fsync the file again and then power fail, after we mount again the
filesystem, no file data was lost or corrupted.

This test is motivated by a bug found in btrfs, which is fixed by a patch
for the linux kernel titled:

  "Btrfs: fix file data corruption after cloning a range and fsync"

Signed-off-by: Filipe Manana 
---

V2: Fixed title of referenced btrfs patch.

 tests/generic/500 | 70 +++
 tests/generic/500.out |  5 
 tests/generic/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/generic/500
 create mode 100644 tests/generic/500.out

diff --git a/tests/generic/500 b/tests/generic/500
new file mode 100755
index ..b7baca34
--- /dev/null
+++ b/tests/generic/500
@@ -0,0 +1,70 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test No. 500
+#
+# Test that if we do a buffered write to a file, fsync it, clone a range from
+# another file into our file that overlaps the previously written range, fsync
+# the file again and then power fail, after we mount again the filesystem, no
+# file data was lost or corrupted.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch_reflink
+_require_dm_target flakey
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+$XFS_IO_PROG -f -c "pwrite -S 0x18 9000K 6908K" $SCRATCH_MNT/foo >>$seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x20 2572K 156K" $SCRATCH_MNT/bar >>$seqres.full
+
+# We clone from file foo into a range of file bar that overlaps the existing
+# extent at file bar. The destination offset of the reflink operation matches
+# the eof position of file bar minus 4Kb.
+$XFS_IO_PROG -c "fsync" \
+-c "reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" \
+-c "fsync" \
+$SCRATCH_MNT/bar >>$seqres.full
+
+echo "File bar digest before power failure:"
+md5sum $SCRATCH_MNT/bar | _filter_scratch
+
+# Simulate a power failure and mount the filesystem to check that no file data
+# was lost or corrupted.
+_flakey_drop_and_remount
+
+echo "File bar digest after power failure:"
+md5sum $SCRATCH_MNT/bar | _filter_scratch
+
+_unmount_flakey
+_cleanup_flakey
+
+status=0
+exit
diff --git a/tests/generic/500.out b/tests/generic/500.out
new file mode 100644
index ..f590154e
--- /dev/null
+++ b/tests/generic/500.out
@@ -0,0 +1,5 @@
+QA output created by 500
+File bar digest before power failure:
+95a95813a8c2abc9aa75a6c2914a077e  SCRATCH_MNT/bar
+File bar digest after power failure:
+95a95813a8c2abc9aa75a6c2914a077e  SCRATCH_MNT/bar
diff --git a/tests/generic/group b/tests/generic/group
index b2a093f4..a84321dd 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -502,3 +502,4 @@
 497 auto quick swap collapse
 498 auto quick log
 499 auto quick rw collapse zero
+500 auto quick clone log
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: fix file data corruption after cloning a range and fsync

2018-07-12 Thread fdmanana
From: Filipe Manana 

When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the  list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

  $ xfs_io -c "fsync" /mnt/bar
  # reflink destination offset corresponds to the size of file bar,
  # 2728Kb minus 4Kb.
  $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  $ md5sum /mnt/bar
  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar

  

  $ mount /dev/sdb /mnt
  $ md5sum /mnt/bar
  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
  # power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: sta...@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana 
---

V2: Added missing "fix" word to subject only.

 fs/btrfs/extent_io.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e55843f536bc..b3e45714d28f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4238,8 +4238,9 @@ int try_release_extent_mapping(struct page *page, gfp_t 
mask)
struct extent_map *em;
u64 start = page_offset(page);
u64 end = start + PAGE_SIZE - 1;
-   struct extent_io_tree *tree = _I(page->mapping->host)->io_tree;
-   struct extent_map_tree *map = 
_I(page->mapping->host)->extent_tree;
+   struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+   struct extent_io_tree *tree = _inode->io_tree;
+   struct extent_map_tree *map = _inode->extent_tree;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4262,6 +4263,8 @@ int try_release_extent_mapping(struct page *page, gfp_t 
mask)
extent_map_end(em) - 1,
EXTENT_LOCKED | EXTENT_WRITEBACK,
0, NULL)) {
+   set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+   _inode->runtime_flags);
remove_extent_mapping(map, em);
/* once for the rb tree */
  

[PATCH] generic: add test for fsync after cloning file range

2018-07-12 Thread fdmanana
From: Filipe Manana 

Test that if we do a buffered write to a file, fsync it, clone a range
from another file into our file that overlaps the previously written
range, fsync the file again and then power fail, after we mount again the
filesystem, no file data was lost or corrupted.

This test is motivated by a bug found in btrfs, which is fixed by a patch
for the linux kernel titled:

  "Btrfs: file data corruption after cloning a range and fsync"

Signed-off-by: Filipe Manana 
---
 tests/generic/500 | 70 +++
 tests/generic/500.out |  5 
 tests/generic/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/generic/500
 create mode 100644 tests/generic/500.out

diff --git a/tests/generic/500 b/tests/generic/500
new file mode 100755
index ..b7baca34
--- /dev/null
+++ b/tests/generic/500
@@ -0,0 +1,70 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test No. 500
+#
+# Test that if we do a buffered write to a file, fsync it, clone a range from
+# another file into our file that overlaps the previously written range, fsync
+# the file again and then power fail, after we mount again the filesystem, no
+# file data was lost or corrupted.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch_reflink
+_require_dm_target flakey
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+$XFS_IO_PROG -f -c "pwrite -S 0x18 9000K 6908K" $SCRATCH_MNT/foo >>$seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x20 2572K 156K" $SCRATCH_MNT/bar >>$seqres.full
+
+# We clone from file foo into a range of file bar that overlaps the existing
+# extent at file bar. The destination offset of the reflink operation matches
+# the eof position of file bar minus 4Kb.
+$XFS_IO_PROG -c "fsync" \
+-c "reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" \
+-c "fsync" \
+$SCRATCH_MNT/bar >>$seqres.full
+
+echo "File bar digest before power failure:"
+md5sum $SCRATCH_MNT/bar | _filter_scratch
+
+# Simulate a power failure and mount the filesystem to check that no file data
+# was lost or corrupted.
+_flakey_drop_and_remount
+
+echo "File bar digest after power failure:"
+md5sum $SCRATCH_MNT/bar | _filter_scratch
+
+_unmount_flakey
+_cleanup_flakey
+
+status=0
+exit
diff --git a/tests/generic/500.out b/tests/generic/500.out
new file mode 100644
index ..f590154e
--- /dev/null
+++ b/tests/generic/500.out
@@ -0,0 +1,5 @@
+QA output created by 500
+File bar digest before power failure:
+95a95813a8c2abc9aa75a6c2914a077e  SCRATCH_MNT/bar
+File bar digest after power failure:
+95a95813a8c2abc9aa75a6c2914a077e  SCRATCH_MNT/bar
diff --git a/tests/generic/group b/tests/generic/group
index b2a093f4..a84321dd 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -502,3 +502,4 @@
 497 auto quick swap collapse
 498 auto quick log
 499 auto quick rw collapse zero
+500 auto quick clone log
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: file data corruption after cloning a range and fsync

2018-07-12 Thread fdmanana
From: Filipe Manana 

When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the  list of modified
extents in the inode's extent map tree, so that a "fast" fsync (the flag
BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
and log corresponding extent items. However, at the end of range cloning
operation we do truncate all the pages in the affected range (in order to
ensure future reads will not get stale data). Sometimes this truncation
will release the corresponding extent maps besides the pages from the page
cache. If this happens, then a "fast" fsync operation will miss logging
some extent items, because it relies exclusively on the extent maps being
present in the inode's extent tree, leading to data loss/corruption if
the fsync ends up using the same transaction used by the clone operation
(that transaction was not committed in the meanwhile). An extent map is
released through the callback btrfs_invalidatepage(), which gets called by
truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
later ends up calling try_release_extent_mapping() which will release the
extent map if some conditions are met, like the file size being greater
than 16Mb, gfp flags allow blocking and the range not being locked (which
is the case during the clone operation) nor being the extent map flagged
as pinned (also the case for cloning).

The following example, turned into a test for fstests, reproduces the
issue:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
  $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar

  $ xfs_io -c "fsync" /mnt/bar
  # reflink destination offset corresponds to the size of file bar,
  # 2728Kb minus 4Kb.
  $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  $ md5sum /mnt/bar
  95a95813a8c2abc9aa75a6c2914a077e  /mnt/bar

  

  $ mount /dev/sdb /mnt
  $ md5sum /mnt/bar
  207fd8d0b161be8a84b945f0df8d5f8d  /mnt/bar
  # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
  # power failure

In the above example, the destination offset of the clone operation
corresponds to the size of the "bar" file minus 4Kb. So during the clone
operation, the extent map covering the range from 2572Kb to 2728Kb gets
trimmed so that it ends at offset 2724Kb, and a new extent map covering
the range from 2724Kb to 11724Kb is created. So at the end of the clone
operation when we ask to truncate the pages in the range from 2724Kb to
2724Kb + 15908Kb, the page invalidation callback ends up removing the new
extent map (through try_release_extent_mapping()) when the page at offset
2724Kb is passed to that callback.

Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
map is removed at try_release_extent_mapping(), forcing the next fsync to
search for modified extents in the fs/subvolume tree instead of relying on
the presence of extent maps in memory. This way we can continue doing a
"fast" fsync if the destination range of a clone operation does not
overlap with an existing range or if any of the criteria necessary to
remove an extent map at try_release_extent_mapping() is not met (file
size not bigger then 16Mb or gfp flags do not allow blocking).

CC: sta...@vger.kernel.org # 3.16+
Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e55843f536bc..b3e45714d28f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4238,8 +4238,9 @@ int try_release_extent_mapping(struct page *page, gfp_t 
mask)
struct extent_map *em;
u64 start = page_offset(page);
u64 end = start + PAGE_SIZE - 1;
-   struct extent_io_tree *tree = _I(page->mapping->host)->io_tree;
-   struct extent_map_tree *map = 
_I(page->mapping->host)->extent_tree;
+   struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+   struct extent_io_tree *tree = _inode->io_tree;
+   struct extent_map_tree *map = _inode->extent_tree;
 
if (gfpflags_allow_blocking(mask) &&
page->mapping->host->i_size > SZ_16M) {
@@ -4262,6 +4263,8 @@ int try_release_extent_mapping(struct page *page, gfp_t 
mask)
extent_map_end(em) - 1,
EXTENT_LOCKED | EXTENT_WRITEBACK,
0, NULL)) {
+   set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+   _inode->runtime_flags);
remove_extent_mapping(map, em);
/* once for the rb tree */
free_extent_map(em);

Re: [PATCH 3/3] btrfs: add helper function check device delete able

2018-07-12 Thread Nikolay Borisov



On 10.07.2018 21:22, Anand Jain wrote:
> Move the section of the code which performs the check if the device is
> indelible, move that into a helper function.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 49 ++---
>  1 file changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 59a6d8f42c98..feb29c5b44f6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1945,6 +1945,33 @@ static inline u64 btrfs_num_devices(struct 
> btrfs_fs_info *fs_info)
>   return num_devices;
>  }
>  
> +static struct btrfs_device *btrfs_device_delete_able(

Ugliest name ever! So this function is not really a predicate, rather
it's used to fetch the struct btrfs_device * to delete. So a more
becoming name would be:

btrfs_get_device_for_delete - though this a bit verbose.

I guess btrfs_can_delete_device is more suitable if you want to follow
this predicate style. At the very least, though, the correct form of the
adjective is deletable so it should be btrfs_device_deletable. But as I
said this function is not really used as a predicate.

> + struct btrfs_fs_info *fs_info,
> + const char *device_path, u64 devid)
> +{
> + int ret;
> + struct btrfs_device *device;
> +
> + ret = btrfs_check_raid_min_devices(fs_info,
> +btrfs_num_devices(fs_info) - 1);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
> +);

Not really related to this patchset, but I think the whole
btrfs_find_device_by_devspec -> btrfs_find_device_missing_or_by_path
could be simplified by making those functions return a pointer to
btrfs_device rather than an int error value. That way you eliminate the
ugly "argument as return value" convention.

> + if (ret)
> + return ERR_PTR(ret);
> +
> + if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state))
> + return ERR_PTR(BTRFS_ERROR_DEV_TGT_REPLACE);
> +
> + if (test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state) &&
> + fs_info->fs_devices->rw_devices == 1)
> + return ERR_PTR(BTRFS_ERROR_DEV_ONLY_WRITABLE);
> +
> + return device;
> +}
> +
>  int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   u64 devid)
>  {
> @@ -1958,25 +1985,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, 
> const char *device_path,
>  
>   mutex_lock(_mutex);
>  
> - num_devices = btrfs_num_devices(fs_info);
> -
> - ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> - if (ret)
> - goto out;
> -
> - ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
> -);
> - if (ret)
> - goto out;
> -
> - if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state)) {
> - ret = BTRFS_ERROR_DEV_TGT_REPLACE;
> - goto out;
> - }
> -
> - if (test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state) &&
> - fs_info->fs_devices->rw_devices == 1) {
> - ret = BTRFS_ERROR_DEV_ONLY_WRITABLE;
> + device = btrfs_device_delete_able(fs_info, device_path, devid);
> + if (IS_ERR(device)) {
> + ret = PTR_ERR(device);
>   goto out;
>   }
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: add helper btrfs_num_devices() to deduce num_devices

2018-07-12 Thread Nikolay Borisov



On 10.07.2018 21:22, Anand Jain wrote:
> When the replace is running the fs_devices::num_devices also includes
> the replace device, however in some operations like device delete and
> balance it needs the actual num_devices without the repalce devices, so
> now the function btrfs_num_devices() just provides that.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ce6faeb8bcf8..59a6d8f42c98 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1931,6 +1931,20 @@ void btrfs_assign_next_active_device(struct 
> btrfs_fs_info *fs_info,
>   fs_info->fs_devices->latest_bdev = next_device->bdev;
>  }
>  

Put a comment above the function saying it returns the number of devices
minus the replace device (in case replace is working) otherwise one have
to go and 'git blame' it

> +static inline u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
> +{
> + u64 num_devices = fs_info->fs_devices->num_devices;
> +
> + btrfs_dev_replace_read_lock(_info->dev_replace);
> + if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> + WARN_ON(num_devices < 1);
> + num_devices--;
> + }
> + btrfs_dev_replace_read_unlock(_info->dev_replace);
> +
> + return num_devices;
> +}
> +
>  int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   u64 devid)
>  {
> @@ -1944,13 +1958,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, 
> const char *device_path,
>  
>   mutex_lock(_mutex);
>  
> - num_devices = fs_devices->num_devices;
> - btrfs_dev_replace_read_lock(_info->dev_replace);
> - if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> - WARN_ON(num_devices < 1);
> - num_devices--;
> - }
> - btrfs_dev_replace_read_unlock(_info->dev_replace);
> + num_devices = btrfs_num_devices(fs_info);
>  
>   ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
>   if (ret)
> @@ -3810,13 +3818,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>   }
>   }
>  
> - num_devices = fs_info->fs_devices->num_devices;
> - btrfs_dev_replace_read_lock(_info->dev_replace);
> - if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> - WARN_ON(num_devices < 1);
> - num_devices--;
> - }
> - btrfs_dev_replace_read_unlock(_info->dev_replace);
> + num_devices = btrfs_num_devices(fs_info);
> +
>   allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
>   if (num_devices > 1)
>   allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: warn for num_devices below 0

2018-07-12 Thread Nikolay Borisov



On 10.07.2018 21:22, Anand Jain wrote:
> In preparation to de-duplicate a section of code where we deduce the
> num_devices, use warn instead of bug.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index eb78bb8d1108..ce6faeb8bcf8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3813,7 +3813,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>   num_devices = fs_info->fs_devices->num_devices;
>   btrfs_dev_replace_read_lock(_info->dev_replace);
>   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> - BUG_ON(num_devices < 1);
> + WARN_ON(num_devices < 1);

Isn't dev_replace_is_ongoing && num_devices < 1 indeed a logical bug
situation? Under what condition can it happen that you deem "non
critical" ?

>   num_devices--;
>   }
>   btrfs_dev_replace_read_unlock(_info->dev_replace);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Nikolay Borisov



On 12.07.2018 10:04, Peter Chant wrote:
> On 07/12/2018 07:10 AM, Nikolay Borisov wrote:
>>
>>
>> On 10.07.2018 10:04, Pete wrote:
>>> I've just had the error in the subject which caused the file system to
>>> go read-only.
>>>
>>> Further part of error message:
>>> WARNING: CPU: 14 PID: 1351 at fs/btrfs/extent-tree.c:3076
>>> btrfs_run_delayed_refs*0x163/0x190
>>>
>>> 'Screenshot' here:
>>> https://drive.google.com/file/d/1qw7TE1bec8BKcmffrOmg2LS15IOq8Jwc/view?usp=sharing
>>>
>>> The kernel is 4.17.4.  There are three hard drives in the file system.
>>> dmcrypt (luks) is used between btrfs and the disks.
>>>
>>> I'm about to run a scrub.  On reboot the disks mounted fine.
>>
>> Show the output of :
>>
>> btrfs fi usage /path/to/fs
>>
> 
> On the hdd system, which recently went ro again:
> root@phoenix:~# btrfs fi usage /home_data/
> Overall:
> Device size:   9.10TiB
> Device allocated:  6.38TiB
> Device unallocated:2.71TiB
> Device missing:  0.00B
> Used:  5.50TiB
> Free (estimated):  1.80TiB  (min: 1.80TiB)
> Data ratio:   2.00
> Metadata ratio:   2.00
> Global reserve:  512.00MiB  (used: 0.00B)>
> Data,RAID1: Size:3.15TiB, Used:2.71TiB
>/dev/mapper/data_disk_1 1.80TiB
>/dev/mapper/data_disk_2 1.80TiB
>/dev/mapper/data_disk_3 2.70TiB
> 
> Metadata,RAID1: Size:42.00GiB, Used:40.56GiB
>/dev/mapper/data_disk_125.00GiB
>/dev/mapper/data_disk_226.00GiB
>/dev/mapper/data_disk_333.00GiB
> 
> System,RAID1: Size:64.00MiB, Used:480.00KiB
>/dev/mapper/data_disk_164.00MiB
>/dev/mapper/data_disk_232.00MiB
>/dev/mapper/data_disk_332.00MiB
> 
> Unallocated:
>/dev/mapper/data_disk_1   926.46GiB
>/dev/mapper/data_disk_2   925.49GiB
>/dev/mapper/data_disk_3   924.99GiB
> 
> root@phoenix:~#


This one shouldn't have gone RO since it has plenty of unallocated and
free space. What was the workload at the time it went RO? Hard to say,
it's best if you can provide output with the debug patch applied when
this issue re-appears.

> 
> 
> Incidental I've been running out of space on my ssd which contains / and
> /home - which I am sorting.
> 
> root@phoenix:~# btrfs fi usage /
> 
> Overall:
> 
> Device size: 350.00GiB
> 
> Device allocated:350.00GiB
> 
> Device unallocated:1.00MiB
> 
> Device missing:  0.00B
> 
> Used:324.74GiB
> 
> Free (estimated): 23.28GiB  (min: 23.28GiB)
> 
> Data ratio:   1.00
> 
> Metadata ratio:   1.00
> 
> Global reserve:  512.00MiB  (used: 0.00B)
> 
SO this doesn't look healthy, essentially you don't have any unallocated
space on your device. I will suggest you run balance to try and compact
the space in your data groups and hopefully free up some space. As a
first step you can try and run :

btrfs balance start -dusage=0 -musage=0 /

This will try and reclaim any non-used block groups i.e it could bring
some unallocated space back. Then you can run 'btrfs fi us / ' to see if
this is the case. Then I'd suggest you run something like:

'btrfs balance start -dusage=60 -musage=60 /' this will try and compact
all data/metadata chunks which are less than 60% full.

> 
> 
> Data,single: Size:343.00GiB, Used:319.72GiB
> 
>/dev/disk/by-label/desk-system343.00GiB
> 
> Metadata,single: Size:6.97GiB, Used:5.02GiB
>/dev/disk/by-label/desk-system  6.97GiB
> 
> System,single: Size:32.00MiB, Used:64.00KiB
>/dev/disk/by-label/desk-system 32.00MiB
> 
> Unallocated:
>/dev/disk/by-label/desk-system  1.00MiB
> root@phoenix:~#
> 
> Pete
> 
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: use customized batch size for total_bytes_pinned

2018-07-12 Thread Nikolay Borisov



On 11.07.2018 18:59, Ethan Lien wrote:
> In commit b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
> pinned bytes") we use total_bytes_pinned to track how many bytes we are
> going to free in this transaction. When we are close to ENOSPC, we check it
> and know if we can make the allocation by commit the current transaction.
> For every data/metadata extent we are going to free, we add
> total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), and
> release it in unpin_extent_range() when we finish the transaction. So this
> is a variable we frequently update but rarely read - just the suitable
> use of percpu_counter. But in previous commit we update total_bytes_pinned
> by default 32 batch size, making every update essentially a spin lock
> protected update. Since every spin lock/unlock operation involves syncing
> a globally used variable and some kind of barrier in a SMP system, this is
> more expensive than using total_bytes_pinned as a simple atomic64_t. So
> fix this by using a customized batch size. Since we only read
> total_bytes_pinned when we are close to ENOSPC and fail to alloc new chunk,
> we can use a really large batch size and have nearly no penalty in most
> cases.
> 
> 
> [Test]
> We test the patch on a 4-cores x86 machine:
> 1. falloate a 16GiB size test file.
> 2. take snapshot (so all following writes will be cow write).
> 3. run a 180 sec, 4 jobs, 4K random write fio on test file.
> 
> We also add a temporary lockdep class on percpu_counter's spin lock used
> by total_bytes_pinned to track lock_stat.
> 
> 
> [Results]
> unpatched:
> lock_stat version 0.4
> ---
>   class namecon-bouncescontentions
> waittime-min   waittime-max waittime-total   waittime-avgacq-bounces
> acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> 
>total_bytes_pinned_percpu:82 82
> 0.21   0.61  29.46   0.36 298340
>   635973   0.09  11.01  173476.25   0.27
> 
> 
> patched:
> lock_stat version 0.4
> ---
>   class namecon-bouncescontentions
> waittime-min   waittime-max waittime-total   waittime-avgacq-bounces
> acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> 
>total_bytes_pinned_percpu: 1  1
> 0.62   0.62   0.62   0.62  13601
>31542   0.14   9.61   11016.90   0.35
> 

According to these numbers though, the best-case (waittime-min and the
average waittime-avg) have actually regressed. So what really saves you
is the fact that the number of time we had to go to the best/average
case is reduced, due to the larger batch. I guess in that regard you are
in the clear. Another pertinent question is did you observe any
significant impact on run times of actual workloads or, say, transaction
commit times or anything like that? I think this case will only be hit
when the filesystem is struggling to satisfy a metadata allocation  has
to cycle through all stages .

> 
> [Analysis]
> Since the spin lock only protect a single in-memory variable, the
> contentions (number of lock acquisitions that had to wait) in both
> unpatched and patched version are low. But when we see acquisitions and
> acq-bounces, we get much lower counts in patched version. Here the most
> important metric is acq-bounces. It means how many times the lock get
> transferred between different cpus, so the patch can really recude

nit: s/recude/reduce - no need to resend i guess david will fix it on
the way into the tree

> cacheline bouncing of spin lock (also the global counter of percpu_counter)
> in a SMP system.
> 
> Fixes: b150a4f10d878 ("Btrfs: use a percpu to keep track of possibly
> pinned bytes")
> 
> Signed-off-by: Ethan Lien 

 Reviewed-by: Nikolay Borisov 

> ---
> 
> V2:
>   Rewrite commit comments.
>   Add lock_stat test.
>   Pull dirty_metadata_bytes out to a separate patch.
> 
>  fs/btrfs/ctree.h   |  1 +
>  fs/btrfs/extent-tree.c | 46 --
>  2 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 118346aceea9..df682a521635 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -422,6 +422,7 @@ struct btrfs_space_info {
>* time the transaction commits.
>*/
>   struct percpu_counter total_bytes_pinned;
> + s32 total_bytes_pinned_batch;
>  
>   struct list_head list;
>   /* Protected by the spinlock 'lock'. */
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58c0080..937113534ef4 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> 

Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Peter Chant
On 07/12/2018 07:10 AM, Nikolay Borisov wrote:
> 
> 
> On 10.07.2018 10:04, Pete wrote:
>> I've just had the error in the subject which caused the file system to
>> go read-only.
>>
>> Further part of error message:
>> WARNING: CPU: 14 PID: 1351 at fs/btrfs/extent-tree.c:3076
>> btrfs_run_delayed_refs*0x163/0x190
>>
>> 'Screenshot' here:
>> https://drive.google.com/file/d/1qw7TE1bec8BKcmffrOmg2LS15IOq8Jwc/view?usp=sharing
>>
>> The kernel is 4.17.4.  There are three hard drives in the file system.
>> dmcrypt (luks) is used between btrfs and the disks.
>>
>> I'm about to run a scrub.  On reboot the disks mounted fine.
> 
> Show the output of :
> 
> btrfs fi usage /path/to/fs
> 

On the hdd system, which recently went ro again:
root@phoenix:~# btrfs fi usage /home_data/
Overall:
Device size:   9.10TiB
Device allocated:  6.38TiB
Device unallocated:2.71TiB
Device missing:  0.00B
Used:  5.50TiB
Free (estimated):  1.80TiB  (min: 1.80TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:3.15TiB, Used:2.71TiB
   /dev/mapper/data_disk_1 1.80TiB
   /dev/mapper/data_disk_2 1.80TiB
   /dev/mapper/data_disk_3 2.70TiB

Metadata,RAID1: Size:42.00GiB, Used:40.56GiB
   /dev/mapper/data_disk_125.00GiB
   /dev/mapper/data_disk_226.00GiB
   /dev/mapper/data_disk_333.00GiB

System,RAID1: Size:64.00MiB, Used:480.00KiB
   /dev/mapper/data_disk_164.00MiB
   /dev/mapper/data_disk_232.00MiB
   /dev/mapper/data_disk_332.00MiB

Unallocated:
   /dev/mapper/data_disk_1   926.46GiB
   /dev/mapper/data_disk_2   925.49GiB
   /dev/mapper/data_disk_3   924.99GiB

root@phoenix:~#


Incidental I've been running out of space on my ssd which contains / and
/home - which I am sorting.

root@phoenix:~# btrfs fi usage /

Overall:

Device size: 350.00GiB

Device allocated:350.00GiB

Device unallocated:1.00MiB

Device missing:  0.00B

Used:324.74GiB

Free (estimated): 23.28GiB  (min: 23.28GiB)

Data ratio:   1.00

Metadata ratio:   1.00

Global reserve:  512.00MiB  (used: 0.00B)



Data,single: Size:343.00GiB, Used:319.72GiB

   /dev/disk/by-label/desk-system343.00GiB

Metadata,single: Size:6.97GiB, Used:5.02GiB
   /dev/disk/by-label/desk-system  6.97GiB

System,single: Size:32.00MiB, Used:64.00KiB
   /dev/disk/by-label/desk-system 32.00MiB

Unallocated:
   /dev/disk/by-label/desk-system  1.00MiB
root@phoenix:~#

Pete



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Introduce compile time structure size check

2018-07-12 Thread Qu Wenruo



On 2018年07月12日 14:27, Nikolay Borisov wrote:
> 
> 
> On 12.07.2018 09:19, Qu Wenruo wrote:
>> Introduce a new macro based compile time check for ioctl structures.
>>
>> The new macro is BTRFS_ASSERT_SIZE(), which is mostly copied from
>> VMMDEV_ASSERT_SIZE().
>>
>> Such check is only added to structure pended to power of 2.
>> And exposed one structure, btrfs_ioctl_get_dev_stats() is not aligned
>> well.
>> The misalign is introduced by commit b27f7c0c150f ("btrfs: join DEV_STATS
>> ioctls to one").
> 
> So what's the negative effect if a structure is not aligned to a power of 2?

No real negative effect, until one day one may find the comment is not
correct.

Thanks,
Qu

> 
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  include/uapi/linux/btrfs.h | 18 +-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
>> index 5ca1d21fc4a7..be1213c10080 100644
>> --- a/include/uapi/linux/btrfs.h
>> +++ b/include/uapi/linux/btrfs.h
>> @@ -22,6 +22,14 @@
>>  #include 
>>  #include 
>>  
>> +/*
>> + * Compile time check on structure size, same method as
>> + * VMMDEV_ASSERT_SIZE() from linux/vbox_vmmdev_types.h, to trigger a 
>> negative
>> + * array size at compile time if size doesn't match.
>> + */
>> +#define BTRFS_ASSERT_SIZE(type, size) \
>> +typedef char type ## _asrt_size[1 - 2 * !!(sizeof(struct type) != 
>> (size))]
>> +
>>  #define BTRFS_IOCTL_MAGIC 0x94
>>  #define BTRFS_VOL_NAME_MAX 255
>>  #define BTRFS_LABEL_SIZE 256
>> @@ -32,6 +40,7 @@ struct btrfs_ioctl_vol_args {
>>  __s64 fd;
>>  char name[BTRFS_PATH_NAME_MAX + 1];
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_vol_args, 4096);
>>  
>>  #define BTRFS_DEVICE_PATH_NAME_MAX  1024
>>  #define BTRFS_SUBVOL_NAME_MAX   4039
>> @@ -171,6 +180,7 @@ struct btrfs_ioctl_scrub_args {
>>  /* pad to 1k */
>>  __u64 unused[(1024-32-sizeof(struct btrfs_scrub_progress))/8];
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_scrub_args, 1024);
>>  
>>  #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS>> 0
>>  #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID 1
>> @@ -223,6 +233,7 @@ struct btrfs_ioctl_dev_info_args {
>>  __u64 unused[379];  /* pad to 4k */
>>  __u8 path[BTRFS_DEVICE_PATH_NAME_MAX];  /* out */
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_dev_info_args, 4096);
>>  
>>  struct btrfs_ioctl_fs_info_args {
>>  __u64 max_id;   /* out */
>> @@ -234,6 +245,7 @@ struct btrfs_ioctl_fs_info_args {
>>  __u32 reserved32;
>>  __u64 reserved[122];/* pad to 1k */
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_fs_info_args, 1024);
>>  
>>  /*
>>   * feature flags
>> @@ -414,6 +426,7 @@ struct btrfs_ioctl_balance_args {
>>  
>>  __u64 unused[72];   /* pad to 1k */
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_balance_args, 1024);
>>  
>>  #define BTRFS_INO_LOOKUP_PATH_MAX 4080
>>  struct btrfs_ioctl_ino_lookup_args {
>> @@ -421,6 +434,7 @@ struct btrfs_ioctl_ino_lookup_args {
>>  __u64 objectid;
>>  char name[BTRFS_INO_LOOKUP_PATH_MAX];
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
>>  
>>  #define BTRFS_INO_LOOKUP_USER_PATH_MAX (4080 - BTRFS_VOL_NAME_MAX - 1)
>>  struct btrfs_ioctl_ino_lookup_user_args {
>> @@ -436,6 +450,7 @@ struct btrfs_ioctl_ino_lookup_user_args {
>>   */
>>  char path[BTRFS_INO_LOOKUP_USER_PATH_MAX];
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
>>  
>>  /* Search criteria for the btrfs SEARCH ioctl family. */
>>  struct btrfs_ioctl_search_key {
>> @@ -664,8 +679,9 @@ struct btrfs_ioctl_get_dev_stats {
>>  /* out values: */
>>  __u64 values[BTRFS_DEV_STAT_VALUES_MAX];
>>  
>> -__u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
>> +__u64 unused[128 - 3 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
>>  };
>> +BTRFS_ASSERT_SIZE(btrfs_ioctl_get_dev_stats, 1024);
>>  
>>  #define BTRFS_QUOTA_CTL_ENABLE  1
>>  #define BTRFS_QUOTA_CTL_DISABLE 2
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Nikolay Borisov



On 11.07.2018 23:59, Pete wrote:
> On 07/10/2018 09:38 AM, Martin Raiber wrote:
> 
>> This is probably a known issue. See
>> https://www.spinics.net/lists/linux-btrfs/msg75647.html
>> You could apply the patch in this thread and mount with enospc_debug to
>> confirm it is the same issue.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> OK, I've applied the patch, by hand, and hopefully put it in the right
> place.  Need to learn to patch better.
> 
> Booted the (rebuilt with make ; make modules_install syslinux etc)
> kernel with the option enospc_debug for the two btrfs file systems (1st
> entry for each in fstb.  I was not expecting to get the issue to appear
> quickly as it took several days to hit previously.  However, on checking
> I see another error, not sure if it is related, still is in extent-tree.c.
> 
> https://drive.google.com/file/d/1K12MfpWFB1aHSXBga1Rym5terbmHeDfI/view?usp=sharing

This is just a warn_on saying that btrfs didn't manage to reserve space
for a CoW operation on the checksum tree ( so metadata). But did your
machine crash at that time, i.e why don't you paste the text but post
images?

> 
> 
> Pete
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] btrfs: make fs_devices to be a local variable

2018-07-12 Thread Nikolay Borisov



On 12.07.2018 09:23, Gu Jinxiang wrote:
> fs_devices is always passed to btrfs_scan_one_device which
> overrides it. And in the call stack below fs_devices is passed to
> btrfs_scan_one_device from btrfs_mount_root.
> And in btrfs_mount_root the output fs_devices of this call stack
> is not used.
> btrfs_mount_root
>   -> btrfs_parse_early_options
>   ->btrfs_scan_one_device
> So, there is no necessary to pass fs_devices from btrfs_mount_root,
> use a local variable in btrfs_parse_early_options is enough.
> 
> Signed-off-by: Gu Jinxiang 
> Reviewed-by: Anand Jain 

Reviewed-by: Nikolay Borisov 

> ---
> 
> Changelog:
> v4: changed a line warp, and adjusted the order of two rows.
> v3: rebase to misc-next.
> v2: deal with Nikolay's comment, make changelog more clair.
> 
>  fs/btrfs/super.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 39d8e39b2fe1..44f58bdb5fa6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -884,10 +884,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, 
> char *options,
>   * only when we need to allocate a new super block.
>   */
>  static int btrfs_parse_early_options(const char *options, fmode_t flags,
> - void *holder, struct btrfs_fs_devices **fs_devices)
> +   void *holder)
>  {
>   substring_t args[MAX_OPT_ARGS];
>   char *device_name, *opts, *orig, *p;
> + struct btrfs_fs_devices *fs_devices = NULL;
>   int error = 0;
>  
>   if (!options)
> @@ -916,7 +917,7 @@ static int btrfs_parse_early_options(const char *options, 
> fmode_t flags,
>   goto out;
>   }
>   error = btrfs_scan_one_device(device_name,
> - flags, holder, fs_devices);
> + flags, holder, _devices);
>   kfree(device_name);
>   if (error)
>   goto out;
> @@ -1524,8 +1525,7 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>   if (!(flags & SB_RDONLY))
>   mode |= FMODE_WRITE;
>  
> - error = btrfs_parse_early_options(data, mode, fs_type,
> -   _devices);
> + error = btrfs_parse_early_options(data, mode, fs_type);
>   if (error) {
>   return ERR_PTR(error);
>   }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-12 Thread Nikolay Borisov



On 12.07.2018 09:23, Gu Jinxiang wrote:
> Instead of pointer to btrfs_fs_devices as an arg in
> btrfs_scan_one_device, better to make it as a return value.
> 
> And since btrfs_fs_devices can be get by btrfs_device,
> better to return btrfs_device than fs_btrfs_devices.
> 
> Signed-off-by: Gu Jinxiang 

Reviewed-by: Nikolay Borisov 

> ---
> 
> Changelog:
> v4: as suggested by Anand, change return value of
> btrfs_scan_one_device from btrfs_fs_devices to btrfs_device.
> v3: as comment by robot, use PTR_ERR_OR_ZERO, and rebase to misc-next.
> v2: as comment by Nikolay, use ERR_CAST instead of cast type manually.
> 
>  fs/btrfs/super.c   | 37 -
>  fs/btrfs/volumes.c | 17 ++---
>  fs/btrfs/volumes.h |  4 ++--
>  3 files changed, 32 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 44f58bdb5fa6..e73d547eacff 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -888,7 +888,7 @@ static int btrfs_parse_early_options(const char *options, 
> fmode_t flags,
>  {
>   substring_t args[MAX_OPT_ARGS];
>   char *device_name, *opts, *orig, *p;
> - struct btrfs_fs_devices *fs_devices = NULL;
> + struct btrfs_device *device = NULL;
>   int error = 0;
>  
>   if (!options)
> @@ -916,11 +916,13 @@ static int btrfs_parse_early_options(const char 
> *options, fmode_t flags,
>   error = -ENOMEM;
>   goto out;
>   }
> - error = btrfs_scan_one_device(device_name,
> - flags, holder, _devices);
> + device = btrfs_scan_one_device(device_name,
> + flags, holder);
>   kfree(device_name);
> - if (error)
> + if (IS_ERR(device)) {
> + error = PTR_ERR(device);
>   goto out;
> + }
>   }
>   }
>  
> @@ -1516,6 +1518,7 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>  {
>   struct block_device *bdev = NULL;
>   struct super_block *s;
> + struct btrfs_device *device = NULL;
>   struct btrfs_fs_devices *fs_devices = NULL;
>   struct btrfs_fs_info *fs_info = NULL;
>   struct security_mnt_opts new_sec_opts;
> @@ -1537,9 +1540,13 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>   return ERR_PTR(error);
>   }
>  
> - error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
> - if (error)
> + device = btrfs_scan_one_device(device_name, mode, fs_type);
> + if (IS_ERR(device)) {
> + error = PTR_ERR(device);
>   goto error_sec_opts;
> + }
> +
> + fs_devices = device->fs_devices;
>  
>   /*
>* Setup a dummy root and fs_info for test/set super.  This is because
> @@ -2220,7 +2227,7 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>   unsigned long arg)
>  {
>   struct btrfs_ioctl_vol_args *vol;
> - struct btrfs_fs_devices *fs_devices;
> + struct btrfs_device *device = NULL;
>   int ret = -ENOTTY;
>  
>   if (!capable(CAP_SYS_ADMIN))
> @@ -2232,15 +2239,19 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>  
>   switch (cmd) {
>   case BTRFS_IOC_SCAN_DEV:
> - ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> - _root_fs_type, _devices);
> + device = btrfs_scan_one_device(vol->name, FMODE_READ,
> + _root_fs_type);
> + ret = PTR_ERR_OR_ZERO(device);
>   break;
>   case BTRFS_IOC_DEVICES_READY:
> - ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> - _root_fs_type, _devices);
> - if (ret)
> + device = btrfs_scan_one_device(vol->name, FMODE_READ,
> + _root_fs_type);
> + if (IS_ERR(device)) {
> + ret = PTR_ERR(device);
>   break;
> - ret = !(fs_devices->num_devices == fs_devices->total_devices);
> + }
> + ret = !(device->fs_devices->num_devices ==
> + device->fs_devices->total_devices);
>   break;
>   case BTRFS_IOC_GET_SUPPORTED_FEATURES:
>   ret = btrfs_ioctl_get_supported_features((void __user*)arg);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 52abada5c789..3bc479e8cc22 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1212,14 +1212,13 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   * and we are not allowed to call set_blocksize during the scan. The 

[PATCH v4 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-12 Thread Gu Jinxiang
Instead of pointer to btrfs_fs_devices as an arg in
btrfs_scan_one_device, better to make it as a return value.

And since btrfs_fs_devices can be get by btrfs_device,
better to return btrfs_device than fs_btrfs_devices.

Signed-off-by: Gu Jinxiang 
---

Changelog:
v4: as suggested by Anand, change return value of
btrfs_scan_one_device from btrfs_fs_devices to btrfs_device.
v3: as comment by robot, use PTR_ERR_OR_ZERO, and rebase to misc-next.
v2: as comment by Nikolay, use ERR_CAST instead of cast type manually.

 fs/btrfs/super.c   | 37 -
 fs/btrfs/volumes.c | 17 ++---
 fs/btrfs/volumes.h |  4 ++--
 3 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 44f58bdb5fa6..e73d547eacff 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -888,7 +888,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
-   struct btrfs_fs_devices *fs_devices = NULL;
+   struct btrfs_device *device = NULL;
int error = 0;
 
if (!options)
@@ -916,11 +916,13 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
error = -ENOMEM;
goto out;
}
-   error = btrfs_scan_one_device(device_name,
-   flags, holder, _devices);
+   device = btrfs_scan_one_device(device_name,
+   flags, holder);
kfree(device_name);
-   if (error)
+   if (IS_ERR(device)) {
+   error = PTR_ERR(device);
goto out;
+   }
}
}
 
@@ -1516,6 +1518,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
 {
struct block_device *bdev = NULL;
struct super_block *s;
+   struct btrfs_device *device = NULL;
struct btrfs_fs_devices *fs_devices = NULL;
struct btrfs_fs_info *fs_info = NULL;
struct security_mnt_opts new_sec_opts;
@@ -1537,9 +1540,13 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
return ERR_PTR(error);
}
 
-   error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
-   if (error)
+   device = btrfs_scan_one_device(device_name, mode, fs_type);
+   if (IS_ERR(device)) {
+   error = PTR_ERR(device);
goto error_sec_opts;
+   }
+
+   fs_devices = device->fs_devices;
 
/*
 * Setup a dummy root and fs_info for test/set super.  This is because
@@ -2220,7 +2227,7 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
unsigned long arg)
 {
struct btrfs_ioctl_vol_args *vol;
-   struct btrfs_fs_devices *fs_devices;
+   struct btrfs_device *device = NULL;
int ret = -ENOTTY;
 
if (!capable(CAP_SYS_ADMIN))
@@ -2232,15 +2239,19 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
 
switch (cmd) {
case BTRFS_IOC_SCAN_DEV:
-   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
-   _root_fs_type, _devices);
+   device = btrfs_scan_one_device(vol->name, FMODE_READ,
+   _root_fs_type);
+   ret = PTR_ERR_OR_ZERO(device);
break;
case BTRFS_IOC_DEVICES_READY:
-   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
-   _root_fs_type, _devices);
-   if (ret)
+   device = btrfs_scan_one_device(vol->name, FMODE_READ,
+   _root_fs_type);
+   if (IS_ERR(device)) {
+   ret = PTR_ERR(device);
break;
-   ret = !(fs_devices->num_devices == fs_devices->total_devices);
+   }
+   ret = !(device->fs_devices->num_devices ==
+   device->fs_devices->total_devices);
break;
case BTRFS_IOC_GET_SUPPORTED_FEATURES:
ret = btrfs_ioctl_get_supported_features((void __user*)arg);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 52abada5c789..3bc479e8cc22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1212,14 +1212,13 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
  * and we are not allowed to call set_blocksize during the scan. The superblock
  * is read via pagecache
  */
-int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
- struct btrfs_fs_devices 

[PATCH v4 1/2] btrfs: make fs_devices to be a local variable

2018-07-12 Thread Gu Jinxiang
fs_devices is always passed to btrfs_scan_one_device which
overrides it. And in the call stack below fs_devices is passed to
btrfs_scan_one_device from btrfs_mount_root.
And in btrfs_mount_root the output fs_devices of this call stack
is not used.
btrfs_mount_root
-> btrfs_parse_early_options
->btrfs_scan_one_device
So, there is no necessary to pass fs_devices from btrfs_mount_root,
use a local variable in btrfs_parse_early_options is enough.

Signed-off-by: Gu Jinxiang 
Reviewed-by: Anand Jain 
---

Changelog:
v4: changed a line warp, and adjusted the order of two rows.
v3: rebase to misc-next.
v2: deal with Nikolay's comment, make changelog more clair.

 fs/btrfs/super.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 39d8e39b2fe1..44f58bdb5fa6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -884,10 +884,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
  * only when we need to allocate a new super block.
  */
 static int btrfs_parse_early_options(const char *options, fmode_t flags,
-   void *holder, struct btrfs_fs_devices **fs_devices)
+ void *holder)
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
+   struct btrfs_fs_devices *fs_devices = NULL;
int error = 0;
 
if (!options)
@@ -916,7 +917,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
goto out;
}
error = btrfs_scan_one_device(device_name,
-   flags, holder, fs_devices);
+   flags, holder, _devices);
kfree(device_name);
if (error)
goto out;
@@ -1524,8 +1525,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
 
-   error = btrfs_parse_early_options(data, mode, fs_type,
- _devices);
+   error = btrfs_parse_early_options(data, mode, fs_type);
if (error) {
return ERR_PTR(error);
}
-- 
2.17.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Introduce compile time structure size check

2018-07-12 Thread Nikolay Borisov



On 12.07.2018 09:19, Qu Wenruo wrote:
> Introduce a new macro based compile time check for ioctl structures.
> 
> The new macro is BTRFS_ASSERT_SIZE(), which is mostly copied from
> VMMDEV_ASSERT_SIZE().
> 
> Such check is only added to structure pended to power of 2.
> And exposed one structure, btrfs_ioctl_get_dev_stats() is not aligned
> well.
> The misalign is introduced by commit b27f7c0c150f ("btrfs: join DEV_STATS
> ioctls to one").

So what's the negative effect if a structure is not aligned to a power of 2?

> 
> Signed-off-by: Qu Wenruo 
> ---
>  include/uapi/linux/btrfs.h | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index 5ca1d21fc4a7..be1213c10080 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -22,6 +22,14 @@
>  #include 
>  #include 
>  
> +/*
> + * Compile time check on structure size, same method as
> + * VMMDEV_ASSERT_SIZE() from linux/vbox_vmmdev_types.h, to trigger a negative
> + * array size at compile time if size doesn't match.
> + */
> +#define BTRFS_ASSERT_SIZE(type, size) \
> + typedef char type ## _asrt_size[1 - 2 * !!(sizeof(struct type) != 
> (size))]
> +
>  #define BTRFS_IOCTL_MAGIC 0x94
>  #define BTRFS_VOL_NAME_MAX 255
>  #define BTRFS_LABEL_SIZE 256
> @@ -32,6 +40,7 @@ struct btrfs_ioctl_vol_args {
>   __s64 fd;
>   char name[BTRFS_PATH_NAME_MAX + 1];
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_vol_args, 4096);
>  
>  #define BTRFS_DEVICE_PATH_NAME_MAX   1024
>  #define BTRFS_SUBVOL_NAME_MAX4039
> @@ -171,6 +180,7 @@ struct btrfs_ioctl_scrub_args {
>   /* pad to 1k */
>   __u64 unused[(1024-32-sizeof(struct btrfs_scrub_progress))/8];
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_scrub_args, 1024);
>  
>  #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS 0
>  #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID  1
> @@ -223,6 +233,7 @@ struct btrfs_ioctl_dev_info_args {
>   __u64 unused[379];  /* pad to 4k */
>   __u8 path[BTRFS_DEVICE_PATH_NAME_MAX];  /* out */
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_dev_info_args, 4096);
>  
>  struct btrfs_ioctl_fs_info_args {
>   __u64 max_id;   /* out */
> @@ -234,6 +245,7 @@ struct btrfs_ioctl_fs_info_args {
>   __u32 reserved32;
>   __u64 reserved[122];/* pad to 1k */
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_fs_info_args, 1024);
>  
>  /*
>   * feature flags
> @@ -414,6 +426,7 @@ struct btrfs_ioctl_balance_args {
>  
>   __u64 unused[72];   /* pad to 1k */
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_balance_args, 1024);
>  
>  #define BTRFS_INO_LOOKUP_PATH_MAX 4080
>  struct btrfs_ioctl_ino_lookup_args {
> @@ -421,6 +434,7 @@ struct btrfs_ioctl_ino_lookup_args {
>   __u64 objectid;
>   char name[BTRFS_INO_LOOKUP_PATH_MAX];
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
>  
>  #define BTRFS_INO_LOOKUP_USER_PATH_MAX (4080 - BTRFS_VOL_NAME_MAX - 1)
>  struct btrfs_ioctl_ino_lookup_user_args {
> @@ -436,6 +450,7 @@ struct btrfs_ioctl_ino_lookup_user_args {
>*/
>   char path[BTRFS_INO_LOOKUP_USER_PATH_MAX];
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
>  
>  /* Search criteria for the btrfs SEARCH ioctl family. */
>  struct btrfs_ioctl_search_key {
> @@ -664,8 +679,9 @@ struct btrfs_ioctl_get_dev_stats {
>   /* out values: */
>   __u64 values[BTRFS_DEV_STAT_VALUES_MAX];
>  
> - __u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
> + __u64 unused[128 - 3 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
>  };
> +BTRFS_ASSERT_SIZE(btrfs_ioctl_get_dev_stats, 1024);
>  
>  #define BTRFS_QUOTA_CTL_ENABLE   1
>  #define BTRFS_QUOTA_CTL_DISABLE  2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Introduce compile time structure size check

2018-07-12 Thread Qu Wenruo
Introduce a new macro based compile time check for ioctl structures.

The new macro is BTRFS_ASSERT_SIZE(), which is mostly copied from
VMMDEV_ASSERT_SIZE().

Such check is only added to structure pended to power of 2.
And exposed one structure, btrfs_ioctl_get_dev_stats() is not aligned
well.
The misalign is introduced by commit b27f7c0c150f ("btrfs: join DEV_STATS
ioctls to one").

Signed-off-by: Qu Wenruo 
---
 include/uapi/linux/btrfs.h | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..be1213c10080 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -22,6 +22,14 @@
 #include 
 #include 
 
+/*
+ * Compile time check on structure size, same method as
+ * VMMDEV_ASSERT_SIZE() from linux/vbox_vmmdev_types.h, to trigger a negative
+ * array size at compile time if size doesn't match.
+ */
+#define BTRFS_ASSERT_SIZE(type, size) \
+   typedef char type ## _asrt_size[1 - 2 * !!(sizeof(struct type) != 
(size))]
+
 #define BTRFS_IOCTL_MAGIC 0x94
 #define BTRFS_VOL_NAME_MAX 255
 #define BTRFS_LABEL_SIZE 256
@@ -32,6 +40,7 @@ struct btrfs_ioctl_vol_args {
__s64 fd;
char name[BTRFS_PATH_NAME_MAX + 1];
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_vol_args, 4096);
 
 #define BTRFS_DEVICE_PATH_NAME_MAX 1024
 #define BTRFS_SUBVOL_NAME_MAX  4039
@@ -171,6 +180,7 @@ struct btrfs_ioctl_scrub_args {
/* pad to 1k */
__u64 unused[(1024-32-sizeof(struct btrfs_scrub_progress))/8];
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_scrub_args, 1024);
 
 #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS   0
 #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_AVOID1
@@ -223,6 +233,7 @@ struct btrfs_ioctl_dev_info_args {
__u64 unused[379];  /* pad to 4k */
__u8 path[BTRFS_DEVICE_PATH_NAME_MAX];  /* out */
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_dev_info_args, 4096);
 
 struct btrfs_ioctl_fs_info_args {
__u64 max_id;   /* out */
@@ -234,6 +245,7 @@ struct btrfs_ioctl_fs_info_args {
__u32 reserved32;
__u64 reserved[122];/* pad to 1k */
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_fs_info_args, 1024);
 
 /*
  * feature flags
@@ -414,6 +426,7 @@ struct btrfs_ioctl_balance_args {
 
__u64 unused[72];   /* pad to 1k */
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_balance_args, 1024);
 
 #define BTRFS_INO_LOOKUP_PATH_MAX 4080
 struct btrfs_ioctl_ino_lookup_args {
@@ -421,6 +434,7 @@ struct btrfs_ioctl_ino_lookup_args {
__u64 objectid;
char name[BTRFS_INO_LOOKUP_PATH_MAX];
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
 
 #define BTRFS_INO_LOOKUP_USER_PATH_MAX (4080 - BTRFS_VOL_NAME_MAX - 1)
 struct btrfs_ioctl_ino_lookup_user_args {
@@ -436,6 +450,7 @@ struct btrfs_ioctl_ino_lookup_user_args {
 */
char path[BTRFS_INO_LOOKUP_USER_PATH_MAX];
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_ino_lookup_args, 4096);
 
 /* Search criteria for the btrfs SEARCH ioctl family. */
 struct btrfs_ioctl_search_key {
@@ -664,8 +679,9 @@ struct btrfs_ioctl_get_dev_stats {
/* out values: */
__u64 values[BTRFS_DEV_STAT_VALUES_MAX];
 
-   __u64 unused[128 - 2 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
+   __u64 unused[128 - 3 - BTRFS_DEV_STAT_VALUES_MAX]; /* pad to 1k */
 };
+BTRFS_ASSERT_SIZE(btrfs_ioctl_get_dev_stats, 1024);
 
 #define BTRFS_QUOTA_CTL_ENABLE 1
 #define BTRFS_QUOTA_CTL_DISABLE2
-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Transaction aborted (error -28) btrfs_run_delayed_refs*0x163/0x190

2018-07-12 Thread Nikolay Borisov



On 10.07.2018 10:04, Pete wrote:
> I've just had the error in the subject which caused the file system to
> go read-only.
> 
> Further part of error message:
> WARNING: CPU: 14 PID: 1351 at fs/btrfs/extent-tree.c:3076
> btrfs_run_delayed_refs*0x163/0x190
> 
> 'Screenshot' here:
> https://drive.google.com/file/d/1qw7TE1bec8BKcmffrOmg2LS15IOq8Jwc/view?usp=sharing
> 
> The kernel is 4.17.4.  There are three hard drives in the file system.
> dmcrypt (luks) is used between btrfs and the disks.
> 
> I'm about to run a scrub.  On reboot the disks mounted fine.

Show the output of :

btrfs fi usage /path/to/fs


> 
> Pete
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html