linux 4.1 - memory leak (possibly dedup related)

2015-07-02 Thread Marcel Ritter
Hi,

I've been running some btrfs tests (mainly duperemove related) with
linux kernel 4.1 for the last few days.

Now I noticed by accident (dying processes) that all my memory (128
GB!) is gone.
"Gone" meaning there is no user-space process allocating this memory.

Digging deeper, I found the missing memory using slabtop (the output of
/proc/slabinfo is attached): it looks like a lot of kernel memory is
allocated in kmalloc-1024 (memory leak?).
Given that the test machine does little more than btrfs testing, I
think this may be btrfs related.

I was running duperemove on a 1.5 TB volume around the time the first
"Out of memory" errors were logged, so maybe the memory leak can be
found somewhere in this code path.

I'm still waiting for a scrub run to finish, after that I'll reboot
the machine and try to reproduce this behaviour with a fresh btrfs
filesystem.

Have there been any fixes concerning memory leaks since the 4.1 release
that I could try?
Any other ideas on how to track down this potential memory leak?
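
One thing I could try next, assuming I rebuild the kernel with
CONFIG_DEBUG_KMEMLEAK and have debugfs mounted, is a slabtop run sorted by
cache size plus a kmemleak scan, roughly:

# slabtop -o -s c | head -20
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak

The last command should list suspected leaks together with their allocation
backtraces, which would hopefully point at the code path holding all those
kmalloc-1024 objects.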

Bye,
   Marcel
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
btrfs_delayed_data_ref  2394   2982     96   42  1 : tunables 0 0 0 : slabdata  71  71  0
btrfs_delayed_tree_ref  3726   4600     88   46  1 : tunables 0 0 0 : slabdata 100 100  0
btrfs_delayed_ref_head  1900   2100    160   25  1 : tunables 0 0 0 : slabdata  84  84  0
btrfs_delayed_node      1561   1794    304   26  2 : tunables 0 0 0 : slabdata  69  69  0
btrfs_ordered_extent    1140   1558    424   38  4 : tunables 0 0 0 : slabdata  41  41  0
bio-2                   1600   1625    320   25  2 : tunables 0 0 0 : slabdata  65  65  0
btrfs_extent_buffer      688   1276    280   29  2 : tunables 0 0 0 : slabdata  44  44  0
btrfs_extent_state      4539   4590     80   51  1 : tunables 0 0 0 : slabdata  90  90  0
btrfs_delalloc_work        0      0    152   26  1 : tunables 0 0 0 : slabdata   0   0  0
btrfs_transaction        176    176    360   22  2 : tunables 0 0 0 : slabdata   8   8  0
btrfs_trans_handle       184    184    176   23  1 : tunables 0 0 0 : slabdata   8   8  0
btrfs_inode             1636   6388    984   33  8 : tunables 0 0 0 : slabdata 205 205  0
nfs4_layout_stateid        0      0    240   34  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_delegations          0      0    224   36  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_files                0      0    288   28  2 : tunables 0 0 0 : slabdata   0   0  0
nfsd4_openowners           0      0    440   37  4 : tunables 0 0 0 : slabdata   0   0  0
nfs_direct_cache           0      0    352   23  2 : tunables 0 0 0 : slabdata   0   0  0
nfs_commit_data           23     23    704   23  4 : tunables 0 0 0 : slabdata   1   1  0
nfs_inode_cache            0      0   1000   32  8 : tunables 0 0 0 : slabdata   0   0  0
rpc_inode_cache           50     50    640   25  4 : tunables 0 0 0 : slabdata   2   2  0
fscache_cookie_jar        46     46     88   46  1 : tunables 0 0 0 : slabdata   1   1  0
ext3_inode_cache         160    160    808   20  4 : tunables 0 0 0 : slabdata   8   8  0
journal_handle          1360   1360     24  170  1 : tunables 0 0 0 : slabdata   8   8  0
ext4_groupinfo_4k       3887   6636    144   28  1 : tunables 0 0 0 : slabdata 237 237  0
ip6-frags                  0      0    216   37  2 : tunables 0 0 0 : slabdata   0   0  0
UDPLITEv6                  0      0   1088   30  8 : tunables 0 0 0 : slabdata   0   0  0
UDPv6                    240    240   1088   30  8 : tunables 0 0 0 : slabdata   8   8  0
tw_sock_TCPv6             58     58    280   29  2 : tunables 0 0 0 : slabdata   2   2  0
TCPv6                    112    112   2240   14  8 : tunables 0 0 0 : slabdata   8   8  0
kcopyd_job                 0      0   3312    9  8 : tunables 0 0 0 : slabdata   0   0  0
dm_uevent                  0      0   2632   12  8 : tunables 0 0 0 : slabdata   0   0  0
cfq_queue                  0      0    232   35  2 : tunables 0 0 0 : slabdata   0   0  0
bsg_cmd                    0      0    312   26  2 : tunables 0 0 0 : slabdata   0   0  0
mqueue_inode_cache        36     36    896   36  8 : tunables 0 0 0 : slabdata   1   1  0
fuse_request               0      0    400   20  2 : tunables 0 0 0 : slabdata   0   0  0
fuse_inode                 0      0    704   23  4 : tunables 0 0 0 : slabdata   0   0  0
ecry

Re: btrfs full, but not full, can't rebalance

2015-07-02 Thread Donald Pearson
What does btrfs fi df or btrfs fi usage show now?
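
Something like this, run against the mounted filesystem (the second command
only if your btrfs-progs is new enough to have the usage subcommand):

# btrfs fi df /
# btrfs fi usage /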

On Fri, Jul 3, 2015 at 1:03 AM, Rich Rauenzahn  wrote:
> Yes -- I just figured that out as well!
>
> Now why did it suddenly fill up?   (I still get the failure rebalancing ...)
>
> # btrfs fi show /
> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
> Total devices 4 FS bytes used 17.12GiB
> devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
> devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
> devid3 size 5.00GiB used 5.00GiB path /dev/loop1
> devid4 size 5.00GiB used 5.00GiB path /dev/loop2
>
> Btrfs v3.16.2
>
>
> On Thu, Jul 2, 2015 at 11:01 PM, Donald Pearson
>  wrote:
>> Because this is raid1 I believe you need another for that to work.
>>
>> On Fri, Jul 3, 2015 at 12:57 AM, Rich Rauenzahn  wrote:
>>> Yes, I tried that -- and adding the loopback device.
>>>
>>> # btrfs device add /dev/loop1 /
>>> Performing full device TRIM (5.00GiB) ...
>>>
>>> # btrfs fi show /
>>> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
>>> Total devices 3 FS bytes used 17.13GiB
>>> devid1 size 111.11GiB used 111.10GiB path /dev/sdf3
>>> devid2 size 111.11GiB used 111.10GiB path /dev/sdg3
>>> devid3 size 5.00GiB used 0.00 path /dev/loop1
>>>
>>> Btrfs v3.16.2
>>>
>>> # btrfs balance start -m /
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>>
>>> # btrfs balance start -v -dusage=1 /
>>> Dumping filters: flags 0x1, state 0x0, force is off
>>>   DATA (flags 0x2): balancing, usage=1
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>>
>>>
>>>
>>> On Thu, Jul 2, 2015 at 10:45 PM, Donald Pearson
>>>  wrote:
 Have you seen this article?

 I think the interesting part for you is the "balance cannot run
 because the filesystem is full" heading.

 http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

 On Fri, Jul 3, 2015 at 12:32 AM, Rich Rauenzahn  wrote:
> Running on CentOS7 ... / got full, I removed the files, but it still
> thinks it is full.  I've tried following the FAQ, even adding a
> loopback device during the rebalance.
>
> # btrfs fi show /
> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
> Total devices 2 FS bytes used 24.27GiB
> devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
> devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
>
> # btrfs fi df /
> Data, RAID1: total=107.02GiB, used=22.12GiB
> System, RAID1: total=32.00MiB, used=16.00KiB
> Metadata, RAID1: total=4.05GiB, used=2.15GiB
> GlobalReserve, single: total=512.00MiB, used=0.00
>
> # btrfs balance start -v -dusage=1 /
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
> # btrfs balance start -m /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
> What can I do?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs full, but not full, can't rebalance

2015-07-02 Thread Rich Rauenzahn
Yes -- I just figured that out as well!

Now why did it suddenly fill up?   (I still get the failure rebalancing ...)

# btrfs fi show /
Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
Total devices 4 FS bytes used 17.12GiB
devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
devid3 size 5.00GiB used 5.00GiB path /dev/loop1
devid4 size 5.00GiB used 5.00GiB path /dev/loop2

Btrfs v3.16.2


On Thu, Jul 2, 2015 at 11:01 PM, Donald Pearson
 wrote:
> Because this is raid1 I believe you need another for that to work.
>
> On Fri, Jul 3, 2015 at 12:57 AM, Rich Rauenzahn  wrote:
>> Yes, I tried that -- and adding the loopback device.
>>
>> # btrfs device add /dev/loop1 /
>> Performing full device TRIM (5.00GiB) ...
>>
>> # btrfs fi show /
>> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
>> Total devices 3 FS bytes used 17.13GiB
>> devid1 size 111.11GiB used 111.10GiB path /dev/sdf3
>> devid2 size 111.11GiB used 111.10GiB path /dev/sdg3
>> devid3 size 5.00GiB used 0.00 path /dev/loop1
>>
>> Btrfs v3.16.2
>>
>> # btrfs balance start -m /
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
>>
>> # btrfs balance start -v -dusage=1 /
>> Dumping filters: flags 0x1, state 0x0, force is off
>>   DATA (flags 0x2): balancing, usage=1
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
>>
>>
>>
>> On Thu, Jul 2, 2015 at 10:45 PM, Donald Pearson
>>  wrote:
>>> Have you seen this article?
>>>
>>> I think the interesting part for you is the "balance cannot run
>>> because the filesystem is full" heading.
>>>
>>> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>>>
>>> On Fri, Jul 3, 2015 at 12:32 AM, Rich Rauenzahn  wrote:
 Running on CentOS7 ... / got full, I removed the files, but it still
 thinks it is full.  I've tried following the FAQ, even adding a
 loopback device during the rebalance.

 # btrfs fi show /
 Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
 Total devices 2 FS bytes used 24.27GiB
 devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
 devid2 size 111.11GiB used 111.05GiB path /dev/sdg3

 # btrfs fi df /
 Data, RAID1: total=107.02GiB, used=22.12GiB
 System, RAID1: total=32.00MiB, used=16.00KiB
 Metadata, RAID1: total=4.05GiB, used=2.15GiB
 GlobalReserve, single: total=512.00MiB, used=0.00

 # btrfs balance start -v -dusage=1 /
 Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=1
 ERROR: error during balancing '/' - No space left on device
 There may be more info in syslog - try dmesg | tail

 # btrfs balance start -m /
 ERROR: error during balancing '/' - No space left on device
 There may be more info in syslog - try dmesg | tail

 What can I do?
 --
 To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs full, but not full, can't rebalance

2015-07-02 Thread Donald Pearson
Because this is raid1 I believe you need another for that to work.
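
Roughly what I have in mind is something like this (a sketch only, with
made-up backing-file paths; with raid1 the balance needs two extra devices so
that both copies of a relocated chunk have somewhere to go):

# truncate -s 8G /var/tmp/btrfs-spill-1.img /var/tmp/btrfs-spill-2.img
# losetup /dev/loop1 /var/tmp/btrfs-spill-1.img
# losetup /dev/loop2 /var/tmp/btrfs-spill-2.img
# btrfs device add /dev/loop1 /dev/loop2 /
# btrfs balance start -dusage=5 /
# btrfs device delete /dev/loop1 /dev/loop2 /
# losetup -d /dev/loop1 /dev/loop2

Once the balance has freed some unallocated space on the real disks, the loop
devices can be deleted from the filesystem again, as in the last two commands.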

On Fri, Jul 3, 2015 at 12:57 AM, Rich Rauenzahn  wrote:
> Yes, I tried that -- and adding the loopback device.
>
> # btrfs device add /dev/loop1 /
> Performing full device TRIM (5.00GiB) ...
>
> # btrfs fi show /
> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
> Total devices 3 FS bytes used 17.13GiB
> devid1 size 111.11GiB used 111.10GiB path /dev/sdf3
> devid2 size 111.11GiB used 111.10GiB path /dev/sdg3
> devid3 size 5.00GiB used 0.00 path /dev/loop1
>
> Btrfs v3.16.2
>
> # btrfs balance start -m /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
> # btrfs balance start -v -dusage=1 /
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
>
>
> On Thu, Jul 2, 2015 at 10:45 PM, Donald Pearson
>  wrote:
>> Have you seen this article?
>>
>> I think the interesting part for you is the "balance cannot run
>> because the filesystem is full" heading.
>>
>> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>>
>> On Fri, Jul 3, 2015 at 12:32 AM, Rich Rauenzahn  wrote:
>>> Running on CentOS7 ... / got full, I removed the files, but it still
>>> thinks it is full.  I've tried following the FAQ, even adding a
>>> loopback device during the rebalance.
>>>
>>> # btrfs fi show /
>>> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
>>> Total devices 2 FS bytes used 24.27GiB
>>> devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
>>> devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
>>>
>>> # btrfs fi df /
>>> Data, RAID1: total=107.02GiB, used=22.12GiB
>>> System, RAID1: total=32.00MiB, used=16.00KiB
>>> Metadata, RAID1: total=4.05GiB, used=2.15GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00
>>>
>>> # btrfs balance start -v -dusage=1 /
>>> Dumping filters: flags 0x1, state 0x0, force is off
>>>   DATA (flags 0x2): balancing, usage=1
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>>
>>> # btrfs balance start -m /
>>> ERROR: error during balancing '/' - No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>>
>>> What can I do?
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs full, but not full, can't rebalance

2015-07-02 Thread Rich Rauenzahn
Yes, I tried that -- and adding the loopback device.

# btrfs device add /dev/loop1 /
Performing full device TRIM (5.00GiB) ...

# btrfs fi show /
Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
Total devices 3 FS bytes used 17.13GiB
devid1 size 111.11GiB used 111.10GiB path /dev/sdf3
devid2 size 111.11GiB used 111.10GiB path /dev/sdg3
devid3 size 5.00GiB used 0.00 path /dev/loop1

Btrfs v3.16.2

# btrfs balance start -m /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

# btrfs balance start -v -dusage=1 /
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail



On Thu, Jul 2, 2015 at 10:45 PM, Donald Pearson
 wrote:
> Have you seen this article?
>
> I think the interesting part for you is the "balance cannot run
> because the filesystem is full" heading.
>
> http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
>
> On Fri, Jul 3, 2015 at 12:32 AM, Rich Rauenzahn  wrote:
>> Running on CentOS7 ... / got full, I removed the files, but it still
>> thinks it is full.  I've tried following the FAQ, even adding a
>> loopback device during the rebalance.
>>
>> # btrfs fi show /
>> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
>> Total devices 2 FS bytes used 24.27GiB
>> devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
>> devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
>>
>> # btrfs fi df /
>> Data, RAID1: total=107.02GiB, used=22.12GiB
>> System, RAID1: total=32.00MiB, used=16.00KiB
>> Metadata, RAID1: total=4.05GiB, used=2.15GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00
>>
>> # btrfs balance start -v -dusage=1 /
>> Dumping filters: flags 0x1, state 0x0, force is off
>>   DATA (flags 0x2): balancing, usage=1
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
>>
>> # btrfs balance start -m /
>> ERROR: error during balancing '/' - No space left on device
>> There may be more info in syslog - try dmesg | tail
>>
>> What can I do?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs full, but not full, can't rebalance

2015-07-02 Thread Donald Pearson
Have you seen this article?

I think the interesting part for you is the "balance cannot run
because the filesystem is full" heading.

http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

On Fri, Jul 3, 2015 at 12:32 AM, Rich Rauenzahn  wrote:
> Running on CentOS7 ... / got full, I removed the files, but it still
> thinks it is full.  I've tried following the FAQ, even adding a
> loopback device during the rebalance.
>
> # btrfs fi show /
> Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
> Total devices 2 FS bytes used 24.27GiB
> devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
> devid2 size 111.11GiB used 111.05GiB path /dev/sdg3
>
> # btrfs fi df /
> Data, RAID1: total=107.02GiB, used=22.12GiB
> System, RAID1: total=32.00MiB, used=16.00KiB
> Metadata, RAID1: total=4.05GiB, used=2.15GiB
> GlobalReserve, single: total=512.00MiB, used=0.00
>
> # btrfs balance start -v -dusage=1 /
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=1
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
> # btrfs balance start -m /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
>
> What can I do?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs full, but not full, can't rebalance

2015-07-02 Thread Rich Rauenzahn
Running on CentOS7 ... / got full, I removed the files, but it still
thinks it is full.  I've tried following the FAQ, even adding a
loopback device during the rebalance.

# btrfs fi show /
Label: 'centos7'  uuid: 35f0ce3f-0902-47a3-8ad8-86179d1f3e3a
Total devices 2 FS bytes used 24.27GiB
devid1 size 111.11GiB used 111.05GiB path /dev/sdf3
devid2 size 111.11GiB used 111.05GiB path /dev/sdg3

# btrfs fi df /
Data, RAID1: total=107.02GiB, used=22.12GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=4.05GiB, used=2.15GiB
GlobalReserve, single: total=512.00MiB, used=0.00

# btrfs balance start -v -dusage=1 /
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

# btrfs balance start -m /
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail

What can I do?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: remove empty header file extent-tree.h

2015-07-02 Thread Qu Wenruo
The empty file was introduced by a careless 'git add'; remove it.

Reported-by: David Sterba 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.h | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 fs/btrfs/extent-tree.h

diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
deleted file mode 100644
index e69de29..000
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 11/18] btrfs: qgroup: Add new qgroup calculation function btrfs_qgroup_account_extents().

2015-07-02 Thread Qu Wenruo



David Sterba wrote on 2015/07/02 16:43 +0200:
> On Wed, Apr 29, 2015 at 10:29:04AM +0800, Qu Wenruo wrote:
>> The new btrfs_qgroup_account_extents() function should be called in
>> btrfs_commit_transaction() and it will update all the qgroup according
>> to delayed_ref_root->dirty_extent_root.
>>
>> The new function can handle both normal operation during
>> commit_transaction() or in rescan in a unified method with clearer
>> logic.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> v2:
>>    Also free the ulist even when quota is disabled, to avoid a possible memory leak.
>> ---
>>   fs/btrfs/extent-tree.h |   0
>
> Surprise, this adds an empty file. What was the intention?

My fault, it seems I opened the non-existent file and saved it...
Do I need to submit a patch to remove it?

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] btrfs: Fix data checksum error cause by replace with io-load.

2015-07-02 Thread Qu Wenruo



Chris Mason wrote on 2015/07/02 08:42 -0400:

On Tue, Jun 30, 2015 at 10:26:18AM +0800, Qu Wenruo wrote:

To Chris:

Would you consider merging this patchset for the late 4.2 merge window?
If it's OK to merge it into a late 4.2 rc, we'll start our testing and send a
pull request after our tests, ETA this Friday or next Monday.

I know we should normally submit it early, especially when such a fix is not
small.
But the bug is long-standing and quite annoying (probability is involved),
and Zhao Lei has quite a good idea for cleaning up the scrub code based on
this patchset.

So it would be quite nice if we had any chance to merge it into 4.2.

Would it be OK for you?



I can still take these patches in a later RC, but with this set applied,
I'm getting this during xfstests (btrfs/073 and btrfs/066):

We also found that problem.
In our investigation, it seems to be a mistake made while rebasing the patchset.

We will resend the patchset for review once all the issues are fixed,
and only after that will we send a pull request.

Thanks,
Qu


[11185.853152] [ cut here ]
[11185.862659] WARNING: CPU: 7 PID: 580363 at fs/btrfs/extent-tree.c:9460 
btrfs_create_pending_block_groups+0x161/0x1f0 [btrfs]()
[11185.885804] Modules linked in: dm_flakey btrfs raid6_pq zlib_deflate 
lzo_compress xor xfs exportfs libcrc32c tcp_diag inet_diag nfsv4 fuse loop 
k10temp coretemp hwmon ip6table_filter ip6_tables xt_NFLOG nfnetlink_log 
nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables mptctl 
nfsv3 nfs lockd grace netconsole autofs4 rpcsec_gss_krb5 auth_rpcgss 
oid_registry sunrpc ipv6 ext3 jbd iTCO_wdt iTCO_vendor_support pcspkr rtc_cmos 
ipmi_si ipmi_msghandler i2c_i801 i2c_core lpc_ich mfd_core shpchp ehci_pci 
ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses enclosure button dm_mod 
megaraid_sas
[11185.994338] CPU: 7 PID: 580363 Comm: btrfs Tainted: GW   
4.1.0-rc6-mason+ #82
[11186.011074] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 
05/10/2012
[11186.027107]  24f4 880894b539c8 816c48c5 
a06dbe8d
[11186.042442]  880894b53a18 880894b53a08 8105ba75 
881053ad4000
[11186.057769]  88085417c9a8 88085417c800 8806335a8858 
ffe5
[11186.073128] Call Trace:
[11186.078233]  [] dump_stack+0x4f/0x6a
[11186.088713]  [] warn_slowpath_common+0x95/0xe0
[11186.100926]  [] warn_slowpath_fmt+0x46/0x70
[11186.112627]  [] 
btrfs_create_pending_block_groups+0x161/0x1f0 [btrfs]
[11186.129039]  [] __btrfs_end_transaction+0xac/0x400 [btrfs]
[11186.143325]  [] btrfs_end_transaction+0x10/0x20 [btrfs]
[11186.157092]  [] btrfs_inc_block_group_ro+0x116/0x1d0 
[btrfs]
[11186.171941]  [] scrub_enumerate_chunks+0x2f2/0x5e0 [btrfs]
[11186.186233]  [] ? ttwu_stat+0x4d/0x250
[11186.197045]  [] ? bit_waitqueue+0x80/0xa0
[11186.208392]  [] ? trace_hardirqs_on+0xd/0x10
[11186.220271]  [] btrfs_scrub_dev+0x1c6/0x5d0 [btrfs]
[11186.233350]  [] ? __mnt_want_write_file+0x29/0x30
[11186.246083]  [] btrfs_ioctl_scrub+0xb1/0x130 [btrfs]
[11186.259348]  [] btrfs_ioctl+0xa68/0x11d0 [btrfs]
[11186.271908]  [] do_vfs_ioctl+0x8f/0x580
[11186.282903]  [] ? __fget+0x110/0x200
[11186.293377]  [] ? get_close_on_exec+0x180/0x180
[11186.305764]  [] ? __fget_light+0x2a/0x90
[11186.316928]  [] SyS_ioctl+0xa1/0xb0
[11186.327233]  [] ? __audit_syscall_entry+0xac/0x110
[11186.340133]  [] system_call_fastpath+0x12/0x6f
[11186.352346] ---[ end trace 720cebad3201fcad ]---
[11186.361785] BTRFS: error (device sdi) in 
btrfs_create_pending_block_groups:9460: errno=-27 unknown^M

I've removed them for now.

-chris


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Donald Pearson
Yes it works with raid6 as well.

[root@san01 btrfs-progs]# ./btrfs fi show
Label: 'rockstor_rockstor'  uuid: 08d14b6f-18df-4b1b-a91e-4b33e7c90c29
Total devices 1 FS bytes used 19.25GiB
devid1 size 457.40GiB used 457.40GiB path /dev/sdt3

warning, device 4 is missing
warning, device 2 is missing
warning devid 2 not found already
warning devid 4 not found already
checksum verify failed on 21364736 found 925303CE wanted 09150E74
checksum verify failed on 21364736 found 925303CE wanted 09150E74
bytenr mismatch, want=21364736, have=1065943040
Couldn't read chunk tree
Label: 'backup'  uuid: 68be4632-93ba-4478-9098-2ecb23ee6c94
Total devices 5 FS bytes used 978.72MiB
devid1 size 2.73TiB used 1.62GiB path /dev/sdb
devid3 size 2.73TiB used 1.62GiB path /dev/sdi
devid4 size 2.73TiB used 1.62GiB path /dev/sdj
devid5 size 2.73TiB used 1.62GiB path /dev/sdn
devid6 size 2.73TiB used 1.62GiB path /dev/sdq

Label: 'tank'  uuid: 8a03f8e8-8b84-4d1b-b27d-e23ef8ebe21d
Total devices 10 FS bytes used 5.67TiB
devid1 size 1.36TiB used 792.67GiB path /dev/sdc
devid3 size 1.82TiB used 792.65GiB path /dev/sdf
devid5 size 1.36TiB used 792.65GiB path /dev/sdg
devid6 size 1.36TiB used 792.65GiB path /dev/sdh
devid7 size 1.82TiB used 792.65GiB path /dev/sdk
devid8 size 1.82TiB used 792.65GiB path /dev/sdl
devid9 size 1.82TiB used 792.65GiB path /dev/sdm
devid   10 size 1.82TiB used 792.65GiB path /dev/sdp
*** Some devices missing

btrfs-progs v4.1
[root@san01 btrfs-progs]# mount /dev/sdb /mnt2/backup
[root@san01 btrfs-progs]# ./btrfs fi df /mnt2/backup
Data, RAID6: total=3.00GiB, used=977.56MiB
System, RAID6: total=96.00MiB, used=16.00KiB
Metadata, RAID6: total=1.03GiB, used=1.14MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
[root@san01 btrfs-progs]# ll /mnt2/backup
total 100
-rw-r--r-- 1 root root 102400 Jul  2 13:48 test_file_1gb
[root@san01 btrfs-progs]# umount /mnt2/backup
[root@san01 btrfs-progs]# ./btrfs restore -xmv /dev/sdb ~
Restoring /root/test_file_1gb
Done searching
[root@san01 btrfs-progs]# ll ~
total 104
-rw---. 1 root root   1101 Jun 20 23:18 anaconda-ks.cfg
drwxr-xr-x  1 root root 22 Jul  1 09:48 git
-rw-r--r--  1 root root 102400 Jul  2 13:48 test_file_1gb
[root@san01 btrfs-progs]#

On Thu, Jul 2, 2015 at 1:45 PM, Donald Pearson
 wrote:
> That is correct.  I'm going to rebalance my raid5 pool as raid6 and
> re-test just because.
>
> On Thu, Jul 2, 2015 at 1:37 PM, Chris Murphy  wrote:
>> On Thu, Jul 2, 2015 at 12:32 PM, Donald Pearson
>>  wrote:
>>> I think it is.  I have another raid5 pool that I've created to test
>>> the restore function on, and it worked.
>>
>> So you have all devices for this raid6 available, and yet when you use
>> restore, you get missing device message for all devices except the one
>> specified? But that doesn't happen with the raid5 volume?
>>
>>
>> --
>> Chris Murphy
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Donald Pearson
That is correct.  I'm going to rebalance my raid5 pool as raid6 and
re-test just because.

On Thu, Jul 2, 2015 at 1:37 PM, Chris Murphy  wrote:
> On Thu, Jul 2, 2015 at 12:32 PM, Donald Pearson
>  wrote:
>> I think it is.  I have another raid5 pool that I've created to test
>> the restore function on, and it worked.
>
> So you have all devices for this raid6 available, and yet when you use
> restore, you get missing device message for all devices except the one
> specified? But that doesn't happen with the raid5 volume?
>
>
> --
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Chris Murphy
On Thu, Jul 2, 2015 at 12:32 PM, Donald Pearson
 wrote:
> I think it is.  I have another raid5 pool that I've created to test
> the restore function on, and it worked.

So you have all devices for this raid6 available, and yet when you use
restore, you get missing device message for all devices except the one
specified? But that doesn't happen with the raid5 volume?


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Donald Pearson
I think it is.  I have another raid5 pool that I've created to test
the restore function on, and it worked.

On Thu, Jul 2, 2015 at 1:26 PM, Chris Murphy  wrote:
> On Thu, Jul 2, 2015 at 12:19 PM, Donald Pearson
>  wrote:
>> Unfortunately btrfs image fails with "couldn't read chunk tree".
>>
>> btrfs restore complains that every device is missing except the one
>> that you specify on executing the command.  Multiple devices as a
>> parameter isn't an option.  Specifying /dev/disk/by-uuid/ claims
>> that all devices are missing.
>
> Sounds like restore isn't raid56 aware yet?
>
>
>
> --
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Chris Murphy
On Thu, Jul 2, 2015 at 12:19 PM, Donald Pearson
 wrote:
> Unfortunately btrfs image fails with "couldn't read chunk tree".
>
> btrfs restore complains that every device is missing except the one
> that you specify on executing the command.  Multiple devices as a
> parameter isn't an option.  Specifying /dev/disk/by-uuid/ claims
> that all devices are missing.

Sounds like restore isn't raid56 aware yet?



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Donald Pearson
Unfortunately btrfs image fails with "couldn't read chunk tree".

btrfs restore complains that every device is missing except the one
that you specify on executing the command.  Multiple devices as a
parameter isn't an option.  Specifying /dev/disk/by-uuid/ claims
that all devices are missing.

I went ahead and dropped the drive that dmesg is still complaining
about.  Mounting still fails, so I'm going to try to rescue chunk-tree
again (for science!).
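
Concretely I plan to re-run something like this (device name and mount point
here are just placeholders, the real ones move around between boots):

# mount -o degraded,ro,recovery /dev/sdc /mnt2/tank
# btrfs rescue chunk-recover -v /dev/sdc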

If anybody has any other ideas to try, or data to gather (and methods to
gather it) as a case study for any devs, please let me know.  I'll
assemble all the data that I know how to and follow the link Chris
suggested for filing a bug.

On Thu, Jul 2, 2015 at 12:00 PM, Chris Murphy  wrote:
> On Thu, Jul 2, 2015 at 8:49 AM, Donald Pearson
>  wrote:
>
>> I do see plenty of complaints about the sdg drive (previously sde) in
>> /var/log/messages from the 28th which is when I started noticing
>> issues.  Nothing is jumping out at me claiming the btrfs is taking
>> action but I may not know what to look for.
>
> I'd include that entire log with the bug report. I'd like to skim it
> at least. Even logs from earlier might be useful.
>
>
> --
> Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


possible enhancement: failing device converted to a seed device

2015-07-02 Thread Kyle Gates
I'll preface this with the fact that I'm just a user and am only posing a 
question for a possible enhancement to btrfs.
 
I'm quite sure it isn't currently allowed, but would it be possible to set a
failing device as a seed instead of kicking it out of a multi-device
filesystem? This would make the failing device RO while keeping the filesystem
as a whole RW, thereby giving the user additional protection when
recovering/balancing. Is this a feasible/realistic request?
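
For comparison, the existing seed workflow (which as far as I understand only
works on a healthy, deliberately prepared device) looks roughly like this:

# btrfstune -S 1 /dev/sdb             <- mark the device as a seed; it becomes RO
# mount /dev/sdb /mnt
# btrfs device add /dev/sdc /mnt
# mount -o remount,rw /mnt
# btrfs device delete /dev/sdb /mnt   <- optionally drop the seed once migrated

What I'm asking about is essentially flipping a device into that seed-like RO
state in place, after it has already started to fail.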
 
Thanks,
Kyle
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Chris Murphy
On Thu, Jul 2, 2015 at 8:49 AM, Donald Pearson
 wrote:

> I do see plenty of complaints about the sdg drive (previously sde) in
> /var/log/messages from the 28th which is when I started noticing
> issues.  Nothing is jumping out at me claiming the btrfs is taking
> action but I may not know what to look for.

I'd include that entire log with the bug report. I'd like to skim it
at least. Even logs from earlier might be useful.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Any hope of pool recovery?

2015-07-02 Thread Chris Murphy
On Thu, Jul 2, 2015 at 8:49 AM, Donald Pearson
 wrote:

> Which is curious because this is device id 2, where previously the
> complaint was about device id 1.  So can I believe dmesg about which
> drive is actually the issue or is the drive that's printed in dmesg
> just whichever drive happens to be the last in some loop of code?

devid is static/reliable
/dev/sdX is dynamic/unreliable and related to logic board's firmware

Some systems are more stable in this regard than others; I've worked
with systems that have a different drive order every boot, even when the
hardware configuration is unchanged. When the config does change, it's a
good bet the drive letters will change too.



> Theoretically I should be able to kick another drive out of the pool
> safely, but I'm not sure which one to actually kick out or if that is
> the appropriate next step.

My limited understanding at this point is that once you get "open with
broken chunk error
 Fail to recover the chunk tree." from chunk recover, you've reached
the limits of the current state of recovery tools.

But that it completed suggests it might be possible to get a complete
btrfs image, and get that to a developer who can then use it to
improve the recovery tools.

>
> I do see plenty of complaints about the sdg drive (previously sde) in
> /var/log/messages from the 28th which is when I started noticing
> issues.  Nothing is jumping out at me claiming the btrfs is taking
> action but I may not know what to look for.
>
> journalctl I'm not familiar with.  journalctl -bX returns with "failed
> to parse relative boot ID number 'X'" but perhaps you meant X to be a
> variable of some value? journalctl -b does run, but I'm not sure
> what to look for.

I don't have a raid56 example handy for what this looks like before
this message appears:

[48466.853589] BTRFS: fixed up error at logical 20971520 on dev /dev/sdb

But that's what I get for corrupt metadata where metadata profile is
DUP. The messages for missing metadata that needs reconstruction would
be different but I'd expect to still see the fixed up message. But I'd
also look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/raid56.c?id=refs/tags/v4.1
and read comments and possible raid56 related error messages.

It's similar for data.

[ 1540.865534] BTRFS: bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 1540.866944] BTRFS: unable to fixup (regular) error at logical
12845056 on dev /dev/sdb

Again this is a corruption example, not a read failure example. It
can't be fixed because the data profile is single in this case.



>
> So, what does the audience suggest?  Shall I compile a newer kernel,
> kick out another drive (which?), or take what's behind door #3 (which
> is...?)

If there's data on this volume you need, put all the drives back in
and look at btrfs-rescue to try and extract what you can. And then try
a btrfs-image again, maybe it'll work too if there aren't read errors.
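
Something along these lines, with placeholder paths (restore pulls files out
to a scratch directory without writing to the source, and btrfs-image grabs
just the metadata for a developer to look at):

# btrfs restore -v /dev/sdc /mnt/recovered/
# btrfs-image -c9 -t4 /dev/sdc /tmp/tank-metadata.img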

Once you've gotten what you need out of it, you can decide if it's
worth continuing to try to fix it (seems doubtful to me but I am not a
developer). I'd probably just start over. The one change to make going
forward is more frequent scrubs, to hopefully find and fix up any bad
sectors before they cause this problem again.

Maybe someone with more knowledge will say if any of the btrfs kernel
debug features are worth enabling? I suspect those debug features are
only useful to gather more information as the file system is being
used and encounters the first problem, the URE, and any subsequent
events that caused confusion and then the self-corruption of the fs
beyond repair. If so, that implies a whole new fs, and then trying to
reproduce the conditions that caused the problem. Which brings me
to...

hdparm has a dangerous --make-bad-sector option for testing RAID. I
wonder if qemu has such an option? I'd rather test this in a VM than
use a "do not ever use" option in hdparm.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


strange corruptions found during btrfs check

2015-07-02 Thread Christoph Anton Mitterer
Hi.

This is on a btrfs created and used with a 4.0 kernel.

Not much was done on it, apart from receiving snapshots sent from another
btrfs (with -p).
Some of the older snapshots (that were used as parents before) have
been removed in the meantime.

Now a btrfs check gives this:
# btrfs check /dev/mapper/image
Checking filesystem on /dev/mapper/data-b
UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
checking extents
ref mismatch on [468697088 16384] extent item 0, found 1
Backref 468697088 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [468697088 16384]
owner ref check failed [468697088 16384]
ref mismatch on [1002373120 16384] extent item 0, found 1
Backref 1002373120 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1002373120 16384]
ref mismatch on [1013940224 16384] extent item 0, found 1
Backref 1013940224 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1013940224 16384]
ref mismatch on [525281738752 16384] extent item 0, found 1
Backref 525281738752 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525281738752 16384]
owner ref check failed [525281738752 16384]
ref mismatch on [525317095424 16384] extent item 0, found 1
Backref 525317095424 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525317095424 16384]
owner ref check failed [525317095424 16384]
ref mismatch on [525404700672 16384] extent item 0, found 1
Backref 525404700672 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525404700672 16384]
owner ref check failed [525404700672 16384]
ref mismatch on [525438025728 16384] extent item 0, found 1
Backref 525438025728 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525438025728 16384]
owner ref check failed [525438025728 16384]
ref mismatch on [525554302976 16384] extent item 0, found 1
Backref 525554302976 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525554302976 16384]
owner ref check failed [525554302976 16384]
ref mismatch on [525585235968 16384] extent item 0, found 1
Backref 525585235968 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [525585235968 16384]
owner ref check failed [525585235968 16384]
ref mismatch on [830810521600 16384] extent item 0, found 1
Backref 830810521600 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [830810521600 16384]
owner ref check failed [830810521600 16384]
ref mismatch on [830895620096 16384] extent item 0, found 1
Backref 830895620096 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [830895620096 16384]
owner ref check failed [830895620096 16384]
ref mismatch on [1038383448064 16384] extent item 0, found 1
Backref 1038383448064 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1038383448064 16384]
owner ref check failed [1038383448064 16384]
ref mismatch on [1391733161984 16384] extent item 0, found 1
Backref 1391733161984 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1391733161984 16384]
ref mismatch on [1392008445952 16384] extent item 0, found 1
Backref 1392008445952 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392008445952 16384]
ref mismatch on [1392058843136 16384] extent item 0, found 1
Backref 1392058843136 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392058843136 16384]
ref mismatch on [1392058925056 16384] extent item 0, found 1
Backref 1392058925056 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1392058925056 16384]
ref mismatch on [1466625753088 16384] extent item 0, found 1
Backref 1466625753088 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [1466625753088 16384]
owner ref check failed [1466625753088 16384]
ref mismatch on [2857092792320 16384] extent item 0, found 1
Backref 2857092792320 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857092792320 16384]
owner ref check failed [2857092792320 16384]
ref mismatch on [2857095610368 16384] extent item 0, found 1
Backref 2857095610368 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857095610368 16384]
owner ref check failed [2857095610368 16384]
ref mismatch on [2857125183488 16384] extent item 0, found 1
Backref 2857125183488 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857125183488 16384]
owner ref check failed [2857125183488 16384]
ref mismatch on [2857127591936 16384] extent item 0, found 1
Backref 2857127591936 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857127591936 16384]
owner ref check failed [2857127591936 16384]
ref mismatch on [2857130393600 16384] extent item 0, found 1
Backref 2857130393600 parent 4159 root 4159 not found in extent tree
backpointer mismatch on [2857130393600 16384]
owner ref check failed [2857130393600 16384]
ref mismatch on [2857138421760 16384] extent item 0, found 

Re: Any hope of pool recovery?

2015-07-02 Thread Donald Pearson
Hello,

At the bottom of this email are the results of the latest
chunk-recover.  I only included one example of the output that was
printed prior to the summary information but it went up to the end of
my screen buffer and beyond.

So it looks like the command executed properly when none of the drives
give up on a read.  That said my issue with mounting still exists
unfortunately.  The errors in dmesg now complain about /dev/sdd.

[56496.014539] BTRFS (device sdd): bad tree block start 0 21364736

Which is curious because this is device id 2, where previously the
complaint was about device id 1.  So can I believe dmesg about which
drive is actually the issue or is the drive that's printed in dmesg
just whichever drive happens to be the last in some loop of code?
Theoretically I should be able to kick another drive out of the pool
safely, but I'm not sure which one to actually kick out or if that is
the appropriate next step.

I do see plenty of complaints about the sdg drive (previously sde) in
/var/log/messages from the 28th which is when I started noticing
issues.  Nothing is jumping out at me claiming the btrfs is taking
action but I may not know what to look for.

journalctl I'm not familiar with.  journalctl -bX returns with "failed
to parse relative boot ID number 'X'" but perhaps you meant X to be a
variable of some value? journalctl -b does run, but I'm not sure
what to look for.

So, what does the audience suggest?  Shall I compile a newer kernel,
kick out another drive (which?), or take what's behind door #3 (which
is...?)

Thanks again everybody,
Donald

  Chunk: start = 6643489177600, len = 1073741824, type = 104, num_stripes = 10
  Stripes list:
  [ 0] Stripe: devid = 8, offset = 817549672448
  [ 1] Stripe: devid = 7, offset = 817549672448
  [ 2] Stripe: devid = 10, offset = 817549672448
  [ 3] Stripe: devid = 9, offset = 817549672448
  [ 4] Stripe: devid = 3, offset = 817549672448
  [ 5] Stripe: devid = 0, offset = 0
  [ 6] Stripe: devid = 0, offset = 0
  [ 7] Stripe: devid = 0, offset = 0
  [ 8] Stripe: devid = 0, offset = 0
  [ 9] Stripe: devid = 0, offset = 0
  Block Group: start = 6643489177600, len = 1073741824, flag = 104
  Device extent list:
  [ 0]Device extent: devid = 3, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 1]Device extent: devid = 9, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 2]Device extent: devid = 10, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 3]Device extent: devid = 7, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 4]Device extent: devid = 8, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 5]Device extent: devid = 4, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 6]Device extent: devid = 2, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 7]Device extent: devid = 1, start = 817569595392, len =
134217728, chunk offset = 6643489177600
  [ 8]Device extent: devid = 6, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  [ 9]Device extent: devid = 5, start = 817549672448, len =
134217728, chunk offset = 6643489177600
  Chunk: start = 6886154829824, len = 8589934592, type = 101, num_stripes = 0
  Stripes list:
  Block Group: start = 6886154829824, len = 8589934592, flag = 101
  No device extent.
  Chunk: start = 6894744764416, len = 8589934592, type = 101, num_stripes = 0
  Stripes list:
  Block Group: start = 6894744764416, len = 8589934592, flag = 101
  No device extent.
  Chunk: start = 6903334699008, len = 8589934592, type = 101, num_stripes = 0
  Stripes list:
  Block Group: start = 6903334699008, len = 8589934592, flag = 101
  No device extent.

Total Chunks:   805
  Recoverable:  567
  Unrecoverable:238

Orphan Block Groups:

Orphan Device Extents:
  Device extent: devid = 4, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 2, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 1, start = 819851296768, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 9, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 10, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 8, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 7, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 3, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 6, start = 819831373824, len = 1073741824,
chunk offset = 6661742788608
  Device extent: devid = 5, start = 819831373824, len = 1073741824,
chunk of

Re: [PATCH v2 11/18] btrfs: qgroup: Add new qgroup calculation function btrfs_qgroup_account_extents().

2015-07-02 Thread David Sterba
On Wed, Apr 29, 2015 at 10:29:04AM +0800, Qu Wenruo wrote:
> The new btrfs_qgroup_account_extents() function should be called in
> btrfs_commit_transaction() and it will update all the qgroup according
> to delayed_ref_root->dirty_extent_root.
> 
> The new function can handle both normal operation during
> commit_transaction() or in rescan in a unified method with clearer
> logic.
> 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Also free the ulist even when quota is disabled, to avoid a possible memory leak.
> ---
>  fs/btrfs/extent-tree.h |   0

Surprise, this adds an empty file. What was the intention?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] btrfs: Fix data checksum error cause by replace with io-load.

2015-07-02 Thread Chris Mason
On Tue, Jun 30, 2015 at 10:26:18AM +0800, Qu Wenruo wrote:
> To Chris:
> 
> Would you consider merging these patchset for late 4.2 merge window?
> If it's OK to merge it into 4.2 late rc, we'll start our test and send pull
> request after our test, eta this Friday or next Monday.
> 
> I know normally we should submit it early especially when such fix is not
> small.
> But the bug is long-standing and is quite annoying (possibility involved),
> also Zhao Lei has quite a good idea to cleanup the scrub codes based on the
> patchset.
> 
> So it would be quite nice if we have any chance to merge it into 4.2
> 
> Would it be OK for you?
> 

I can still take these patches in a later RC, but with this set applied,
I'm getting this during xfstests (btrfs/073 and btrfs/066):

[11185.853152] [ cut here ]   
[11185.862659] WARNING: CPU: 7 PID: 580363 at fs/btrfs/extent-tree.c:9460 
btrfs_create_pending_block_groups+0x161/0x1f0 [btrfs]()
[11185.885804] Modules linked in: dm_flakey btrfs raid6_pq zlib_deflate 
lzo_compress xor xfs exportfs libcrc32c tcp_diag inet_diag nfsv4 fuse loop 
k10temp coretemp hwmon ip6table_filter ip6_tables xt_NFLOG nfnetlink_log 
nfnetlink xt_comment xt_statistic iptable_filter ip_tables x_tables mptctl 
nfsv3 nfs lockd grace netconsole autofs4 rpcsec_gss_krb5 auth_rpcgss 
oid_registry sunrpc ipv6 ext3 jbd iTCO_wdt iTCO_vendor_support pcspkr rtc_cmos 
ipmi_si ipmi_msghandler i2c_i801 i2c_core lpc_ich mfd_core shpchp ehci_pci 
ehci_hcd mlx4_en ptp pps_core mlx4_core sg ses enclosure button dm_mod 
megaraid_sas
[11185.994338] CPU: 7 PID: 580363 Comm: btrfs Tainted: GW   
4.1.0-rc6-mason+ #82
[11186.011074] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 
05/10/2012
[11186.027107]  24f4 880894b539c8 816c48c5 
a06dbe8d
[11186.042442]  880894b53a18 880894b53a08 8105ba75 
881053ad4000
[11186.057769]  88085417c9a8 88085417c800 8806335a8858 
ffe5
[11186.073128] Call Trace:
[11186.078233]  [] dump_stack+0x4f/0x6a 
[11186.088713]  [] warn_slowpath_common+0x95/0xe0   
[11186.100926]  [] warn_slowpath_fmt+0x46/0x70  
[11186.112627]  [] 
btrfs_create_pending_block_groups+0x161/0x1f0 [btrfs]
[11186.129039]  [] __btrfs_end_transaction+0xac/0x400 [btrfs]
[11186.143325]  [] btrfs_end_transaction+0x10/0x20 [btrfs]  
[11186.157092]  [] btrfs_inc_block_group_ro+0x116/0x1d0 
[btrfs]
[11186.171941]  [] scrub_enumerate_chunks+0x2f2/0x5e0 [btrfs]
[11186.186233]  [] ? ttwu_stat+0x4d/0x250   
[11186.197045]  [] ? bit_waitqueue+0x80/0xa0
[11186.208392]  [] ? trace_hardirqs_on+0xd/0x10 
[11186.220271]  [] btrfs_scrub_dev+0x1c6/0x5d0 [btrfs]  
[11186.233350]  [] ? __mnt_want_write_file+0x29/0x30
[11186.246083]  [] btrfs_ioctl_scrub+0xb1/0x130 [btrfs] 
[11186.259348]  [] btrfs_ioctl+0xa68/0x11d0 [btrfs] 
[11186.271908]  [] do_vfs_ioctl+0x8f/0x580  
[11186.282903]  [] ? __fget+0x110/0x200 
[11186.293377]  [] ? get_close_on_exec+0x180/0x180  
[11186.305764]  [] ? __fget_light+0x2a/0x90 
[11186.316928]  [] SyS_ioctl+0xa1/0xb0  
[11186.327233]  [] ? __audit_syscall_entry+0xac/0x110   
[11186.340133]  [] system_call_fastpath+0x12/0x6f   
[11186.352346] ---[ end trace 720cebad3201fcad ]---
[11186.361785] BTRFS: error (device sdi) in 
btrfs_create_pending_block_groups:9460: errno=-27 unknown^M

I've removed them for now.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html