BUG during send, cannot delete subvolume

2018-04-12 Thread Matt McKinnon

Hi All,

I had a ctree.c error during a send/receive backup:

kernel BUG at fs/btrfs/ctree.c:1862

Nothing seemed to go wrong otherwise on the file system.  After 
restarting the send, it completed, but I'm left with a subvolume I can't 
delete:


BTRFS warning (device sdb1): Attempt to delete subvolume 176188 during send

I don't see any zombie btrfs send processes lying around.  Is there 
any way to delete this subvolume?  Do I just need a reboot?


-Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


On 01/12/17 16:47, Duncan wrote:

Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as
excerpted:


On 12/01/2017 05:31 PM, Matt McKinnon wrote:

Sorry, I missed your in-line reply:



2) How big is this filesystem? What does your `btrfs fi df
/mountpoint` say?



# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB


Multi-TiB filesystem, check. total/used ratio looks healthy.


Not so healthy, from here.  Data/metadata are healthy, yes,
but...

Any usage at all of global reserve is a red flag indicating that
something in the filesystem thinks, or thought when it resorted
to global reserve, that space is running out.

Global reserve usage doesn't really hint what the problem is,
but it's definitely a red flag that there /is/ a problem, and
it's easily overlooked, as it apparently was here.

It's likely an indication of a bug, possibly one of the ones fixed
right around 4.12/4.13.  I'll let the devs and better experts take
it from there, but I'd certainly be worried until global reserve
drops to zero usage.
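
The check described above (any non-zero GlobalReserve usage is worth a look) can be scripted. This is only a minimal Python sketch, assuming the `btrfs fi df` line format shown in this thread; the helper names `to_bytes` and `global_reserve_used` are made up for illustration and are not part of btrfs-progs.

```python
import re

# Matches one line of `btrfs fi df` output as quoted in this thread, e.g.
# "GlobalReserve, single: total=512.00MiB, used=53.69MiB"
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): total=(?P<total>\S+), used=(?P<used>\S+)$"
)

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(text):
    """Convert a size such as '53.69MiB' or '0.00B' to a byte count."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]iB|B)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def global_reserve_used(fi_df_output):
    """Return GlobalReserve 'used' in bytes, or None if no such line exists."""
    for line in fi_df_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("kind") == "GlobalReserve":
            return to_bytes(match.group("used"))
    return None
```

Fed the earlier output (used=53.69MiB) this returns a non-zero byte count, and fed the later output (used=0.00B) it returns 0.0, so a cron job could warn whenever the value is above zero.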




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon
Right.  The file system is 48T, with 17T available, so we're not quite 
pushing it yet.


So far so good on the space_cache=v2 mount.  I'm surprised this isn't on 
the Gotchas page in the wiki; it may end up making a world of difference 
to the users here.


Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:

On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:

On 12/01/17 18:34, Matt McKinnon wrote:

Thanks, I'll give space_cache=v2 a shot.


Yes, very much recommended.


My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/


Turn autodefrag off and use noatime instead of relatime.

Your filesystem also seems very full,


We don't know. btrfs fi df only displays allocated space. And allocated
space being full is good; it means there are not too many free-space
fragments everywhere.


that's bad with every filesystem but
*especially* with btrfs because the allocator has to work really hard to find
free space for COWing. Really consider deleting stuff or adding more space.
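
The three suggestions in this exchange (space_cache=v2, no autodefrag, noatime instead of relatime) amount to a small rewrite of the option string. A minimal Python sketch, assuming the options are handled as a plain comma-separated string; `apply_suggestions` is a hypothetical helper, and actually applying the result (fstab, unmount and mount) is outside its scope.

```python
def apply_suggestions(options):
    """Rewrite a comma-separated btrfs mount option string.

    Implements only the three changes suggested in this thread:
    drop autodefrag, relatime -> noatime, space_cache -> space_cache=v2.
    """
    result = []
    for opt in options.split(","):
        if opt == "autodefrag":
            continue  # suggestion: turn autodefrag off
        if opt == "relatime":
            opt = "noatime"
        elif opt == "space_cache":
            opt = "space_cache=v2"
        result.append(opt)
    return ",".join(result)


current = "rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/"
print(apply_suggestions(current))
# prints "rw,noatime,space_cache=v2,subvolid=5,subvol=/"
```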





Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Thanks, I'll give space_cache=v2 a shot.

My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/


Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Sorry, I missed your in-line reply:



1) The one right above, btrfs_write_out_cache, is the write-out of the
free space cache v1. Do you see this for multiple seconds going on, and
does it match the time when it's writing X MB/s to disk?



It seems to only last until the next watch update.

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]
[] transaction_kthread+0x18a/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

The last three lines will stick around for a while.  Is switching to 
space cache v2 something that everyone should be doing?  Something that 
would be a good test at least?




2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?



# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB



3) What kind of workload are you running? E.g. how can you describe it
within a range from "big files which just sit there" to "small writes
and deletes all over the place all the time"?



It's a pretty light workload most of the time.  It's a file system that 
exports two NFS shares to a small lab group.  I believe it is more small 
reads all over a large file (MRI imaging) rather than small writes.



4) What kernel version is this? `uname -a` output?



# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 
x86_64 x86_64 x86_64 GNU/Linux




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

These seem to come up most often:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30
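
"Most often" can be made a little more precise by sampling the kernel thread's stack repeatedly and counting the topmost frame of each sample. A minimal Python sketch, assuming samples come from /proc/<pid>/stack (which normally needs root) and look like the traces pasted in this thread; `read_stack`, `top_frame` and `tally` are illustrative names only.

```python
import re
from collections import Counter

# A frame looks like "[<address>] symbol+0xoff/0xlen [module]"; in this
# archive the address is stripped, leaving "[] symbol+0xoff/0xlen".
FRAME_RE = re.compile(r"\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def read_stack(pid):
    """Read one stack sample for a task (kernel threads included)."""
    with open(f"/proc/{pid}/stack") as handle:
        return handle.read()


def top_frame(sample):
    """Return the symbol of the first frame in a sample, or None."""
    for line in sample.splitlines():
        match = FRAME_RE.search(line)
        if match:
            return match.group("symbol")
    return None


def tally(samples):
    """Count how often each symbol is the topmost frame."""
    frames = (top_frame(sample) for sample in samples)
    return Counter(frame for frame in frames if frame is not None)
```

Collecting, say, a few hundred samples one second apart and printing `tally(samples).most_common(5)` would show whether the thread is mostly idle in transaction_kthread or mostly waiting in the write-out paths.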




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Thanks for this.  Here's what I get:


[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

...

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[] io_schedule+0x16/0x40
[] wait_on_page_bit+0xe8/0x120
[] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[] read_tree_block+0x32/0x50 [btrfs]
[] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[] btrfs_next_leaf+0x10/0x20 [btrfs]
[] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[] run_delalloc_range+0x68/0x340 [btrfs]
[] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[] __extent_writepage+0xc7/0x290 [btrfs]
[] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
[] extent_writepages+0x4d/0x70 [btrfs]
[] btrfs_writepages+0x28/0x30 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[] btrfs_write_out_cache+0x86/0x100 [btrfs]
[] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[] btrfs_commit_transaction+0x434/0x900 [btrfs]

...

[] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]



btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging 
on?  I have a few file systems that seem to get wedged for days on end 
with this process pegged around 100%.  I've stopped all snapshots, made 
sure no quotas were enabled, turned on autodefrag in the mount options, 
tried manual defragging, kernel upgrades, yet still this brings my 
system to a crawl.


Network I/O to the system seems very tiny.  The only I/O I see to the 
disk is btrfs-transacti writing a couple of MB/s.


# time touch foo

real    2m54.303s
user    0m0.000s
sys     0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt





kernel BUG at fs/btrfs/ctree.c:3182

2017-10-16 Thread Matt McKinnon

Hi All,

Been having issues on one machine and I was wondering if I could get 
some help tracking the issue down.


# uname -a
Linux riperton 4.13.5-custom #1 SMP Sat Oct 7 18:28:16 EDT 2017 x86_64 
x86_64 x86_64 GNU/Linux


# btrfs --version
btrfs-progs v4.13.3

# btrfs fi show
Label: none  uuid: 8133a362-8e41-4da4-b607-a27832861157
        Total devices 1 FS bytes used 41.64TiB
        devid    1 size 50.93TiB used 41.88TiB path /dev/sda1

# btrfs fi df /export/
Data, single: total=41.70TiB, used=41.57TiB
System, DUP: total=64.00MiB, used=4.56MiB
Metadata, DUP: total=90.00GiB, used=72.30GiB
Metadata, single: total=1.53GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
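
The two outputs above can be cross-checked by hand. `btrfs fi df` reports space allocated to chunks per type, while the devid line of `btrfs fi show` reports the raw space allocated on the device, with DUP profiles counted twice. A small Python sketch of that arithmetic, assuming GlobalReserve is carved out of metadata and so is not added separately; the numbers are copied from the output above and the names are illustrative.

```python
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Number of on-disk copies per profile (only the profiles seen above).
COPIES = {"single": 1, "DUP": 2}

# (type, profile, (total, unit)) from the `btrfs fi df /export/` output.
CHUNKS = [
    ("Data", "single", (41.70, "TiB")),
    ("System", "DUP", (64.00, "MiB")),
    ("Metadata", "DUP", (90.00, "GiB")),
    ("Metadata", "single", (1.53, "GiB")),
]

DEVICE_SIZE_TIB = 50.93  # "size" on the devid line of `btrfs fi show`


def raw_allocated_tib(chunks):
    """Sum chunk totals in TiB, counting each DUP chunk twice."""
    total = 0.0
    for _kind, profile, (value, unit) in chunks:
        total += value * UNITS[unit] * COPIES[profile]
    return total / UNITS["TiB"]


allocated = raw_allocated_tib(CHUNKS)
unallocated = DEVICE_SIZE_TIB - allocated
print(f"allocated {allocated:.2f} TiB, unallocated {unallocated:.2f} TiB")
```

The allocated figure comes out at about 41.88 TiB, matching the "used 41.88TiB" on the devid line, which leaves roughly 9 TiB of the device not yet allocated to any chunk.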


[617994.948036] [ cut here ]
[617994.948040] kernel BUG at fs/btrfs/ctree.c:3182!
[617994.952786] invalid opcode:  [#1] SMP
[617994.956896] Modules linked in: ipmi_devintf xt_tcpudp 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs pcbc 
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd dm_multipath 
joydev lpc_ich mei_me mei nfsd ioatdma auth_rpcgss nfs_acl ipmi_si wmi 
nfs ipmi_msghandler lockd grace sunrpc fscache shpchp mac_hid lp 
parport ses enclosure scsi_transport_sas raid10 raid456 
async_raid6_recov hid_generic async_memcpy async_pq usbhid async_xor 
hid async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 ahci dca 
raid0 libahci ptp megaraid_sas multipath pps_core linear dm_mirror 
dm_region_hash dm_log
[617995.025316] CPU: 1 PID: 3191 Comm: nfsd Tainted: GW 
4.13.5-custom #1
[617995.032965] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014

[617995.042092] task: 996bac7d5a00 task.stack: bb7984b74000
[617995.048134] RIP: 0010:btrfs_set_item_key_safe+0x14e/0x160 [btrfs]
[617995.054310] RSP: 0018:bb7984b77658 EFLAGS: 00010246
[617995.059622] RAX:  RBX: 0037 RCX: 
00018000
[617995.066834] RDX:  RSI: bb7984b7776e RDI: 
bb7984b77677
[617995.074051] RBP: bb7984b776b0 R08: bb7984b77677 R09: 

[617995.081263] R10:  R11: 0003 R12: 
bb7984b77666
[617995.088483] R13: 99679cc00460 R14: bb7984b7776e R15: 
9966184867a8
[617995.095705] FS:  () GS:9967afc8() 
knlGS:

[617995.103876] CS:  0010 DS:  ES:  CR0: 80050033
[617995.109707] CR2: 7fdbaad6 CR3: 00071fe09000 CR4: 
001406e0

[617995.116921] Call Trace:
[617995.119493]  __btrfs_drop_extents+0x50c/0xdd0 [btrfs]
[617995.124663]  ? btrfs_encode_fh+0xd0/0xd0 [btrfs]
[617995.129390]  btrfs_log_changed_extents+0x31b/0x640 [btrfs]
[617995.134990]  ? free_extent_buffer+0x4b/0x90 [btrfs]
[617995.139976]  btrfs_log_inode+0x8de/0xb90 [btrfs]
[617995.144686]  ? dput+0xf1/0x1d0
[617995.147847]  btrfs_log_inode_parent+0x21a/0x960 [btrfs]
[617995.153164]  ? kmem_cache_alloc+0x194/0x1a0
[617995.157459]  ? start_transaction+0x120/0x440 [btrfs]
[617995.162528]  btrfs_log_dentry_safe+0x69/0x90 [btrfs]
[617995.167599]  btrfs_sync_file+0x2ab/0x3e0 [btrfs]
[617995.172309]  vfs_fsync_range+0x3d/0xb0
[617995.176168]  btrfs_file_write_iter+0x45b/0x560 [btrfs]
[617995.181396]  do_iter_readv_writev+0xe2/0x130
[617995.185753]  do_iter_write+0x7f/0x190
[617995.189506]  vfs_iter_write+0x19/0x30
[617995.193271]  nfsd_vfs_write+0xb1/0x310 [nfsd]
[617995.197719]  nfsd_write+0x134/0x1e0 [nfsd]
[617995.201908]  nfsd3_proc_write+0x92/0x110 [nfsd]
[617995.206533]  nfsd_dispatch+0xb9/0x250 [nfsd]
[617995.210915]  svc_process_common+0x36e/0x6f0 [sunrpc]
[617995.215979]  svc_process+0xfc/0x1c0 [sunrpc]
[617995.220339]  nfsd+0xe9/0x160 [nfsd]
[617995.223918]  kthread+0x109/0x140
[617995.227238]  ? nfsd_destroy+0x60/0x60 [nfsd]
[617995.231591]  ? kthread_park+0x60/0x60
[617995.235348]  ret_from_fork+0x25/0x30
[617995.239010] Code: 48 8b 45 bf 48 8d 7d c7 4c 89 f6 48 89 45 d0 0f b6 
45 be 88 45 cf 48 8b 45 b6 48 89 45 c7 e8 aa f3 ff ff 85 c0 0f 8f 55 ff 
ff ff <0f> 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[617995.257983] RIP: btrfs_set_item_key_safe+0x14e/0x160 [btrfs] RSP: 
bb7984b77658

[617995.265696] ---[ end trace 41d8bb716a419cdd ]---



And after a reboot we come up with this warning:



[  112.712899] [ cut here ]
[  112.712943] WARNING: CPU: 5 PID: 505 at fs/btrfs/file.c:547 
btrfs_drop_extent_cache+0x3c5/0x3d0 [btrfs]
[  112.712944] Modules linked in: intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm 
nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass xt_conntrack crct10dif_pclmul 
nf_conntrack crc32_pclmul ghash_clmulni_intel pcbc iptable_filter 
ip_tables aesni_intel x_tables aes_x86_64 crypto_simd glue_helper cryptd 
dm_multipath 

Re: Struggling with file system slowness

2017-05-09 Thread Matt McKinnon
Those snapshots were created using Marc Merlin's script (thanks, Marc). 
They don't do anything except sit around on the file system for a week 
or so and then are removed.


I'm now doing quarter-hourly snaps instead of nightly since I have 
nightly backups of the filesystem going off-site.  So far the 
btrfs-transaction and memory spikes have not returned.


-Matt





On 05/09/2017 03:14 PM, Liu Bo wrote:

On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:

Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What
kind of workload and file size/distribution profile?


Only write IO during the load spikes.  No compression, no deduplication.  12
volumes (including snapshots).  Spinning disks.  Medium workload; file sizes
are all over the map since this holds about 30 user home directories.

Interestingly enough, the problems which had persisted for many weeks went
away when all snapshots were removed.  btrfs-transaction spikes disappeared.
Memory usage went from 30G to under 2G.



Did those snapshots serve as backups?

Could you please elaborate how you create snapshots?  We could
probably hammer out a testcase to improve the situation.

Thanks,

-liubo


Struggling with file system slowness

2017-05-04 Thread Matt McKinnon

Hi All,

Trying to pin down why I have one server that has btrfs-transacti pegged 
at 100% CPU for most of the time.


I thought this might have to do with fragmentation as mentioned in the 
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as 
mentioned in the wiki), but after running a full defrag of the file 
system, and also enabling the 'autodefrag' mount option, the problem 
still persists.


What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2


-Matt


Hard crash on 4.9.5, part 2

2017-01-30 Thread Matt McKinnon
I have an error on this file system that I last saw in the distant past, 
where the mount fails with a "file exists" error.  Running a btrfs check 
gives the following over and over again:


Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
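
One way to see whether the check keeps reporting the same inodes or keeps finding new ones is to collapse the repeated lines. A minimal Python sketch, assuming the "root N inode N errors X, ..." line format shown above and that the output has been saved to a file; `summarize` is a hypothetical helper, and the error field is kept as text rather than decoded.

```python
import re
from collections import Counter

# Matches lines such as
# "root 257 inode 28419536 errors 1100, file extent discount, some csum missing"
ERR_RE = re.compile(r"^root (\d+) inode (\d+) errors (\w+), (.+)$")


def summarize(check_output):
    """Count how often each (root, inode, errors, detail) line appears."""
    seen = Counter()
    for line in check_output.splitlines():
        match = ERR_RE.match(line.strip())
        if match:
            root, inode, errors, detail = match.groups()
            seen[(int(root), int(inode), errors, detail)] += 1
    return seen
```

For the excerpt above this yields five distinct inodes in root 257, each reported twice, which suggests the same set of problems is being revisited rather than new ones being found.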


Are these reported once per subvolume snapshot, so that the output will eventually end?

Here is the crash after the mount (with recovery/usebackuproot):

[  627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, 
use 'usebackuproot' instead
[  627.233216] BTRFS info (device sda1): trying to use backup root at 
mount time

[  627.233218] BTRFS info (device sda1): disk space caching is enabled
[  627.233220] BTRFS info (device sda1): has skinny extents
[  709.234688] [ cut here ]
[  709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 
btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl 
nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac edac_core 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm 
nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel btrfs 
nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_helper 
ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me 
ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hid ses 
enclosure scsi_transport_sas raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor hid_generic megaraid_sas 
raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit 
raid0 dca ptp multipath pps_core linear dm_mirror dm_region_hash dm_log
[  709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1
[  709.234813] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  709.234816]  bd3784bb7568 8e3c8e7c  

[  709.234820]  bd3784bb75a8 8e07d3d1 02220070 
9e5f0ae4d150
[  709.234823]  0002d000 9e5f0bc91f78 9e5f0bc91da8 
0002c000

[  709.234827] Call Trace:
[  709.234837]  [] dump_stack+0x63/0x87
[  709.234846]  [] __warn+0xd1/0xf0
[  709.234850]  [] warn_slowpath_null+0x1d/0x20
[  709.234874]  [] btrfs_drop_extent_cache+0x3e8/0x400 
[btrfs]
[  709.234895]  [] __btrfs_drop_extents+0x5b2/0xd30 
[btrfs]
[  709.234914]  [] ? 
generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs]
[  709.234931]  [] ? btrfs_set_path_blocking+0x36/0x70 
[btrfs]

[  709.234942]  [] ? kmem_cache_alloc+0x194/0x1a0
[  709.234958]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[  709.234977]  [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[  709.235002]  [] replay_one_extent+0x414/0x7b0 [btrfs]
[  709.235007]  [] ? autoremove_wake_function+0x40/0x40
[  709.235030]  [] replay_one_buffer+0x4cc/0x7c0 [btrfs]
[  709.235053]  [] ? 
mark_extent_buffer_accessed+0x4f/0x70 [btrfs]

[  709.235074]  [] walk_down_log_tree+0x1ba/0x3b0 [btrfs]
[  709.235094]  [] walk_log_tree+0xb4/0x1a0 [btrfs]
[  709.235114]  [] btrfs_recover_log_trees+0x20e/0x460 
[btrfs]

[  709.235133]  [] ? replay_one_extent+0x7b0/0x7b0 [btrfs]
[  709.235154]  [] open_ctree+0x2640/0x27f0 [btrfs]
[  709.235171]  [] btrfs_mount+0xca4/0xec0 [btrfs]
[  709.235176]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235180]  [] ? pcpu_next_unpop+0x3e/0x50
[  709.235184]  [] ? find_next_bit+0x19/0x20
[  709.235190]  [] mount_fs+0x39/0x160
[  709.235193]  [] ? __alloc_percpu+0x15/0x20
[  709.235196]  [] vfs_kern_mount+0x67/0x110
[  709.235213]  [] btrfs_mount+0x18b/0xec0 [btrfs]
[  709.235216]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235220]  [] mount_fs+0x39/0x160
[  709.235223]  [] ? __alloc_percpu+0x15/0x20
[  709.235225]  [] vfs_kern_mount+0x67/0x110
[  709.235228]  [] do_mount+0x1bb/0xc80
[  709.235232]  [] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  709.235235]  [] SyS_mount+0x83/0xd0
[  709.235240]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[  709.235243] ---[ end trace d4e5dcddb432b7d3 ]---
[  709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: 
errno=-17 Object already exists (Failed to recover log tree)
[  709.355570] BTRFS error (device sda1): cleaner transaction attach 
returned -30

[  709.548919] BTRFS error (device sda1): open_ctree failed


-Matt

Re: Hard crash on 4.9.5

2017-01-28 Thread Matt McKinnon
This same file system (which crashed again with the same errors) is also 
giving this output during a metadata or data balance:


Jan 27 19:42:47 my_machine kernel: [  335.018123] BTRFS info (device 
sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [  335.018128] BTRFS info (device 
sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [  335.018491] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4018176
Jan 27 19:42:47 my_machine kernel: [  335.018496] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4022272
Jan 27 19:42:47 my_machine kernel: [  335.018499] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4026368
Jan 27 19:42:47 my_machine kernel: [  335.018502] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4030464
Jan 27 19:42:47 my_machine kernel: [  335.019443] BTRFS info (device 
sda1): no csum found for inode 28472371 start 6156288
Jan 27 19:42:47 my_machine kernel: [  335.019688] BTRFS info (device 
sda1): no csum found for inode 28472371 start 7933952
Jan 27 19:42:47 my_machine kernel: [  335.019693] BTRFS info (device 
sda1): no csum found for inode 28472371 start 7938048
Jan 27 19:42:47 my_machine kernel: [  335.019754] BTRFS info (device 
sda1): no csum found for inode 28472371 start 8077312
Jan 27 19:42:47 my_machine kernel: [  335.025485] BTRFS warning (device 
sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025490] BTRFS warning (device 
sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025526] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025531] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025534] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025540] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026142] BTRFS warning (device 
sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026147] BTRFS warning (device 
sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026934] BTRFS warning (device 
sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.033249] BTRFS warning (device 
sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0
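
Before deciding whether these can be ignored, it may help to check that every "csum failed" warning lines up with a "no csum found" message for the same inode and offset. A minimal Python sketch, assuming the two message formats shown above (with the line wrapping introduced by the mail client); `correlate` and `unexplained` are illustrative names.

```python
import re

NO_CSUM_RE = re.compile(r"no csum found for inode (\d+) start (\d+)")
CSUM_FAILED_RE = re.compile(
    r"csum failed ino (\d+) off (\d+) csum (\d+) expected csum (\d+)"
)


def correlate(log_text):
    """Return two sets of (inode, offset): missing csums and failed csums."""
    # Log lines in the mail are wrapped, so collapse all whitespace first.
    text = " ".join(log_text.split())
    missing = {(int(ino), int(off)) for ino, off in NO_CSUM_RE.findall(text)}
    failed = {
        (int(ino), int(off))
        for ino, off, _found, _expected in CSUM_FAILED_RE.findall(text)
    }
    return missing, failed


def unexplained(log_text):
    """Failed checksums with no matching 'no csum found' message."""
    missing, failed = correlate(log_text)
    return failed - missing
```

For the excerpt above `unexplained` comes back empty: all ten failures are at offsets where no checksum was found, which appears consistent with the "expected csum 0" in each warning. That is an observation about the log, not a statement that the underlying data is intact.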


Can these be ignored?


On 01/25/2017 04:06 PM, Liu Bo wrote:

On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:

Wondering what to do about this error which says 'reboot needed'.  It has
happened three times in the past week:



Well, I don't think btrfs's logic here is wrong, the following stack
shows that a nfs client has sent a second unlink against the same inode
while somehow the inode was not fully deleted by the first unlink.

So it'd be good that you could add some debugging information to get us
further.

Thanks,

-liubo


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1):
err add delayed dir index item(index: 23810) into the deletion tree of the
delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  [#1]
SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl
nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath
joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs
shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci
raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17

Hard crash on 4.9.5

2017-01-23 Thread Matt McKinnon
Wondering what to do about this error which says 'reboot needed'.  It has 
happened three times in the past week:


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device 
sda1): err add delayed dir index item(index: 23810) into the deletion 
tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here 
]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at 
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  
[#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs 
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl 
nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath 
joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs 
shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci 
raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core 
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: 
nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: 
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 
task.stack: b9da8533
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 
0010:[]  [] 
btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 
0018:b9da85333be0  EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX:  
RBX: 95a3b104b690 RCX: 
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 
RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 
R08: 0491 R09: 
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 
R11: 0006 R12: 95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 
R14: 95a82953d800 R15: ffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: 
() GS:95a42fc0() knlGS:
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS:  0010 DS:  ES: 
 CR0: 80050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 
CR3: 0003e1e07000 CR4: 001406f0

Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524]  9b7fe5f2 
95a3b104b560 0004 95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983]  95a3f96b3e80 
39ff95a814eeeb68 6000289c 5d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436]  95a3f7457c40 
95a3bcb74138 95a814eeeb68 00289c39

Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343]  [] 
? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671]  [] 
__btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555]  [] 
btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086]  [] 
btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091]  [] 
vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315]  [] 
? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075]  [] 
nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084]  [] 
nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529]  [] 
nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641]  [] 
svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432]  [] 
svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528]  [] 
nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838]  [] 
? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931]  [] 
kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807]  [] 
? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296]  [] 
ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 
10 49 8b 

kernel crash after upgrading to 4.9

2017-01-04 Thread Matt McKinnon

Hi All,

I seem to have an issue similar to one raised in a thread from December:

Subject: page allocation stall in kernel 4.9 when copying files from one 
btrfs hdd to another


In my case, this happens when rsync'ing large amounts of data over NFS 
to the server with the BTRFS file system.  This was not apparent in the 
previous kernel (4.7).


The poster mentioned some suggestions from Duncan here:

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html

But those are not visible in the thread.  What suggestions were given to 
help alleviate this pain?


-Matt


Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-10 Thread Matt McKinnon
------------[ cut here ]------------
[   79.922000] WARNING: CPU: 6 PID: 2632 at fs/btrfs/file.c:546 
btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[   79.922002] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath nfsd 
auth_rpcgss joydev nfs_acl mei_me nfs lpc_ich mei lockd wmi grace 
ipmi_si sunrpc ipmi_msghandler fscache shpchp ioatdma mac_hid lp parport 
ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor hid_generic igb raid6_pq 
i2c_algo_bit libcrc32c dca usbhid raid1 ahci raid0 ptp megaraid_sas 
multipath hid libahci pps_core linear dm_mirror dm_region_hash dm_log

[   79.922063] CPU: 6 PID: 2632 Comm: mount Not tainted 4.7.0-custom #1
[   79.922065] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[   79.922067]   88046ca1f538 813b816c 

[   79.922071]   88046ca1f578 8107a321 
02226ca1f5e0
[   79.922074]  880841d19460 e000 880841e21290 
880841e210c0

[   79.922077] Call Trace:
[   79.922089]  [] dump_stack+0x63/0x87
[   79.922096]  [] __warn+0xd1/0xf0
[   79.922099]  [] warn_slowpath_null+0x1d/0x20
[   79.922117]  [] btrfs_drop_extent_cache+0x3e8/0x400 
[btrfs]
[   79.922133]  [] __btrfs_drop_extents+0x5b2/0xd30 
[btrfs]
[   79.922147]  [] ? 
generic_bin_search.constprop.36+0x85/0x190 [btrfs]
[   79.922160]  [] ? btrfs_set_path_blocking+0x36/0x70 
[btrfs]

[   79.922173]  [] ? btrfs_search_slot+0x438/0x970 [btrfs]
[   79.922178]  [] ? kmem_cache_alloc+0x1d6/0x1f0
[   79.922190]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[   79.922205]  [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[   79.94]  [] replay_one_extent+0x419/0x750 [btrfs]
[   79.922241]  [] replay_one_buffer+0x4db/0x7d0 [btrfs]
[   79.922258]  [] ? 
mark_extent_buffer_accessed+0x4f/0x70 [btrfs]

[   79.922274]  [] walk_down_log_tree+0x1cc/0x3d0 [btrfs]
[   79.922289]  [] walk_log_tree+0xba/0x1a0 [btrfs]
[   79.922304]  [] btrfs_recover_log_trees+0x213/0x470 
[btrfs]

[   79.922318]  [] ? replay_one_extent+0x750/0x750 [btrfs]
[   79.922335]  [] open_ctree+0x264d/0x2760 [btrfs]
[   79.922348]  [] btrfs_mount+0xc94/0xeb0 [btrfs]
[   79.922353]  [] ? find_next_zero_bit+0x1e/0x20
[   79.922358]  [] ? pcpu_next_unpop+0x3e/0x50
[   79.922362]  [] ? find_next_bit+0x19/0x20
[   79.922368]  [] mount_fs+0x39/0x160
[   79.922371]  [] ? __alloc_percpu+0x15/0x20
[   79.922375]  [] vfs_kern_mount+0x67/0x110
[   79.922387]  [] btrfs_mount+0x18b/0xeb0 [btrfs]
[   79.922390]  [] ? find_next_zero_bit+0x1e/0x20
[   79.922394]  [] mount_fs+0x39/0x160
[   79.922397]  [] ? __alloc_percpu+0x15/0x20
[   79.922399]  [] vfs_kern_mount+0x67/0x110
[   79.922402]  [] do_mount+0x22a/0xd90
[   79.922406]  [] ? __kmalloc_track_caller+0x1af/0x250
[   79.922408]  [] ? strndup_user+0x41/0x80
[   79.922411]  [] ? memdup_user+0x42/0x70
[   79.922413]  [] SyS_mount+0x83/0xd0
[   79.922418]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   79.922436] ---[ end trace 0db3466cdad31dcf ]---




On 08/09/2016 10:25 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:29 PM, Matt McKinnon <m...@techsquare.com> wrote:

Spoke too soon.  Do I need to continue to run with that mount option in
place?


It shouldn't be necessary. Something's still wrong for some reason,
even with DUP metadata being CoW'd so someone else is going to have to
speak up what the problem is. And that btrfs check not only doesn't
come up clean but crashes suggests some confluence of things in kernel
4.3 and your hardware conspired to make the file system inconsistent
in a way that isn't immediately recovering the usual way. That is,
usebackuproots working suggests that there's a bug elsewhere in the
storage stack because normally that shouldn't be necessary -
something's happened out of order.

devid    1 size 50.93TiB used 22.67TiB path /dev/sda1

What is the exact nature of this block device?

If getting this back up and running is urgent I suggest inquiring on
IRC what the next steps are.

In the meantime I'd get a btrfs-image (which is probably going to be
quite large given metadata is 60GiB). If that fails, then try 'btrfs
inspect-internal dump-tree /dev/sda1 > dumptree.log', which may also
fail but might capture something useful before it does. Obviously
btrfs check shouldn't crash so that's a bug already. What do you get
for free -m? It's known that btrfs check needs a lot of memory and
pretty much all the metadata needs to be read in, so... if you have an
SSD available it might make sense to setup a huge pile of swap on that
SSD and rerun btrfs check.





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-10 Thread Matt McKinnon

I performed a quick balance which gave me:

[39020.030638] BTRFS info (device sda1): relocating block group 
25428383236096 flags 1
[39020.206097] BTRFS warning (device sda1): block group 23113395863552 
has wrong amount of free space
[39020.206101] BTRFS warning (device sda1): failed to load free space 
cache for block group 23113395863552, rebuilding it now


then a crash dump.

Remounted with -o clear_cache,nospace_cache and the balance completed. 
Running a larger balance now.


Will umount, and remount with default options to see if that works.

-Matt

On 08/10/2016 03:09 AM, g6094...@freenet.de wrote:

Hi,

From what I see, you have an unfinished balance, since you have both
DUP and single System and Metadata chunks on disk.

so you should (re)run a balance for this data.
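
The mixed-profile observation above can be checked mechanically. What follows is only a minimal sketch, not a btrfs tool: it assumes the plain `btrfs fi df` text format shown in this thread, and the names `profiles_by_type` and `mixed_profiles` are hypothetical helpers invented for illustration. The sample text is copied from the output posted in the original report.

```python
import re
from collections import defaultdict

# Sample copied from the `btrfs fi df /export/` output posted in this thread.
SAMPLE = """\
Data, single: total=22.53TiB, used=14.89TiB
System, DUP: total=40.00MiB, used=2.39MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=70.50GiB, used=60.21GiB
Metadata, single: total=1.51GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

# Assumed line shape: "<type>, <profile>: total=<size>, used=<size>"
LINE_RE = re.compile(r"^(\w+), (\w+): total=(\S+), used=(\S+)$")


def profiles_by_type(text):
    """Map each block group type to the list of profiles it is listed with."""
    found = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            found[match.group(1)].append(match.group(2))
    return dict(found)


def mixed_profiles(text):
    """Return only the types that appear under more than one profile."""
    return {t: p for t, p in profiles_by_type(text).items() if len(p) > 1}


if __name__ == "__main__":
    for bg_type, profiles in mixed_profiles(SAMPLE).items():
        print(f"{bg_type}: {', '.join(profiles)}")
```

On the sample, only System and Metadata are reported, which matches the DUP-plus-single layout described above. A type listed under two profiles is only a hint that a conversion or balance did not finish; it does not say why.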


sash


Am 10.08.2016 um 02:17 schrieb Matt McKinnon:

-o usebackuproot worked well.

After the file system settled, I performed a sync and a clean umount, and a
normal mount now works as well.

Anything I should be doing going forward?

Thanks,
Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com>
wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our
BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.










Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

# btrfs check /dev/sda1
Checking filesystem on /dev/sda1
UUID: 33f9089e-acc7-4a39-8b83-b18bb182faaf
checking extents
ref mismatch on [958277767168 5894144] extent item 0, found 1
Backref 958277767168 root 257 owner 15799573 offset 750342144 num_refs 0 
not found in extent tree
Incorrect local backref count on 958277767168 root 257 owner 15799573 
offset 750342144 found 1 wanted 0 back 0x15d380f90

backpointer mismatch on [958277767168 5894144]
ref mismatch on [958298935296 9666560] extent item 0, found 2
Backref 958298935296 root 257 owner 15799573 offset 559185920 num_refs 0 
not found in extent tree
Incorrect local backref count on 958298935296 root 257 owner 15799573 
offset 559185920 found 2 wanted 0 back 0x15d3809a0

backpointer mismatch on [958298935296 9666560]


about 859 of those ...

Then:

owner ref check failed [25737445867520 16384]
checking free space cache
There is no free space entry for 109105479680-109105496064
There is no free space entry for 109105479680-109551026176
cache appears valid but isn't 109014155264
There is no free space entry for 139709693952-139709710336
There is no free space entry for 139709693952-140152668160
cache appears valid but isn't 139615797248
Wanted offset 171291525120, found 171291426816
Wanted offset 171291525120, found 171291426816
cache appears valid but isn't 171291181056
Wanted offset 220146597888, found 220146532352
Wanted offset 220146597888, found 220146532352
cache appears valid but isn't 220146434048
btrfs: unable to add free space :-17
free-space-cache.c:824: btrfs_add_free_space: Assertion `ret == -EEXIST` 
failed.

btrfs[0x464af9]
btrfs(btrfs_add_free_space+0x154)[0x46531f]
btrfs(load_free_space_cache+0xab7)[0x465e36]
btrfs(cmd_check+0x22c7)[0x42db0e]
btrfs(main+0x155)[0x40a4fd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7faad34cdf45]
btrfs[0x40a0f9]


and we crashed out of the check there.
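
For readers matching the numbers up: the `-17` printed by btrfs check just before the assertion is the same code as the `errno=-17` in the kernel messages of this thread. A small sketch, assuming a Linux system where errno 17 is EEXIST (`describe_errno` is a hypothetical helper, not part of btrfs-progs):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, libc message) for a kernel-style negative errno."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    name, message = describe_errno(-17)
    print(name, "-", message)
```

The libc message text may differ from the kernel's "Object already exists" wording, which is btrfs's own string for the same code.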

-Matt

On 08/09/2016 08:06 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.


This could also be a regression somewhere...
https://bugzilla.kernel.org/show_bug.cgi?id=60522





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon
Spoke too soon.  Do I need to continue to run with that mount option in 
place?



[   83.775984] BTRFS warning (device sda1): block group 25741009879040 
has wrong amount of free space
[   83.775989] BTRFS warning (device sda1): failed to load free space 
cache for block group 25741009879040, rebuilding it now
[   85.231748] BTRFS warning (device sda1): block group 25737721544704 
has wrong amount of free space
[   85.231752] BTRFS warning (device sda1): failed to load free space 
cache for block group 25737721544704, rebuilding it now

[   98.913796] BTRFS info (device sda1): disk space caching is enabled
[   98.913803] BTRFS info (device sda1): has skinny extents
[  179.564408] BTRFS warning (device sda1): block group 78412513280 has 
wrong amount of free space
[  179.564414] BTRFS warning (device sda1): failed to load free space 
cache for block group 78412513280, rebuilding it now

[  667.106718] ------------[ cut here ]------------
[  667.106772] WARNING: CPU: 0 PID: 2726 at fs/btrfs/extent-tree.c:2963 
btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]

[  667.106775] BTRFS: Transaction aborted (error -17)
[  667.106777] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm 
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath 
joydev lpc_ich mei_me mei wmi ipmi_si ipmi_msghandler nfsd auth_rpcgss 
nfs_acl nfs lockd grace ioatdma sunrpc shpchp mac_hid fscache lp parport 
ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c 
hid_generic igb raid1 usbhid i2c_algo_bit ahci raid0 dca multipath ptp 
hid megaraid_sas libahci linear pps_core dm_mirror dm_region_hash dm_log
[  667.106859] CPU: 0 PID: 2726 Comm: btrfs-transacti Not tainted 
4.7.0-custom #1
[  667.106861] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  667.106864]   880464e73c08 813b816c 
880464e73c58
[  667.106869]   880464e73c48 8107a321 
0b936c3cc170
[  667.106873]  880443191130 88046c3cc170 88046b43f000 


[  667.106878] Call Trace:
[  667.106889]  [] dump_stack+0x63/0x87
[  667.106896]  [] __warn+0xd1/0xf0
[  667.106901]  [] warn_slowpath_fmt+0x4f/0x60
[  667.106925]  [] btrfs_run_delayed_refs+0x292/0x2d0 
[btrfs]
[  667.106947]  [] 
btrfs_write_dirty_block_groups+0x178/0x3b0 [btrfs]
[  667.106974]  [] commit_cowonly_roots+0x23c/0x2e0 
[btrfs]
[  667.106999]  [] 
btrfs_commit_transaction+0x4fb/0xa80 [btrfs]

[  667.107021]  [] transaction_kthread+0x1d2/0x200 [btrfs]
[  667.107042]  [] ? 
btrfs_cleanup_transaction+0x580/0x580 [btrfs]

[  667.107047]  [] kthread+0xc9/0xe0
[  667.107053]  [] ret_from_fork+0x1f/0x40
[  667.107056]  [] ? kthread_park+0x60/0x60
[  667.107060] ---[ end trace 336c80ba4db66e78 ]---
[  667.107065] BTRFS: error (device sda1) in 
btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  667.116389] BTRFS info (device sda1): forced readonly
[  667.117081] BTRFS warning (device sda1): Skipping commit of aborted 
transaction.
[  667.117086] BTRFS: error (device sda1) in cleanup_transaction:1853: 
errno=-17 Object already exists



On 08/09/2016 08:06 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.


This could also be a regression somewhere...
https://bugzilla.kernel.org/show_bug.cgi?id=60522





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

-o usebackuproot worked well.

After the file system settled, I performed a sync and a clean umount, and a 
normal mount now works as well.


Anything I should be doing going forward?

Thanks,
Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.






BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

Hello,

Our server recently crashed and was rebooted.  When it returned our 
BTRFS volume is mounting read-only:


[  142.395093] BTRFS: error (device sda1) in 
btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  142.404418] BTRFS info (device sda1): forced readonly

I tried upgrading the kernel from 4.3 to 4.7.  Upgraded btrfs-progs to 
v4.7 as well.


# uname -a
Linux hostname 4.7.0-custom #1 SMP Tue Aug 9 11:16:28 EDT 2016 x86_64 
x86_64 x86_64 GNU/Linux


# btrfs --version
btrfs-progs v4.7

# btrfs fi show
Label: none  uuid: 33f9089e-acc7-4a39-8b83-b18bb182faaf
        Total devices 1 FS bytes used 14.95TiB
        devid    1 size 50.93TiB used 22.67TiB path /dev/sda1

# btrfs fi df /export/
Data, single: total=22.53TiB, used=14.89TiB
System, DUP: total=40.00MiB, used=2.39MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=70.50GiB, used=60.21GiB
Metadata, single: total=1.51GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

# dmesg
[  142.394841] ------------[ cut here ]------------
[  142.394874] WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 
btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]

[  142.394876] BTRFS: Transaction aborted (error -17)
[  142.394878] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl 
nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei 
ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses 
enclosure scsi_transport_sas raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c 
i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci 
megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash 
dm_log
[  142.394942] CPU: 6 PID: 269 Comm: kworker/u18:5 Not tainted 
4.7.0-custom #1
[  142.394944] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014

[  142.394966] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  142.394969]   88086a057ca0 813b816c 
88086a057cf0
[  142.394972]   88086a057ce0 8107a321 
0b9325288170
[  142.394975]  8808519eb000 880825288170 88086b2c1000 
0020

[  142.394978] Call Trace:
[  142.394987]  [] dump_stack+0x63/0x87
[  142.394993]  [] __warn+0xd1/0xf0
[  142.394996]  [] warn_slowpath_fmt+0x4f/0x60
[  142.395012]  [] btrfs_run_delayed_refs+0x292/0x2d0 
[btrfs]
[  142.395025]  [] delayed_ref_async_start+0x94/0xb0 
[btrfs]

[  142.395044]  [] normal_work_helper+0xc0/0x2d0 [btrfs]
[  142.395050]  [] ? pwq_activate_delayed_work+0x42/0xb0
[  142.395066]  [] btrfs_extent_refs_helper+0x12/0x20 
[btrfs]

[  142.395070]  [] process_one_work+0x153/0x3f0
[  142.395073]  [] worker_thread+0x12b/0x4b0
[  142.395076]  [] ? rescuer_thread+0x340/0x340
[  142.395079]  [] kthread+0xc9/0xe0
[  142.395085]  [] ret_from_fork+0x1f/0x40
[  142.395088]  [] ? kthread_park+0x60/0x60
[  142.395090] ---[ end trace e2b0b8dc37502011 ]---
[  142.395093] BTRFS: error (device sda1) in 
btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  142.404418] BTRFS info (device sda1): forced readonly


corruption, bad block, input/output errors - do i run --repair?

2014-11-07 Thread Matt McKinnon

Hi All,

I'm running into some corruption and I wanted to seek out advice on 
whether or not to run btrfs check --repair, or if I should fall back to 
my backup file server, or both.


The system is mountable, and usable.

# uname -a
Linux cbmm-fs 3.17.2-custom #1 SMP Thu Oct 30 14:09:57 EDT 2014 x86_64 
x86_64 x86_64 GNU/Linux


# btrfs --version
Btrfs v3.14.2
# btrfs fi show
Label: none  uuid: 30c15060-8fb4-4926-87d4-f7d08c3033c5
        Total devices 1 FS bytes used 58.92TiB
        devid    1 size 76.40TiB used 59.05TiB path /dev/sda1

# btrfs fi df /home
Data, single: total=58.75TiB, used=58.75TiB
System, DUP: total=32.00MiB, used=2.66MiB
System, single: total=4.00MiB, used=3.68MiB
Metadata, DUP: total=119.00GiB, used=116.63GiB
Metadata, single: total=64.01GiB, used=57.68GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


I did run into some RO snapshot corruption which caused me to run btrfs 
check:


parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978
parent transid verify failed on 20809493159936 wanted 4486137218058286914 found 390978
Ignoring transid failure
Checking filesystem on /dev/sda1
UUID: 30c15060-8fb4-4926-87d4-f7d08c3033c5
checking extents
bad block 69290357067776
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots

...

dir isize wrong 1 error
errors 500, file extent discount, nbytes wrong 14 errors
errors 2001, no inode item, link count wrong 257302 errors

...

found 185063071745 bytes used err is 1
total csum bytes: 8428
total tree bytes: 1889284096
total fs tree bytes: 962678784
total extent tree bytes: 159297536
btree space waste bytes: 340014684
file data blocks allocated: 57344
 referenced 57344
Btrfs v3.14.2

Output of a scrub:

ERROR: scrubbing /home failed for device id 1 (Input/output error)
scrub canceled for 30c15060-8fb4-4926-87d4-f7d08c3033c5
scrub started at Mon Nov  3 06:43:58 2014 and was aborted after 7613 seconds

data_extents_scrubbed: 248507555
tree_extents_scrubbed: 10870729
data_bytes_scrubbed: 15375990317056
tree_bytes_scrubbed: 44526505984
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 15712
csum_discards: 988018
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 15425663205376
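
One thing worth noticing in the scrub statistics above is that the scrub aborted with an Input/output error while every `*_errors` counter stayed at zero. A minimal sketch for checking that mechanically, assuming the `key: value` layout shown above (`parse_counters` and `nonzero_errors` are hypothetical helper names, and the sample is copied from this message):

```python
# Counters copied from the scrub output posted above.
SAMPLE = """\
data_extents_scrubbed: 248507555
tree_extents_scrubbed: 10870729
data_bytes_scrubbed: 15375990317056
tree_bytes_scrubbed: 44526505984
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 15712
csum_discards: 988018
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 15425663205376
"""


def parse_counters(text):
    """Parse 'name: integer' lines into a dict, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def nonzero_errors(counters):
    """Return the counters whose name ends in '_errors' and whose value is not 0."""
    return {k: v for k, v in counters.items() if k.endswith("_errors") and v}


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print("non-zero error counters:", nonzero_errors(counters) or "none")
```

An empty result here only means the scrub did not count any errors before it was aborted; it does not mean the file system is clean, as the btrfs check output shows.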

Output of a balance:

ERROR: error during balancing '/home' - Input/output error
There may be more info in syslog - try dmesg | tail

[501087.506642] ------------[ cut here ]------------
[501087.543971] WARNING: CPU: 5 PID: 31885 at fs/btrfs/relocation.c:925 
build_backref_tree+0x11f0/0x1230 [btrfs]()
[501087.543991] Modules linked in: ipmi_devintf(E) autofs4(E) sb_edac(E) 
edac_core(E) joydev(E) mei_me(E) mei(E) lpc_ich(E) ioatdma(E) ipmi_si(E) 
wmi(E) mac_hid(E) bnep(E) rfcomm(E) bluetooth(E) lp(E) parport(E) 
nfsd(E) nfs_acl(E) auth_rpcgss(E) nfs(E) fscache(E) lockd(E) sunrpc(E) 
ses(E) enclosure(E) hid_generic(E) ahci(E) libahci(E) usbhid(E) hid(E) 
igb(E) dca(E) i2c_algo_bit(E) ptp(E) pps_core(E) megaraid_sas(E) 
btrfs(E) raid6_pq(E) xor(E) libcrc32c(E)
[501087.543995] CPU: 5 PID: 31885 Comm: btrfs Tainted: G  D E 
3.17.2-custom #1
[501087.543997] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0a 12/27/2013
[501087.543999]  039d 88000eadb808 8176733c 
0282
[501087.544001]   88000eadb848 8107163c 
1000
[501087.544003]  8801d0d9acf0 880497c70380 0001 
0001

[501087.544004] Call Trace:
[501087.544014]  [8176733c] dump_stack+0x46/0x58
[501087.544022]  [8107163c] warn_slowpath_common+0x8c/0xc0
[501087.544024]  [8107168a] warn_slowpath_null+0x1a/0x20
[501087.544039]  [a00b4020] build_backref_tree+0x11f0/0x1230 
[btrfs]
[501087.544052]  [a00b4331] relocate_tree_blocks+0x2d1/0x690 
[btrfs]

[501087.544060]  [811c1609] ? kmem_cache_alloc_trace+0x39/0x1f0
[501087.544072]  [a00b54a2] relocate_block_group+0x202/0x5f0 
[btrfs]
[501087.544083]  [a00b5a40] 
btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[501087.544098]  [a0088cf5] 
btrfs_relocate_chunk.isra.62+0x75/0x760 [btrfs]
[501087.544111]  [a0084d86] ? release_extent_buffer+0x36/0xe0 
[btrfs]

[501087.544124]  [a0085281] ? free_extent_buffer+0x61/0xc0 [btrfs]
[501087.544136]  [a008d7db] btrfs_balance+0x8ab/0xf50 [btrfs]
[501087.544150]  [a00985ac] btrfs_ioctl_balance+0x1cc/0x530 
[btrfs]
[501087.544156]  [811786eb] ? 
lru_cache_add_active_or_unevictable+0x2b/0xa0

[501087.544168]  [a009aa82] btrfs_ioctl+0x562/0x1f00 [btrfs]
[501087.544173]  [811e9c0b] ? putname+0x2b/0x40
[501087.544176]  [811ef193] ? user_path_at_empty+0x63/0xa0
[501087.544183]  [8105f59c] ?