Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo  wrote:

> > I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> > to disable quota offline.
>
> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
> Add ability to disable quota offline".
> Can also be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>
> Usage is:
> # btrfs rescue disable-quota 
>
> Tested locally, it would just toggle the ON/OFF flag for quota, so the
> modification should be minimal.

Noticed one thing while testing this, but it's not related to the patch,
so I'll keep it here.
I still had the ,ro mounts in fstab, and while the filesystem mounted
read-only quickly, *unmounting* it, even read-only, got hung up:

Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
Aug 11 23:47:27 fileserver kernel: [  484.314787]   Not tainted 4.17.14-dirty #3
Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 23:47:27 fileserver kernel: [  484.315006] umount  D 0  5422   4656 0x0080
Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80
Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Is it trying to write changes to a ro mount, or is it doing a bunch of
work that it's just going to throw away?  I ended up using sysrq-b
after commenting out the entries in fstab.

Everything seems fine with the filesystem now.  I appreciate all the help!


Re: [PATCH] btrfs-progs: rescue: Add ability to disable quota offline

2018-08-11 Thread Dan Merillat
On Sat, Aug 11, 2018 at 9:34 PM Qu Wenruo  wrote:
>
> Provide an offline tool to disable quota.
>
> For kernels where skip_balance doesn't work, there is no way to disable
> quota on a huge fs with a pending balance, as quota will cause balance to
> hang for a very long time on each tree block switch.
>
> So add an offline rescue tool to disable quota.
>
> Reported-by: Dan Merillat 
> Signed-off-by: Qu Wenruo 

That fixed it, thanks.

Tested-By: Dan Merillat 


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo  wrote:
>
> It looks pretty much like qgroup, but there's too much noise.
> The pinpoint trace event would be btrfs_find_all_roots().

I had this half-written when you replied.

Agreed: it looks like the bulk of the time is spent in qgroups.  I spent
some time with sysrq-l and ftrace:

? __rcu_read_unlock+0x5/0x50
? return_to_handler+0x15/0x36
__rcu_read_unlock+0x5/0x50
find_extent_buffer+0x47/0x90                extent_io.c:4888
read_block_for_search.isra.12+0xc8/0x350    ctree.c:2399
btrfs_search_slot+0x3e7/0x9c0               ctree.c:2837
btrfs_next_old_leaf+0x1dc/0x410             ctree.c:5702
btrfs_next_old_item                         ctree.h:2952
add_all_parents                             backref.c:487
resolve_indirect_refs+0x3f7/0x7e0           backref.c:575
find_parent_nodes+0x42d/0x1290              backref.c:1236
? find_parent_nodes+0x5/0x1290              backref.c:1114
btrfs_find_all_roots_safe+0x98/0x100        backref.c:1414
btrfs_find_all_roots+0x52/0x70              backref.c:1442
btrfs_qgroup_trace_extent_post+0x27/0x60    qgroup.c:1503
btrfs_qgroup_trace_leaf_items+0x104/0x130   qgroup.c:1589
btrfs_qgroup_trace_subtree+0x26a/0x3a0      qgroup.c:1750
do_walk_down+0x33c/0x5a0                    extent-tree.c:8883
walk_down_tree+0xa8/0xd0                    extent-tree.c:9041
btrfs_drop_snapshot+0x370/0x8b0             extent-tree.c:9203
merge_reloc_roots+0xcf/0x220
btrfs_recover_relocation+0x26d/0x400
? btrfs_cleanup_fs_roots+0x16a/0x180
btrfs_remount+0x32e/0x510
do_remount_sb+0x67/0x1e0
do_mount+0x712/0xc90

The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
the following ftrace filter:
fileserver:/sys/kernel/tracing# cat set_ftrace_filter
btrfs_qgroup_trace_extent
btrfs_qgroup_trace_subtree

# cat trace
...
   mount-6803  [003]  80407.649752: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
   mount-6803  [003]  80407.649772: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649797: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649821: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.649846: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.701652: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754547: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754574: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754598: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754622: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [003]  80407.754646: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

... repeats 240 times

   mount-6803  [002]  80412.568804: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [002]  80412.568825: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
   mount-6803  [002]  80412.568850: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
   mount-6803  [002]  80412.568872: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

Looks like invocations of btrfs_qgroup_trace_subtree are taking forever:

   mount-6803  [006]  80641.627709: btrfs_qgroup_trace_subtree <-do_walk_down
   mount-6803  [003]  81433.760945: btrfs_qgroup_trace_subtree <-do_walk_down
(add do_walk_down to the trace here)
   mount-6803  [001]  82124.623557: do_walk_down <-walk_down_tree
   mount-6803  [001]  82124.623567: btrfs_qgroup_trace_subtree <-do_walk_down
   mount-6803  [006]  82695.241306: do_walk_down <-walk_down_tree
   mount-6803  [006]  82695.241316: btrfs_qgroup_trace_subtree <-do_walk_down

So 10-13 minutes per cycle.
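For the record, the cycle lengths can be read straight off the timestamps quoted above (a quick back-of-the-envelope check; the awk one-liner is just illustration):

```shell
# gaps between successive btrfs_qgroup_trace_subtree hits, from the ftrace
# timestamps above (seconds since boot)
printf '%s\n' 80641.627709 81433.760945 82124.623567 82695.241316 |
awk 'NR > 1 { printf "%.1f min\n", ($1 - prev) / 60 } { prev = $1 }'
```

which prints 13.2, 11.5, and 9.5 minutes for the three gaps.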

> 11T with highly deduped usage is really the worst-case scenario for qgroup.
> Qgroup is not really good at handling highly reflinked files, nor balance.
> When they combine, it gets worse.

I'm not really understanding the use-case of qgroup if it melts down
on large systems with a shared base + individual changes.

> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Ok.  I was looking at just doing this to speed things up:

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..c5bf937b79f0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
parent = 0;
}

-   if (need_account) {
+   if (0) {
ret 

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-11 Thread Dan Merillat
19 hours later, it's still going extremely slowly, and each bit of
progress is taking longer and longer.  The main symptom is the mount
process spinning at 100% CPU, interspersed with btrfs-transaction
spinning at 100% CPU.
So far it's racked up 14h45m of CPU time on mount and an additional
3h40m on btrfs-transaction.

The current drop key changes every 10-15 minutes when I check it via
inspect-internal, so some progress is slowly being made.

I built the kernel with ftrace to see what's going on internally; this
is the pattern I'm seeing:

   mount-6803  [002] ...1 69023.970964: btrfs_next_old_leaf <-resolve_indirect_refs
   mount-6803  [002] ...1 69023.970965: btrfs_release_path <-btrfs_next_old_leaf
   mount-6803  [002] ...1 69023.970965: btrfs_search_slot <-btrfs_next_old_leaf
   mount-6803  [002] ...1 69023.970966: btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970966: btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970967: btrfs_bin_search <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970967: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970967: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970968: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970968: btrfs_node_key <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970969: btrfs_buffer_uptodate <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970969: btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970970: btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970970: btrfs_bin_search <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970970: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970974: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970974: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970975: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970975: btrfs_node_key <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970976: btrfs_buffer_uptodate <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970976: btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970976: btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 69023.970977: btrfs_bin_search <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970977: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
   mount-6803  [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970982: btrfs_node_key <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970982: btrfs_buffer_uptodate <-read_block_for_search.isra.12
   mount-6803  [002] ...1 69023.970983: btrfs_clear_path_blocking <-btrfs_search_slot
   mount-6803  [002] ...1 69023.970983: btrfs_set_path_blocking <-btrfs_clear_path_blocking
   mount-6803  [002] ...1 
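The tracing setup used for captures like this can be sketched roughly as follows (paths assume tracefs mounted at /sys/kernel/tracing and a kernel with CONFIG_FUNCTION_TRACER; the wildcard filter is an illustration, not the exact filter used above):

```shell
# Sketch: function-trace the btrfs paths; does nothing if tracefs is absent
# or not writable.
T=/sys/kernel/tracing
if [ -w "$T/tracing_on" ]; then
    echo 'btrfs_*' > "$T/set_ftrace_filter"   # limit tracing to btrfs symbols
    echo function > "$T/current_tracer"
    echo 1 > "$T/tracing_on"
    sleep 1
    head -n 40 "$T/trace"
    echo 0 > "$T/tracing_on"
fi
```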

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
On Fri, Aug 10, 2018 at 6:51 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 6:42 PM, Dan Merillat wrote:
>> On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo  wrote:
>
> But considering your amount of block groups, mount itself may take some
> time (before trying to resume balance).

I'd believe it, a clean mount took 2-3 minutes normally.

btrfs check eventually ran out of RAM, so I killed it and went back to
trying to mount.

readonly mounted pretty quickly, so I'm just letting -o remount,rw
spin for however long it needs to.  Read-only access is fine over the
weekend, and hopefully it will be done by Monday.

To be clear, what exactly am I watching with dump-tree to monitor
forward progress?

Thanks again for the help!


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
On Fri, Aug 10, 2018 at 6:05 AM, Qu Wenruo  wrote:

>
> Although I'm not sure about the details, the fs looks pretty huge.
> Tons of subvolumes and their free space cache inodes.

11TB, 3 or so subvolumes and two snapshots I think.  Not particularly
large for a NAS.

> But with only 3 tree reloc trees, unless you have tons of reflinked files
> (off-line deduped), it shouldn't cause a lot of problems.

There's going to be a ton of reflinked files, both from cp --reflink and
via whole-file dedup.

I freed up ~1/2 TB last month doing dedup.

> At least, we have some progress dropping tree reloc tree for subvolume 6482.

Is there a way to get an idea of how much work is left to be done on
the reloc tree?  Can I walk it with btrfs-inspect?

dump-tree -t TREE_RELOC is quite enormous (13+ million lines before I gave up)

> If you check the dump-tree output for the following data, the "drop key"
> should change during mount: (inspect dump-tree can be run mounted)
> item 175 key (TREE_RELOC ROOT_ITEM 6482) itemoff 8271 itemsize 439
> 
> drop key (2769795 EXTENT_DATA 12665933824) level 2
>  ^
>
> So for the worst case scenario, there is some way to determine whether
> it's processing.

I'll keep an eye on that.
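A small sketch of pulling the drop key out of dump-tree output so successive samples are easy to compare; the sample line is from Qu's message above, and the sed pattern is an assumption based on that output format (normally you'd feed it `btrfs inspect-internal dump-tree -t root <device>`):

```shell
# extract "drop key (...) level N" from a dump-tree line
sample='drop key (2769795 EXTENT_DATA 12665933824) level 2'
echo "$sample" |
sed -n 's/.*drop key (\([^)]*\)) level \([0-9]*\).*/key=\1 level=\2/p'
```

Running that every few minutes and diffing the output is enough to confirm the drop key is advancing.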

> And according to the level (3), which is not small for each subvolume, I
> doubt that's the reason why it's so slow.
>
> BTW, for last skip_balance mount, is there any kernel message like
> "balance: resume skipped"?

No, the only reference to balance in kern.log is a hung
btrfs_cancel_balance from the first reboot.

> Have you tried mount the fs readonly with skip_balance? And then remount
> rw, still with skip_balance?

No, every operation takes a long time.  It's still running the btrfs
check, although I'm going to cancel that and try mount -o
ro,skip_balance before I go to sleep, and see where it is tomorrow.

Thank you for taking the time to help me with this.


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
E: Resending without the 500k attachment.

On Fri, Aug 10, 2018 at 5:13 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 4:47 PM, Dan Merillat wrote:
>> Unfortunately that doesn't appear to be it, a forced restart and
>> attempted to mount with skip_balance leads to the same thing.
>
> That's strange.
>
> Would you please provide the following output to determine whether we
> have any balance running?
>
> # btrfs inspect dump-super -fFa 

superblock: bytenr=65536, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xaeff2ec3 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45ff-8114-e2f5b2f2d331
label                   MEDIA
generation              4584957
root                    33947648
sys_array_size          129
chunk_root_generation   4534813
root_level              1
chunk_root              13681127653376
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             12001954226176
bytes_used              11387838865408
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        4584957
uuid_tree_generation    4584925
dev_item.uuid           ec51cc1f-992a-47a2-b7b2-83af026723fd
dev_item.fsid           16adc029-64c5-45ff-8114-e2f5b2f2d331 [match]
dev_item.type           0
dev_item.total_bytes    12001954226176
dev_item.bytes_used     11613258579968
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13681127456768)
                length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP
                io_align 65536 io_width 65536 sector_size 4096
                num_stripes 2 sub_stripes 1
                        stripe 0 devid 1 offset 353298808832
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
                        stripe 1 devid 1 offset 353332363264
                        dev_uuid ec51cc1f-992a-47a2-b7b2-83af026723fd
backup_roots[4]:
        backup 0:
                backup_tree_root:   3666753175552   gen: 4584956   level: 1
                backup_chunk_root:  13681127653376  gen: 4534813   level: 1
                backup_extent_root: 3666740674560   gen: 4584956   level: 2
                backup_fs_root:     0               gen: 0         level: 0
                backup_dev_root:    199376896       gen: 4584935   level: 1
                backup_csum_root:   3666753568768   gen: 4584956   level: 3
                backup_total_bytes: 12001954226176
                backup_bytes_used:  11387838865408
                backup_num_devices: 1

        backup 1:
                backup_tree_root:   33947648        gen: 4584957   level: 1
                backup_chunk_root:  13681127653376  gen: 4534813   level: 1
                backup_extent_root: 33980416        gen: 4584957   level: 2
                backup_fs_root:     0               gen: 0         level: 0
                backup_dev_root:    34160640        gen: 4584957   level: 1
                backup_csum_root:   34357248        gen: 4584957   level: 3
                backup_total_bytes: 12001954226176
                backup_bytes_used:  11387838865408
                backup_num_devices: 1

        backup 2:
                backup_tree_root:   3666598461440   gen: 4584954   level: 1
                backup_chunk_root:  13681127653376  gen: 4534813   level: 1
                backup_extent_root: 3666595233792   gen: 4584954   level: 2
                backup_fs_root:     0               gen: 0         level: 0
                backup_dev_root:    199376896       gen: 4584935   level: 1
                backup_csum_root:   300034304       gen: 4584954   level: 3
                backup_total_bytes: 12001954226176
                backup_bytes_used:  11387838898176
                backup_num_devices: 1

        backup 3:
                backup_tree_root:   390998272       gen: 4584955   level: 1
                backup_chunk_root:  13681127653376  gen: 4534813   level: 1
                backup_extent_root: 390293760       gen: 4584955   level: 2
                backup_fs_root:     0               gen: 0         level: 0
                backup_dev_root:    199376896       gen: 4584935   level: 1
                backup_csum_root:   391604480       gen: 4584955   level: 3
                backup_total_bytes: 12001954226176
                backup_bytes_used:  11387838881792
                backup_num_devices: 1


superblock: bytenr=67108864, device=/dev/bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x0e9e060d [match]
bytenr                  67108864
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    16adc029-64c5-45f
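As a sanity check on the superblock numbers above: total_bytes vs. bytes_used puts the fs at roughly 95% full, with about 97% of the device allocated to chunks (the awk block is just arithmetic on the quoted fields):

```shell
# fill level computed from the dump-super fields above
awk 'BEGIN {
    total = 12001954226176   # total_bytes
    used  = 11387838865408   # bytes_used
    alloc = 11613258579968   # dev_item.bytes_used (chunk-allocated)
    printf "used: %.1f%%  allocated: %.1f%%\n",
           100 * used / total, 100 * alloc / total
}'
```

which prints "used: 94.9%  allocated: 96.8%".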

Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
Unfortunately that doesn't appear to be it, a forced restart and
attempted to mount with skip_balance leads to the same thing.

20 minutes in, btrfs-transaction had a large burst of reads, then
started spinning the CPU with the disk idle.

Is this recoverable? I could leave it for a day or so if it may make
progress, but if not I'd like to start on other options.

On Fri, Aug 10, 2018 at 3:59 AM, Qu Wenruo  wrote:
>
>
> On 8/10/18 3:40 PM, Dan Merillat wrote:
>> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>>
>> I was testing something out and enabled quota groups and started getting
>> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Looks pretty much like a running balance combined with quota.
>
> Would you please try with balance disabled (temporarily) with
> skip_balance mount option to see if it works.
>
> If it works, then either try resume balance, or just cancel the balance.
>
> Nowadays balance is not needed routinely, especially when you still have
> unallocated space and enabled quota.
>
> Thanks,
> Qu
>
>>
>> Post-reboot the mount process spins at 100% CPU, occasionally yielding
>> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
>> by a burst of disk activity in btrace.
>>
>> Btrace shows all disk activity is returning promptly - no hanging submits.
>>
>> Currently the mount is at 6+ hours.
>>
>> Suggestions on how to go about debugging this?
>>
>
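For reference, Qu's suggestion above amounts to something like the following (the mountpoint is hypothetical; skip_balance pauses the balance resume at mount, and balance cancel then drops it for good):

```shell
# Sketch: mount with the paused balance skipped, then cancel it entirely.
# Device is from this thread; the mountpoint is a placeholder.
DEV=/dev/bcache0
MNT=/mnt/media
if [ -b "$DEV" ]; then
    mount -o skip_balance "$DEV" "$MNT"
    btrfs balance cancel "$MNT"
fi
```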


Re: Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
[23084.426006] sysrq: SysRq : Show Blocked State
[23084.426085]   task                        PC stack   pid father
[23084.426332] mount           D    0  4857   4618 0x0080
[23084.426403] Call Trace:
[23084.426531]  ? __schedule+0x2c3/0x830
[23084.426628]  ? __wake_up_common+0x6f/0x120
[23084.426751]  schedule+0x2d/0x90
[23084.426871]  wait_current_trans+0x98/0xc0
[23084.426953]  ? wait_woken+0x80/0x80
[23084.427058]  start_transaction+0x2e9/0x3e0
[23084.427128]  btrfs_drop_snapshot+0x48c/0x860
[23084.427220]  merge_reloc_roots+0xca/0x210
[23084.427277]  btrfs_recover_relocation+0x290/0x420
[23084.427399]  ? btrfs_cleanup_fs_roots+0x174/0x190
[23084.427533]  open_ctree+0x2158/0x2549
[23084.427592]  ? bdi_register_va.part.2+0x10a/0x1a0
[23084.427652]  btrfs_mount_root+0x678/0x730
[23084.427709]  ? pcpu_next_unpop+0x32/0x40
[23084.427797]  ? pcpu_alloc+0x2f6/0x680
[23084.427884]  ? mount_fs+0x30/0x150
[23084.427939]  ? btrfs_decode_error+0x20/0x20
[23084.427996]  mount_fs+0x30/0x150
[23084.428054]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428111]  btrfs_mount+0x156/0x8ad
[23084.428167]  ? pcpu_block_update_hint_alloc+0x15e/0x1d0
[23084.428226]  ? pcpu_next_unpop+0x32/0x40
[23084.428282]  ? pcpu_alloc+0x2f6/0x680
[23084.428338]  ? mount_fs+0x30/0x150
[23084.428393]  mount_fs+0x30/0x150
[23084.428450]  vfs_kern_mount.part.7+0x4f/0xf0
[23084.428507]  do_mount+0x5b0/0xc60
[23084.428563]  ksys_mount+0x7b/0xd0
[23084.428618]  __x64_sys_mount+0x1c/0x20
[23084.428676]  do_syscall_64+0x55/0x110
[23084.428734]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[23084.428794] RIP: 0033:0x7efeb90daa1a
[23084.428849] RSP: 002b:7ffcc8b8fee8 EFLAGS: 0206 ORIG_RAX:
00a5
[23084.428925] RAX: ffda RBX: 55d5bef05420 RCX: 7efeb90daa1a
[23084.428987] RDX: 55d5bef05600 RSI: 55d5bef05ab0 RDI: 55d5bef05b70
[23084.429048] RBP:  R08: 55d5bef08e40 R09: 003f
[23084.429109] R10: c0ed R11: 0206 R12: 55d5bef05b70
[23084.429170] R13: 55d5bef05600 R14:  R15: 


On Fri, Aug 10, 2018 at 3:40 AM, Dan Merillat  wrote:
> Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)
>
> I was testing something out and enabled quota groups and started getting
> 2-5 minute long pauses where a btrfs-transaction thread spun at 100%.
>
> Post-reboot the mount process spins at 100% CPU, occasionally yielding
> to a btrfs-transaction thread at 100% CPU.  The switchover is marked
> by a burst of disk activity in btrace.
>
> Btrace shows all disk activity is returning promptly - no hanging submits.
>
> Currently the mount is at 6+ hours.
>
> Suggestions on how to go about debugging this?


Mount stalls indefinitely after enabling quota groups.

2018-08-10 Thread Dan Merillat
Kernel 4.17.9, 11tb BTRFS device (md-backed, not btrfs raid)

I was testing something out and enabled quota groups and started getting
2-5 minute long pauses where a btrfs-transaction thread spun at 100%.

Post-reboot the mount process spins at 100% CPU, occasionally yielding
to a btrfs-transaction thread at 100% CPU.  The switchover is marked
by a burst of disk activity in btrace.

Btrace shows all disk activity is returning promptly - no hanging submits.

Currently the mount is at 6+ hours.

Suggestions on how to go about debugging this?


Re: mount time for big filesystems

2017-09-01 Thread Dan Merillat
On Fri, Sep 1, 2017 at 11:20 AM, Austin S. Hemmelgarn wrote:
> No, that's not what I'm talking about.  You always get one bcache device per
> backing device, but multiple bcache devices can use the same physical cache
> device (that is, backing devices map 1:1 to bcache devices, but cache
> devices can map 1:N to bcache devices).  So, in other words, the layout I'm
> suggesting looks like this:
>
> This is actually simpler to manage for multiple reasons, and will avoid
> wasting space on the cache device because of random choices made by BTRFS
> when deciding where to read data.

Be careful with bcache - if you lose the SSD while it has dirty data on
it, your entire FS is gone.  I ended up contributing a number of
patches to the recovery tools while digging my array out from that.
Even if only a single file is dirty, the new metadata tree will exist
only on the cache device, which doesn't honor barriers when writing
back to the underlying storage.  That means the backing device is
likely to have a root pointing at a metadata tree that's no longer
there.  The recovery method is finding an older root that has a
complete tree and recovery-walking the entire FS from that.

I don't know if dm-cache honors write barriers from the cache to the
backing storage, but I would still recommend using them both in
write-through mode, not write-back.
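A minimal sketch of switching a bcache device to writethrough via sysfs (the bcache0 path is an assumption; substitute your device name):

```shell
# Prefer writethrough so the backing device always holds a consistent
# filesystem even if the SSD dies.  No-op if the device doesn't exist.
SYSFS=/sys/block/bcache0/bcache
if [ -d "$SYSFS" ]; then
    cat "$SYSFS/cache_mode"              # current mode is shown in [brackets]
    echo writethrough > "$SYSFS/cache_mode"
fi
```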
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs-transaction spins forever on -next-20160818

2016-08-19 Thread Dan Merillat
I tried out -next to test the mm fixes, and immediately upon mounting my
array (11TB, 98% full at the time) the btrfs-transaction thread for it
spun at 100% CPU.

It acted like read-only, write-discarding media - deleted files
reappeared after every reboot.  I'm not sure about writes, since it's
running the CrashPlan backup target service - that resynchronizes, but
I don't know enough about it to tell whether it complained about writes
vanishing.

I tried multiple reboots before going back to 4.6.7, where everything
worked properly.  4.7 BTRFS works as well, but I was hitting the mm bug
that OOMs improperly under high IO loads.

The topology is 4x4TB drives in md-raid5, bcache'd with a 256GB SSD,
btrfs on the bcache0 block device.

(apologies for the quote, only way to convince Thunderbird to not mangle
log files)

> Aug 19 04:53:22 fileserver kernel: [  605.152050] INFO: task kworker/u4:1:22 
> blocked for more than 120 seconds.
> Aug 19 04:53:22 fileserver kernel: [  605.152097]   Not tainted 
> 4.8.0-rc2-next-20160818 #15
> Aug 19 04:53:22 fileserver kernel: [  605.152138] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 19 04:53:22 fileserver kernel: [  605.152175] kworker/u4:1D 
> 88022666bad8 022  2 0x
> Aug 19 04:53:22 fileserver kernel: [  605.152286] Workqueue: btrfs-submit 
> btrfs_submit_helper
> Aug 19 04:53:22 fileserver kernel: [  605.152357]  88022666bad8 
> 81a65800 880179694040 88022100
> Aug 19 04:53:22 fileserver kernel: [  605.152521]  88022666bae0 
> 88022666c000 880220510ac0 88022666bb10
> Aug 19 04:53:22 fileserver kernel: [  605.152688]  880220510ad8 
> 88022051 88022666baf0 81888bfa
> Aug 19 04:53:22 fileserver kernel: [  605.152851] Call Trace:
> Aug 19 04:53:22 fileserver kernel: [  605.152892]  [] 
> schedule+0x3a/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.152929]  [] 
> rwsem_down_read_failed+0xe5/0x120
> Aug 19 04:53:22 fileserver kernel: [  605.152970]  [] 
> call_rwsem_down_read_failed+0x18/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.153006]  [] 
> down_read+0x12/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.153047]  [] 
> cached_dev_make_request+0x65e/0xd90
> Aug 19 04:53:22 fileserver kernel: [  605.153083]  [] 
> generic_make_request+0xdd/0x190
> Aug 19 04:53:22 fileserver kernel: [  605.153122]  [] 
> submit_bio+0x75/0x140
> Aug 19 04:53:22 fileserver kernel: [  605.153159]  [] ? 
> mempool_free+0x2d/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.153202]  [] ? 
> preempt_count_sub+0x51/0x80
> Aug 19 04:53:22 fileserver kernel: [  605.153240]  [] 
> run_scheduled_bios+0x258/0x580
> Aug 19 04:53:22 fileserver kernel: [  605.153283]  [] ? 
> end_bio_extent_readpage+0x202/0x5b0
> Aug 19 04:53:22 fileserver kernel: [  605.153322]  [] 
> pending_bios_fn+0x10/0x20
> Aug 19 04:53:22 fileserver kernel: [  605.153363]  [] 
> btrfs_scrubparity_helper+0x77/0x340
> Aug 19 04:53:22 fileserver kernel: [  605.153403]  [] 
> btrfs_submit_helper+0x9/0x10
> Aug 19 04:53:22 fileserver kernel: [  605.153446]  [] 
> process_one_work+0x1e0/0x480
> Aug 19 04:53:22 fileserver kernel: [  605.153484]  [] 
> worker_thread+0x43/0x4e0
> Aug 19 04:53:22 fileserver kernel: [  605.153526]  [] ? 
> process_one_work+0x480/0x480
> Aug 19 04:53:22 fileserver kernel: [  605.153564]  [] 
> kthread+0xc4/0xe0
> Aug 19 04:53:22 fileserver kernel: [  605.153607]  [] 
> ret_from_fork+0x1f/0x40
> Aug 19 04:53:22 fileserver kernel: [  605.153644]  [] ? 
> kthread_worker_fn+0x110/0x110
> Aug 19 04:53:22 fileserver kernel: [  605.153701] INFO: task 
> bcache_writebac:972 blocked for more than 120 seconds.
> Aug 19 04:53:22 fileserver kernel: [  605.153740]   Not tainted 
> 4.8.0-rc2-next-20160818 #15
> Aug 19 04:53:22 fileserver kernel: [  605.153780] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 19 04:53:22 fileserver kernel: [  605.153822] bcache_writebac D 
> 880220bc3cd8 0   972  2 0x
> Aug 19 04:53:22 fileserver kernel: [  605.153929]  880220bc3cd8 
> 880220bc3cb0 81083200 8802248642c0
> Aug 19 04:53:22 fileserver kernel: [  605.154100]  88022051 
> 880220bc4000 880220510ac0 880220510ac0
> Aug 19 04:53:22 fileserver kernel: [  605.154274]  880220510ad8 
> 0001 880220bc3cf0 81888bfa
> Aug 19 04:53:22 fileserver kernel: [  605.154443] Call Trace:
> Aug 19 04:53:22 fileserver kernel: [  605.154484]  [] ? 
> finish_task_switch+0x180/0x1d0
> Aug 19 04:53:22 fileserver kernel: [  605.154522]  [] 
> schedule+0x3a/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.154563]  [] 
> rwsem_down_write_failed+0x109/0x280
> Aug 19 04:53:22 fileserver kernel: [  605.154602]  [] 
> call_rwsem_down_write_failed+0x17/0x30
> Aug 19 04:53:22 fileserver kernel: [  605.154644]  [] ? 
> schedule+0x44/0x90
> Aug 19 04:53:22 fileserver kernel: [  605.154681]  [] 
> 

WARNING during btrfs send, kernel 4.1-rc1

2015-04-29 Thread Dan Merillat
Send & receive on the same machine, from a read-only mount to a
freshly formatted fs.  Aside from the warning, everything appears to
be working correctly, but since this is the latest btrfs code it
needed reporting.

I'll probably have another opportunity to test this again, since I'm 
blowing up my array repeatedly trying to track down a bcache bug.

line number translates to:

/*
 * This is done when we lookup the root, it should already be complete
 * by the time we get here.
 */
WARN_ON(send_root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE);

[  267.379126] [ cut here ]
[  267.379202] WARNING: CPU: 1 PID: 4423 at fs/btrfs/send.c:5699 
btrfs_ioctl_send+0x9d/0xe47()
[  267.379297] Modules linked in: binfmt_misc tun nbd rpcsec_gss_krb5 sit 
ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_conntrack xt_multiport 
iptable_filter xt_length xt_mark iptable_mangle iptable_raw ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_nat xt_tcpudp iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables powernow_k8 
pcspkr serio_raw k8temp i2c_piix4 i2c_core rtc_cmos wmi acpi_cpufreq netconsole 
it87 hwmon_vid ecryptfs ide_pci_generic firewire_ohci firewire_core 
sata_promise atiixp ide_core e100 pata_acpi ohci_pci sg ohci_hcd ehci_pci 
ehci_hcd
[  267.381752] CPU: 1 PID: 4423 Comm: btrfs Not tainted 4.1.0-rc1 #1
[  267.381813] Hardware name: Gigabyte Technology Co., Ltd. 
GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F1 06/03/2008
[  267.381880]  0009 8801f613fc18 8169d2be 
8000
[  267.382112]   8801f613fc58 81045577 
8800bbd9a000
[  267.382342]  81306f46 8800c51ee42c 7ffc4b080bc0 
880224699100
[  267.382566] Call Trace:
[  267.382630]  [8169d2be] dump_stack+0x4f/0x7b
[  267.382686]  [81045577] warn_slowpath_common+0x9c/0xb6
[  267.382747]  [81306f46] ? btrfs_ioctl_send+0x9d/0xe47
[  267.382809]  [81045625] warn_slowpath_null+0x15/0x17
[  267.382869]  [81306f46] btrfs_ioctl_send+0x9d/0xe47
[  267.382931]  [816a3c10] ? _raw_spin_unlock_irq+0x17/0x29
[  267.382994]  [816a0d66] ? __schedule+0x6df/0x90e
[  267.383057]  [812d7691] btrfs_ioctl+0x18a/0x2436
[  267.383119]  [81068ccd] ? sched_move_task+0x185/0x194
[  267.383183]  [8137d28c] ? find_next_bit+0x15/0x1b
[  267.383244]  [8106b1ba] ? __enqueue_entity+0x67/0x69
[  267.383306]  [8106d4ab] ? enqueue_task_fair+0xc00/0xcda
[  267.383367]  [810697cc] ? sched_clock_cpu+0x67/0xbc
[  267.383431]  [81322201] ? avc_has_perm+0x96/0xf8
[  267.383495]  [81153d51] do_vfs_ioctl+0x372/0x420
[  267.383558]  [8115bc80] ? __fget+0x6b/0x76
[  267.383619]  [81153e54] SyS_ioctl+0x55/0x7a
[  267.383680]  [816a419b] system_call_fastpath+0x16/0x6e
[  267.383741] ---[ end trace 01d1110aa9307411 ]---

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs-progs: separate the overwrite check.

2015-04-25 Thread Dan Merillat
It's a good question.  If path_name is absolute, the file descriptor
is ignored.  I used -1 (EBADF) instead of AT_FDCWD there so if a
non-absolute path gets in there it errors out instead of attempting to
use a relative path off the current directory.

I'm not entirely sure if it's the best way, so if anyone else has
ideas let me know.

On Fri, Apr 24, 2015 at 11:24 AM, David Sterba dste...@suse.cz wrote:
 On Thu, Apr 23, 2015 at 12:51:33PM -0400, Dan Merillat wrote:
 +/* returns:
 + *  0 if the file exists and should be skipped.
 + *  1 if the file does NOT exist
 + *  2 if the file exists but is OK to overwrite
 + */
 +
 +static int overwrite_ok(const char * path)
 +{
 + static int warn = 0;
 + struct stat st;
 + int ret;
 +
 + /* don't be fooled by symlinks */
 ret = fstatat(-1, path_name, &st, AT_SYMLINK_NOFOLLOW);

 Is the filedescriptor -1 correct? Previously, stat was used that uses
 AT_FDCWD for the dirfd, which is -100. -1 could be intepreted as a bad
 filedescriptor (EBADF).


Re: [PATCH 0/3] btrfs-progs: restore symlinks

2015-04-25 Thread Dan Merillat
At this point I'm done - after writing the symlink patch I restored
everything of importance off my array to a scratch disk, wiped the
array and am in the process of copying everything back.  I'll keep an
eye on this thread if changes need to be made to my patches, but
hopefully I won't be needing btrfs restore for a few more years!


On Fri, Apr 24, 2015 at 12:38 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Dan Merillat posted on Thu, 23 Apr 2015 12:47:29 -0400 as excerpted:

 Hopefully this is sufficiently paranoid, tested with PATH_MAX length
 symlinks, existing files, insufficient permissions, dangling symlinks. I
 think I got the coding style correct this time, I'll fix and resend if
 not.

 Includes a trivial fix from my metadata patch, the documentation got
 lost in the merge.

 Thanks for all this.  I've only had to use restore once and hopefully
 won't be using it again in the near future, but having it restore the
 metadata and symlinks as well would surely have made the experience
 easier.  There's a lot of people going to benefit from these patches over
 time as btrfs gains usage and the inevitable breakage happens to some of
 those filesystems. =:^/

 --
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman



[PATCH 2/3] btrfs-progs: separate the overwrite check.

2015-04-23 Thread Dan Merillat
Symlink restore needs this, but the cut-and-paste became
too complicated.  Simplify everything.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 cmds-restore.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/cmds-restore.c b/cmds-restore.c
index e877548..8869f2a 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -781,6 +781,37 @@ out:
return ret;
 }
 
+/* returns:
+ *  0 if the file exists and should be skipped.
+ *  1 if the file does NOT exist
+ *  2 if the file exists but is OK to overwrite
+ */
+
+static int overwrite_ok(const char * path)
+{
+   static int warn = 0;
+   struct stat st;
+   int ret;
+
+   /* don't be fooled by symlinks */
+   ret = fstatat(-1, path_name, &st, AT_SYMLINK_NOFOLLOW);
+
+   if (!ret) {
+   if (overwrite)
+   return 2;
+
+   if (verbose || !warn)
+   printf("Skipping existing file "
+   "%s\n", path);
+   if (!warn)
+   printf("If you wish to overwrite use "
+  "the -o option to overwrite\n");
+   warn = 1;
+   return 0;
+   }
+   return 1;
+}
+
 static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
  const char *output_rootdir, const char *in_dir,
  const regex_t *mreg)
@@ -897,25 +928,9 @@ static int search_dir(struct btrfs_root *root, struct btrfs_key *key,
 * files, no symlinks or anything else.
 */
if (type == BTRFS_FT_REG_FILE) {
-   if (!overwrite) {
-   static int warn = 0;
-   struct stat st;
-
-   ret = stat(path_name, &st);
-   if (!ret) {
-   loops = 0;
-   if (verbose || !warn)
-   printf("Skipping existing file "
-   "%s\n", path_name);
-   if (warn)
-   goto next;
-   printf("If you wish to overwrite use "
-  "the -o option to overwrite\n");
-   warn = 1;
-   goto next;
-   }
-   ret = 0;
-   }
+   if (!overwrite_ok(path_name))
+   goto next;
+
if (verbose)
			printf("Restoring %s\n", path_name);
if (dry_run)
-- 
2.1.4



[PATCH 0/3] btrfs-progs: restore symlinks

2015-04-23 Thread Dan Merillat

Hopefully this is sufficiently paranoid, tested with PATH_MAX length
symlinks, existing files, insufficient permissions, dangling symlinks.
I think I got the coding style correct this time, I'll fix and resend if
not.

Includes a trivial fix from my metadata patch, the documentation got
lost in the merge.



[PATCH 1/3] btrfs-progs: restore: document metadata restore.

2015-04-23 Thread Dan Merillat
This was lost in the cleanup of 71a559

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.asciidoc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/btrfs-restore.asciidoc b/Documentation/btrfs-restore.asciidoc
index 20fc366..89e0c87 100644
--- a/Documentation/btrfs-restore.asciidoc
+++ b/Documentation/btrfs-restore.asciidoc
@@ -29,6 +29,9 @@ get snapshots, btrfs restore skips snapshots in default.
 -x::
 get extended attributes.
 
+-m|--metadata::
+restore owner, mode and times.
+
 -v::
 verbose.
 
-- 
2.1.4



[PATCH 3/3] btrfs-progs: optionally restore symlinks.

2015-04-23 Thread Dan Merillat
Restore symlinks, optionally with owner/times.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.asciidoc |   3 +
 cmds-restore.c   | 140 ++-
 2 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/Documentation/btrfs-restore.asciidoc b/Documentation/btrfs-restore.asciidoc
index 89e0c87..06a0498 100644
--- a/Documentation/btrfs-restore.asciidoc
+++ b/Documentation/btrfs-restore.asciidoc
@@ -32,6 +32,9 @@ get extended attributes.
 -m|--metadata::
 restore owner, mode and times.
 
+-S|--symlinks::
+restore symbolic links as well as normal files.
+
 -v::
 verbose.
 
diff --git a/cmds-restore.c b/cmds-restore.c
index 8869f2a..c7a3e96 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -45,9 +45,11 @@
 
 static char fs_name[PATH_MAX];
 static char path_name[PATH_MAX];
+static char symlink_target[PATH_MAX];
 static int get_snaps = 0;
 static int verbose = 0;
 static int restore_metadata = 0;
+static int restore_symlinks = 0;
 static int ignore_errors = 0;
 static int overwrite = 0;
 static int get_xattrs = 0;
@@ -812,6 +814,125 @@ static int overwrite_ok(const char * path)
return 1;
 }
 
+static int copy_symlink(struct btrfs_root *root, struct btrfs_key *key,
+const char *file)
+{
+   struct btrfs_path *path;
+   struct extent_buffer *leaf;
+   struct btrfs_file_extent_item *extent_item;
+   struct btrfs_inode_item *inode_item;
+   u32 len;
+   int ret;
+
+   ret = overwrite_ok(path_name);
+   if (ret == 0)
+   return 0; // skip this file.
+
+   if (ret == 2) { // symlink() can't overwrite, so unlink first.
+   ret = unlink(path_name);
+   if (ret) {
+   fprintf(stderr, "failed to unlink '%s' for overwrite\n",
+   path_name);
+   return ret;
+   }
+   }
+
+   key->type = BTRFS_EXTENT_DATA_KEY;
+   key->offset = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   leaf = path->nodes[0];
+   if (!leaf) {
+   fprintf(stderr, "Error getting leaf for symlink '%s'\n", file);
+   ret = -1;
+   goto out;
+   }
+
+   extent_item = btrfs_item_ptr(leaf, path->slots[0],
+   struct btrfs_file_extent_item);
+
+   len = btrfs_file_extent_inline_item_len(leaf,
+   btrfs_item_nr(path->slots[0]));
+   if (len >= PATH_MAX) {
+   fprintf(stderr, "Symlink '%s' target length %d is longer than PATH_MAX\n",
+   fs_name, len);
+   ret = -1;
+   goto out;
+   }
+
+   u32 name_offset = (unsigned long) extent_item
+   + offsetof(struct btrfs_file_extent_item, disk_bytenr);
+   read_extent_buffer(leaf, symlink_target, name_offset, len);
+
+   symlink_target[len] = 0;
+
+   if (!dry_run) {
+   ret = symlink(symlink_target, path_name);
+   if (ret < 0) {
+   fprintf(stderr, "Failed to restore symlink '%s': %s\n",
+   path_name, strerror(errno));
+   goto out;
+   }
+   }
+   printf("SYMLINK: '%s' => '%s'\n", path_name, symlink_target);
+
+   ret = 0;
+   if (!restore_metadata)
+   goto out;
+
+   /* Symlink metadata operates differently than files/directories,
+* so do our own work here.
+*/
+
+   key->type = BTRFS_INODE_ITEM_KEY;
+   key->offset = 0;
+
+   btrfs_release_path(path);
+
+   ret = btrfs_lookup_inode(NULL, root, path, key, 0);
+   if (ret) {
+   fprintf(stderr, "Failed to lookup inode for '%s'\n", file);
+   goto out;
+   }
+
+   inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+   struct btrfs_inode_item);
+
+   fchownat(-1, file, btrfs_inode_uid(path->nodes[0], inode_item),
+  btrfs_inode_gid(path->nodes[0], inode_item),
+  AT_SYMLINK_NOFOLLOW);
+   if (ret) {
+   fprintf(stderr, "Failed to change owner: %s\n",
+   strerror(errno));
+   goto out;
+   }
+
+   struct btrfs_timespec *bts;
+   struct timespec times[2];
+
+   bts = btrfs_inode_atime(inode_item);
+   times[0].tv_sec  = btrfs_timespec_sec(path->nodes[0], bts);
+   times[0].tv_nsec = btrfs_timespec_nsec(path->nodes[0], bts);
+
+   bts = btrfs_inode_mtime(inode_item);
+   times[1].tv_sec  = btrfs_timespec_sec(path->nodes[0], bts);
+   times[1].tv_nsec = btrfs_timespec_nsec(path->nodes[0], bts);
+
+   ret = utimensat(-1, file, times, AT_SYMLINK_NOFOLLOW

Re: [PATCH v2 1/1] btrfs-progs: optionally restore metadata

2015-04-23 Thread Dan Merillat
On Wed, Apr 22, 2015 at 12:53 PM, David Sterba dste...@suse.cz wrote:
 Applied, thanks.

 In future patches, please stick to the coding style used in progs ([1]),
 I've fixed spacing around =, comments and moved declarations before
 the statements.

 [1] https://www.kernel.org/doc/Documentation/CodingStyle

I'll try to clean it up more next time around.

 @@ -1168,10 +1275,12 @@ int cmd_restore(int argc, char **argv)
   static const struct option long_options[] = {
   { "path-regex", 1, NULL, 256},
   { "dry-run", 0, NULL, 'D'},
 + { "metadata", 0, NULL, 'm'},
 + { "debug-regex", 0, NULL, 257},

 This was unused and I've removed it.

That's cruft and I thought I removed it from the patch, sorry.

Got your symlink code, I'll look at that today.


[PATCH v2 1/1] btrfs-progs: optionally restore metadata

2015-04-20 Thread Dan Merillat
As long as the inode is intact, the file metadata can be restored.
Directory data is restored at the end of search_dir.  Errors are
checked and returned, unless ignore_errors is requested.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 Documentation/btrfs-restore.txt |   3 ++
 cmds-restore.c  | 114 +++-
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-restore.txt b/Documentation/btrfs-restore.txt
index 20fc366..a4e4d37 100644
--- a/Documentation/btrfs-restore.txt
+++ b/Documentation/btrfs-restore.txt
@@ -29,6 +29,9 @@ get snapshots, btrfs restore skips snapshots in default.
 -x::
 get extended attributes.
 
+-m|--metadata::
+set owner, permissions, access time and modify time.
+
 -v::
 verbose.
 
diff --git a/cmds-restore.c b/cmds-restore.c
index d2fc951..e95018f 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -48,6 +48,7 @@ static char fs_name[4096];
 static char path_name[4096];
 static int get_snaps = 0;
 static int verbose = 0;
+static int restore_metadata = 0;
 static int ignore_errors = 0;
 static int overwrite = 0;
 static int get_xattrs = 0;
@@ -547,6 +548,57 @@ out:
return ret;
 }
 
+static int copy_metadata(struct btrfs_root *root, int fd,
+   struct btrfs_key *key)
+{
+   struct btrfs_path *path;
+   struct btrfs_inode_item *inode_item;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   fprintf(stderr, "Ran out of memory\n");
+   return -ENOMEM;
+   }
+
+   ret = btrfs_lookup_inode(NULL, root, path, key, 0);
+   if (ret == 0) {
+
+   inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+   struct btrfs_inode_item);
+
+   ret=fchown(fd, btrfs_inode_uid(path->nodes[0], inode_item),
+  btrfs_inode_gid(path->nodes[0], inode_item));
+   if (ret) {
+   fprintf(stderr, "Failed to change owner: %s\n",
+   strerror(errno));
+   goto out;
+   }
+   ret=fchmod(fd, btrfs_inode_mode(path->nodes[0], inode_item));
+   if (ret) {
+   fprintf(stderr, "Failed to change mode: %s\n",
+   strerror(errno));
+   goto out;
+   }
+   struct btrfs_timespec *bts;
+   struct timespec times[2];
+
+   bts = btrfs_inode_atime(inode_item);
+   times[0].tv_sec=btrfs_timespec_sec(path->nodes[0], bts);
+   times[0].tv_nsec=btrfs_timespec_nsec(path->nodes[0], bts);
+
+   bts = btrfs_inode_mtime(inode_item);
+   times[1].tv_sec=btrfs_timespec_sec(path->nodes[0], bts);
+   times[1].tv_nsec=btrfs_timespec_nsec(path->nodes[0], bts);
+
+   ret=futimens(fd, times);
+   if (ret) {
+   fprintf(stderr, "Failed to set times: %s\n",
+   strerror(errno));
+   goto out;
+   }
+   }
+out:
+   btrfs_release_path(path);
+   return ret;
+}
 
 static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
 const char *file)
@@ -555,6 +607,7 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
struct btrfs_path *path;
struct btrfs_file_extent_item *fi;
struct btrfs_inode_item *inode_item;
+   struct btrfs_timespec *bts;
struct btrfs_key found_key;
int ret;
int extent_type;
@@ -567,12 +620,41 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
	fprintf(stderr, "Ran out of memory\n");
return -ENOMEM;
}
+   struct timespec times[2];
+   int times_ok=0;
 
ret = btrfs_lookup_inode(NULL, root, path, key, 0);
if (ret == 0) {
inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
struct btrfs_inode_item);
found_size = btrfs_inode_size(path->nodes[0], inode_item);
+
+   if (restore_metadata) {
+   /* change the ownership and mode now, set times when
+* copyout is finished */
+
+   ret=fchown(fd, btrfs_inode_uid(path->nodes[0], inode_item),
+  btrfs_inode_gid(path->nodes[0], inode_item));
+   if (ret && !ignore_errors) {
+   btrfs_release_path(path);
+   return ret;
+   }
+
+   ret=fchmod(fd, btrfs_inode_mode(path->nodes[0], inode_item));
+   if (ret && !ignore_errors) {
+   btrfs_release_path(path);
+   return ret;
+   }
+
+   bts = btrfs_inode_atime(inode_item

[PATCH v2 0/1] btrfs-progs: optionally restore metadata

2015-04-20 Thread Dan Merillat
Changes since v1:

* Documented in the manpage
* Added to usage() for btrfs restore
* Made it an optional flag (-m/--restore-metadata)
* Use endian-safe macros to access the on-disk data.
* Restore the proper mtime instead of atime twice.
* Restore owner and mode
* Restore metadata for directories as well as plain files.
* Since it's now explicitly requested, errors are fatal
  unless ignore_errors is requested.

I tested this on the array I'm restoring, it looks sane to me.

Thanks to Noah Massey for the patch review, and Duncan for the
prompt to add owner/permissions to the patch.

Symlinks and hardlinks are beyond the scope of these changes, I'll look
into them if this looks good to everyone.


Re: [PATCH] btrfs-progs: have restore set atime/mtime

2015-04-17 Thread Dan Merillat
On Fri, Apr 17, 2015 at 7:54 AM, Noah Massey noah.mas...@gmail.com wrote:
 On Thu, Apr 16, 2015 at 7:33 PM, Dan Merillat dan.meril...@gmail.com wrote:
 The inode is already found, use the data and make restore friendlier.

 Signed-off-by: Dan Merillat dan.meril...@gmail.com
 ---
  cmds-restore.c | 12 
  1 file changed, 12 insertions(+)

 diff --git a/cmds-restore.c b/cmds-restore.c
 index d2fc951..95ac487 100644
 --- a/cmds-restore.c
 +++ b/cmds-restore.c
 @@ -567,12 +567,22 @@ static int copy_file(struct btrfs_root *root, int
 fd, struct btrfs_key *key,
 fprintf(stderr, "Ran out of memory\n");
 return -ENOMEM;
 }
 +   struct timespec times[2];
 +   int times_ok=0;

 ret = btrfs_lookup_inode(NULL, root, path, key, 0);
 if (ret == 0) {
 inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
 struct btrfs_inode_item);
 found_size = btrfs_inode_size(path->nodes[0], inode_item);
 +   struct btrfs_timespec bts;
 +   read_eb_member(path->nodes[0], inode_item, struct
 btrfs_inode_item, atime, bts);
 +   times[0].tv_sec=bts.sec;
 +   times[0].tv_nsec=bts.nsec;
 +   read_eb_member(path->nodes[0], inode_item, struct
 btrfs_inode_item, atime, bts);

 I think you mean 'mtime' here

I absolutely do, whoops.  This is probably a good time to mention how
much I dislike the fake pointers being used everywhere, coupled with
the partially-implemented macro magic to get fields out of them.  Is
there a good reason why btrfs_item_ptr isn't just a type-pun, with the
understanding that you'll need to copy it to keep it?

 +   if (times_ok)
 +   futimens(fd, times);

 return value isn't checked here.

What could we do with the error if it occurred?  Restoring times is a
nice bonus if it works, but if it gets lost while the data was
restored successfully, that shouldn't be an error condition.  I can
add a comment to that effect to make it clearer why it's being ignored
though, or perhaps something like a warn_once if the filesystem being
restored to doesn't support changing the times.

On the subject of errors - is it possible for read_eb_member to fail
the way I'm using it?  It's defined, but never used anywhere else, so
I have nothing to compare it to.  My feeling is that if btrfs_item_ptr
works the data in the structure returned is going to be there, but I'm
not sure.


[PATCH] btrfs-progs: have restore set atime/mtime

2015-04-16 Thread Dan Merillat
The inode is already found, use the data and make restore friendlier.

Signed-off-by: Dan Merillat dan.meril...@gmail.com
---
 cmds-restore.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/cmds-restore.c b/cmds-restore.c
index d2fc951..95ac487 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -567,12 +567,22 @@ static int copy_file(struct btrfs_root *root, int fd, struct btrfs_key *key,
	fprintf(stderr, "Ran out of memory\n");
return -ENOMEM;
}
+   struct timespec times[2];
+   int times_ok=0;

ret = btrfs_lookup_inode(NULL, root, path, key, 0);
if (ret == 0) {
inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
struct btrfs_inode_item);
found_size = btrfs_inode_size(path->nodes[0], inode_item);
+   struct btrfs_timespec bts;
+   read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item,
atime, bts);
+   times[0].tv_sec=bts.sec;
+   times[0].tv_nsec=bts.nsec;
+   read_eb_member(path->nodes[0], inode_item, struct btrfs_inode_item,
atime, bts);
+   times[1].tv_sec=bts.sec;
+   times[1].tv_nsec=bts.nsec;
+   times_ok=1;
}
btrfs_release_path(path);

@@ -680,6 +690,8 @@ set_size:
if (ret)
return ret;
}
+   if (times_ok)
+   futimens(fd, times);
return 0;
 }

-- 
2.1.4



Re: [PATCH] btrfs-progs: have restore set atime/mtime

2015-04-16 Thread Dan Merillat
That's not a bad idea.  In my case it was all owned by the same user
(media storage) so the only thing of interest was the timestamps.

I can whip up a patch to do that as well.

On Thu, Apr 16, 2015 at 9:09 PM, Duncan 1i5t5.dun...@cox.net wrote:
 Dan Merillat posted on Thu, 16 Apr 2015 19:33:46 -0400 as excerpted:

 The inode is already found, use the data and make restore friendlier.

 Unless things have changed recently, restore doesn't even restore user/
 group ownership, let alone permissions.  IOW, atime/mtime are the least
 of the problem (particularly if people are running noatime as is
 recommended, unless you really need it for some reason).

 It simply creates the files it restores as the owner/group it is run as
 (normally root), using standard umask rules, I believe.

 So if you're going to have it start restoring metadata at all, might as
 well have it do ownership/perms too, if it can.  Otherwise atime/mtime
 are hardly worth bothering with.

 --
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman



Re: [PATCH] btrfs-progs: have restore set atime/mtime

2015-04-16 Thread Dan Merillat
I think thunderbird ate that patch, sorry.

I didn't make it conditional - there's really no reason to not restore
the information.  I was actually surprised that it didn't restore
before this patch.

If it looks good I'll resend without the word-wrapping.


Re: Recovering BTRFS from bcache failure.

2015-04-09 Thread Dan Merillat
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:
 Bcache failures are nasty, because they leave a mix of old and new
 data on the disk.  In this case, there was very little dirty data, but
 of course the tree roots were dirty and out-of-sync.

 fileserver:/usr/src/btrfs-progs# ./btrfs --version
 Btrfs v3.18.2

 kernel version 3.18

 [  572.573566] BTRFS info (device bcache0): enabling auto recovery
 [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
 [  574.266055] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.276952] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277008] BTRFS: failed to read tree root on bcache0
 [  574.277187] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277356] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277398] BTRFS: failed to read tree root on bcache0
 [  574.285955] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 613694
 [  574.298741] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 610499
 [  574.298804] BTRFS: failed to read tree root on bcache0
 [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
 [  575.111495] BTRFS (device bcache0): parent transid verify failed on
 7567954464768 wanted 613688 found 613685
 [  575.111559] BTRFS: failed to read tree root on bcache0
 [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
 [  575.131803] BTRFS (device bcache0): parent transid verify failed on
 7567954214912 wanted 613687 found 613680
 [  575.131866] BTRFS: failed to read tree root on bcache0
 [  575.180101] BTRFS: open_ctree failed

 all the btrfs tools throw up their hands with similar errors:
 fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super


 fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
 --init-extent-tree
 enabling repair mode
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Couldn't open file system

 Annoyingly:
 # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Open ctree failed
 create failed (Success)

 So I can't even send an image for people to look at.

CCing some more people on this one, while this filesystem isn't
important I'd like to know that restore from backup isn't the only
option for BTRFS corruption.  All of the tools simply throw up their
hands and bail when confronted with this filesystem, even btrfs-image.


Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
It's a known bug with bcache and enabling discard, it was discarding
sections containing data it wanted.  After a reboot bcache refused to
accept the cache data, and of course it was dirty because I'm frankly
too stupid to breathe sometimes.

So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
rescue the btrfs data that it trashed.


On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote:
 Hello,

 I had some luck in the past with btrfs restore using the -r option. I don't
 recall how I determined the roots... Maybe I tried random numbers? I was
Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Sorry I pressed send before I finished my thoughts.

btrfs restore gets nowhere with any options.  btrfs-recover says the
superblocks are fine, and chunk recover does nothing after a few hours
of reading.

Everything else bails out with the errors I listed above.
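For what it's worth, the avenue I'd try next when restore can't use the current tree root is pointing it at an older one. A minimal sketch of that workflow - device, output path, and the example bytenr (taken from the transid messages above) are placeholders, and it's a dry run by default:

```shell
#!/bin/sh
# Sketch only: DEV/OUT are hypothetical, DRY_RUN=1 just prints commands.
DEV=${DEV:-/dev/bcache0}
OUT=${OUT:-/mnt/recovery}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

# 1. Scan the device for blocks that look like old tree roots; the output
#    lists candidate bytenr/generation pairs to try.
run btrfs-find-root "$DEV"

# 2. Feed a promising bytenr to restore: -t selects that root, -i keeps
#    going past errors so a partly damaged tree still yields files.
run btrfs restore -t 7567956930560 -i -v "$DEV" "$OUT"
```

restore also takes -r <root-objectid> to target one subvolume's root, which is presumably what the -r trick mentioned elsewhere in this thread refers to.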

On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat dan.meril...@gmail.com wrote:
 It's a known bug with bcache and enabling discard: it was discarding
 sections containing data it wanted.  After a reboot bcache refused to
 accept the cache data, and of course it was dirty because I'm frankly
 too stupid to breathe sometimes.

 So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
 rescue the btrfs data that it trashed.


 On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas c...@neo-zeon.de wrote:
 Hello,

 I had some luck in the past with btrfs restore using the -r option. I don't
 recall how I determined the roots... Maybe I tried random numbers? I was
 able to recover nearly all of my data from a bcache related crash from over
 a year ago.

 What kind of bcache failure did you see? I've been doing some testing
 recently and ran into 2 bcache failures. With both of these failures, I had
 a 'bad btree header at bucket' error message (which is entirely different
 from the crash I had over a year back). I'm currently trying a different SSD
 to see if that alleviates the issue. The error makes me think that it's a
 bcache specific issue that's unrelated to btrfs or possibly (in my case) an
 issue with the previous SSD.

 Did you encounter this same error?

 With my 2 most recent crashes, I didn't try to recover very hard (or even
 try 'btrfs recover' at all) as I've been taking daily backups. I did try
 btrfsck, and not only would it fail, it would segfault.

 -Cameron


 On 04/08/2015 11:07 AM, Dan Merillat wrote:

 Any ideas on where to start with this?  I did flush the cache out to
 disk before I made changes to the bcache configuration, so there
 shouldn't be anything completely missing, just some bits of stale
 metadata.  If I can get the tools to take the closest match and run
 with it it would probably recover nearly everything.

 At worst, is there a way to scan the metadata blocks and rebuild from
 found extent-trees?




 On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com
 wrote:

 Bcache failures are nasty, because they leave a mix of old and new
 data on the disk.  In this case, there was very little dirty data, but
 of course the tree roots were dirty and out-of-sync.

 fileserver:/usr/src/btrfs-progs# ./btrfs --version
 Btrfs v3.18.2

 kernel version 3.18

 [  572.573566] BTRFS info (device bcache0): enabling auto recovery
 [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
 [  574.266055] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.276952] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277008] BTRFS: failed to read tree root on bcache0
 [  574.277187] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277356] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277398] BTRFS: failed to read tree root on bcache0
 [  574.285955] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 613694
 [  574.298741] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 610499
 [  574.298804] BTRFS: failed to read tree root on bcache0
 [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
 [  575.111495] BTRFS (device bcache0): parent transid verify failed on
 7567954464768 wanted 613688 found 613685
 [  575.111559] BTRFS: failed to read tree root on bcache0
 [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
 [  575.131803] BTRFS (device bcache0): parent transid verify failed on
 7567954214912 wanted 613687 found 613680
 [  575.131866] BTRFS: failed to read tree root on bcache0
 [  575.180101] BTRFS: open_ctree failed

 all the btrfs tools throw up their hands with similar errors:
 fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring

Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Any ideas on where to start with this?  I did flush the cache out to
disk before I made changes to the bcache configuration, so there
shouldn't be anything completely missing, just some bits of stale
metadata.  If I can get the tools to take the closest match and run
with it it would probably recover nearly everything.

At worst, is there a way to scan the metadata blocks and rebuild from
found extent-trees?




On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat dan.meril...@gmail.com wrote:
 Bcache failures are nasty, because they leave a mix of old and new
 data on the disk.  In this case, there was very little dirty data, but
 of course the tree roots were dirty and out-of-sync.

 fileserver:/usr/src/btrfs-progs# ./btrfs --version
 Btrfs v3.18.2

 kernel version 3.18

 [  572.573566] BTRFS info (device bcache0): enabling auto recovery
 [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
 [  574.266055] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.276952] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277008] BTRFS: failed to read tree root on bcache0
 [  574.277187] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277356] BTRFS (device bcache0): parent transid verify failed on
 7567956930560 wanted 613690 found 613681
 [  574.277398] BTRFS: failed to read tree root on bcache0
 [  574.285955] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 613694
 [  574.298741] BTRFS (device bcache0): parent transid verify failed on
 7567965720576 wanted 613689 found 610499
 [  574.298804] BTRFS: failed to read tree root on bcache0
 [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
 [  575.111495] BTRFS (device bcache0): parent transid verify failed on
 7567954464768 wanted 613688 found 613685
 [  575.111559] BTRFS: failed to read tree root on bcache0
 [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
 [  575.131803] BTRFS (device bcache0): parent transid verify failed on
 7567954214912 wanted 613687 found 613680
 [  575.131866] BTRFS: failed to read tree root on bcache0
 [  575.180101] BTRFS: open_ctree failed

 all the btrfs tools throw up their hands with similar errors:
 fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Could not open root, trying backup super


 fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
 --init-extent-tree
 enabling repair mode
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Couldn't open file system

 Annoyingly:
 # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 parent transid verify failed on 7567956930560 wanted 613690 found 613681
 Ignoring transid failure
 Couldn't setup extent tree
 Open ctree failed
 create failed (Success)

 So I can't even send an image for people to look at.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Recovering BTRFS from bcache failure.

2015-04-07 Thread Dan Merillat
Bcache failures are nasty, because they leave a mix of old and new
data on the disk.  In this case, there was very little dirty data, but
of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on
7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on
7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

all the btrfs tools throw up their hands with similar errors:
fileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super


fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
--init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:
# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.


Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-11-01 Thread Dan Merillat
On Thu, Oct 30, 2014 at 3:50 AM, Koen Kooi k...@dominion.thruhere.net wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Dan Merillat schreef op 30-10-14 04:17:
 It's specifically BTRFS related, I was able to reproduce it on a bare
 drive (no lvm, no md, no bcache).  It's not bad RAM, I was able to
 reproduce it on multiple machines running either 3.17 or late RCs.

 I've tested 3.18-rc2 for about 2 hours now, can't get any failures, so
 that's good.  If anyone else can reproduce this it'll probably need to be
 sent to 3.17-stable.

 3.17.2 has a lot of btrfs backports queued[1] already, could you see if the
 fix for your problem is already present?


Sorry about all the top-posting, I dislike the way gmail makes it the default.

Yes, the patches queued for 3.17.2 appear to have fixed it.  I didn't
have time to run a bisection to see where it broke between .16 and
.17, though.


Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
I'm in the middle of debugging the exact same thing.  3.17.0 ->
rtorrent dies with SIGBUS.

I've done some debugging, the sequence is something like this:
open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually.  Generally not at the same point.

Reading that file (cat file > /dev/null) returns -EIO.

Looking up the process maps, the SIGBUS appears to be happening in the
middle of a mapped region of a pre-allocated file - I.E. it shouldn't
be.  I'm not completely ruling out a rtorrent bug but it appears sane
to me.

Weirder: old files, that have been around a while, work just fine for seeding.
I've re-hashed my entire collection without an error.

Seeing this on both inherit-COW and no-inherit-COW files, and the
filesystem is not using compression.

The interesting part is that when going back and attempting to read the
files later, they sometimes don't throw an IO error.

Absolutely nothing in dmesg.

Working on a testcase that triggers it reliably but no luck so far.  I
thought I had bad RAM but two people upgrading to 3.17 and seeing the
same bug at around the same time can't be a coincidence.  I rebooted
to 3.17 on the 25th, the first new download was on the 28th and that
failed.

Working on a testcase for it that's more reproducible than "go grab
torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:
 Hi, it seems that when using rtorrent to download into a btrfs system,
 it leads to the creation of files that fail to read properly.
 For instance, I get rtorrent to crash, but if I try to rsync the file it
 was writing into someplace else, rsync also fails with the message
 "can't map file $file: Input/Output error (5)".
 If I give it time, eventually the file gets into a good state and I can
 rsync it somewhere else (as long as rtorrent doesn't keep writing into
 it). This doesn't happen using ext4 on the same system.

 No btrfs errors, or any other errors, show up in any log. Scrubbing or
 balancing don't turn up any issues. I've tried using a subvolume mounted
 with nodatacow and/or flushoncommit, which didn't help. I'm not using
 quotas and at some point had a single snapshot that I deleted. The
 filesystem was originally created recently (on a 3.16.4+ kernel).

 Here's what the array looks like:

 Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
 Total devices 4 FS bytes used 3.14TiB
 devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
 devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
 devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
 devid 7 size 1.82TiB used 1.45TiB path /dev/sda1

 Btrfs v3.17

 Data, RAID1: total=3.34TiB, used=3.13TiB
 System, RAID1: total=32.00MiB, used=512.00KiB
 Metadata, RAID1: total=10.00GiB, used=7.31GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
 AuthenticAMD GNU/Linux

 I'm utterly puzzled and clueless at how to dig into this issue.


Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
The following code reliably throws a SIGBUS in the memset, and 'cat
testfile > /dev/null' returns an IO error.

I've sometimes gotten as high as iteration 900 before a SIGBUS, so
don't assume a single clear is OK.

linux 3.17.0, SATA -> MD(raid5) -> bcache (ssd) -> btrfs

Working on eliminating more variables.

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MB  (1024ull * 1024)
#define GB  (1024ull * MB)
#define TEST_SIZE   (4096)

int main() {
    int fd;
    srandom(1024);
    fd = open("testfile", O_RDWR|O_CREAT, 0600);
    posix_fallocate(fd, 0, TEST_SIZE * MB);

    uint8_t *map = 0;

    int i;
    for (i = 0; i < 1000; i++) {
        /* pick a random MB-aligned offset inside the 4GB file */
        size_t location = (random() % (TEST_SIZE - 1)) * MB;
        map = (uint8_t *) mmap(map, MB, PROT_READ|PROT_WRITE,
                               MAP_SHARED,
                               fd, location);
        if (map == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        printf("%d: writing at %04llu MB\n", i, location / MB);

        memset(map, 0x5a, 1 * MB);
        msync(map, 1*MB, MS_ASYNC);

        munmap(map, MB);
    }
    return 0;
}

On Wed, Oct 29, 2014 at 5:50 PM, Dan Merillat dan.meril...@gmail.com wrote:
 I'm in the middle of debugging the exact same thing.  3.17.0 ->
 rtorrent dies with SIGBUS.

 I've done some debugging, the sequence is something like this:
 open a new file
 fallocate() to the final size
 mmap() all (or a portion) of the file
 write to the region
 run SHA1 on that mmap'd region to validate the chunk
 crash, eventually.  Generally not at the same point.

 Reading that file (cat file > /dev/null) returns -EIO.

 Looking up the process maps, the SIGBUS appears to be happening in the
 middle of a mapped region of a pre-allocated file - I.E. it shouldn't
 be.  I'm not completely ruling out a rtorrent bug but it appears sane
 to me.

 Weirder: old files, that have been around a while, work just fine for 
 seeding.
 I've re-hashed my entire collection without an error.

 Seeing this on both inherit-COW and no-inherit-COW files, and the
 filesystem is not using compression.

 The interesting part is that when going back and attempting to read the
 files later, they sometimes don't throw an IO error.

 Absolutely nothing in dmesg.

 Working on a testcase that triggers it reliably but no luck so far.  I
 thought I had bad RAM but two people upgrading to 3.17 and seeing the
 same bug at around the same time can't be a coincidence.  I rebooted
 to 3.17 on the 25th, the first new download was on the 28th and that
 failed.

 Working on a testcase for it that's more reproducible than "go grab
 torrent files with rtorrent".

 On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:
 Hi, it seems that when using rtorrent to download into a btrfs system,
 it leads to the creation of files that fail to read properly.
 For instance, I get rtorrent to crash, but if I try to rsync the file it
 was writing into someplace else, rsync also fails with the message
 "can't map file $file: Input/Output error (5)".
 If I give it time, eventually the file gets into a good state and I can
 rsync it somewhere else (as long as rtorrent doesn't keep writing into
 it). This doesn't happen using ext4 on the same system.

 No btrfs errors, or any other errors, show up in any log. Scrubbing or
 balancing don't turn up any issues. I've tried using a subvolume mounted
 with nodatacow and/or flushoncommit, which didn't help. I'm not using
 quotas and at some point had a single snapshot that I deleted. The
 filesystem was originally created recently (on a 3.16.4+ kernel).

 Here's what the array looks like:

 Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
 Total devices 4 FS bytes used 3.14TiB
 devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
 devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
 devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
 devid 7 size 1.82TiB used 1.45TiB path /dev/sda1

 Btrfs v3.17

 Data, RAID1: total=3.34TiB, used=3.13TiB
 System, RAID1: total=32.00MiB, used=512.00KiB
 Metadata, RAID1: total=10.00GiB, used=7.31GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
 AuthenticAMD GNU/Linux

 I'm utterly puzzled and clueless at how to dig into this issue.


Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
It's specifically BTRFS related, I was able to reproduce it on a bare
drive (no lvm, no md, no bcache).  It's not bad RAM, I was able to
reproduce it on multiple machines running either 3.17 or late RCs.

I've tested 3.18-rc2 for about 2 hours now, can't get any failures, so
that's good.  If anyone else can reproduce this it'll probably need to
be sent to 3.17-stable.

On Wed, Oct 29, 2014 at 7:24 PM, Alec Blayne a...@tevsa.net wrote:
 Really nice to know it's already getting handled :)

 I'm already downgrading to 3.16.6 now that I know I won't have that
 issue. I was already planning to because of the read-only snapshots issue.

 Thank you and good luck debugging!

 On 29-10-2014 21:50, Dan Merillat wrote:
 I'm in the middle of debugging the exact same thing.  3.17.0 ->
 rtorrent dies with SIGBUS.

 I've done some debugging, the sequence is something like this:
 open a new file
 fallocate() to the final size
 mmap() all (or a portion) of the file
 write to the region
 run SHA1 on that mmap'd region to validate the chunk
 crash, eventually.  Generally not at the same point.

 Reading that file (cat file > /dev/null) returns -EIO.

 Looking up the process maps, the SIGBUS appears to be happening in the
 middle of a mapped region of a pre-allocated file - I.E. it shouldn't
 be.  I'm not completely ruling out a rtorrent bug but it appears sane
 to me.

 Weirder: old files, that have been around a while, work just fine for 
 seeding.
 I've re-hashed my entire collection without an error.

 Seeing this on both inherit-COW and no-inherit-COW files, and the
 filesystem is not using compression.

 The interesting part is that when going back and attempting to read the
 files later, they sometimes don't throw an IO error.

 Absolutely nothing in dmesg.

 Working on a testcase that triggers it reliably but no luck so far.  I
 thought I had bad RAM but two people upgrading to 3.17 and seeing the
 same bug at around the same time can't be a coincidence.  I rebooted
 to 3.17 on the 25th, the first new download was on the 28th and that
 failed.

 Working on a testcase for it that's more reproducible than "go grab
 torrent files with rtorrent".

 On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:
 Hi, it seems that when using rtorrent to download into a btrfs system,
 it leads to the creation of files that fail to read properly.
 For instance, I get rtorrent to crash, but if I try to rsync the file it
 was writing into someplace else, rsync also fails with the message
 "can't map file $file: Input/Output error (5)".
 If I give it time, eventually the file gets into a good state and I can
 rsync it somewhere else (as long as rtorrent doesn't keep writing into
 it). This doesn't happen using ext4 on the same system.

 No btrfs errors, or any other errors, show up in any log. Scrubbing or
 balancing don't turn up any issues. I've tried using a subvolume mounted
 with nodatacow and/or flushoncommit, which didn't help. I'm not using
 quotas and at some point had a single snapshot that I deleted. The
 filesystem was originally created recently (on a 3.16.4+ kernel).

 Here's what the array looks like:

 Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
 Total devices 4 FS bytes used 3.14TiB
 devid 4 size 2.73TiB used 2.36TiB path /dev/sdd1
 devid 5 size 1.82TiB used 1.45TiB path /dev/sdc1
 devid 6 size 1.82TiB used 1.45TiB path /dev/sdb1
 devid 7 size 1.82TiB used 1.45TiB path /dev/sda1

 Btrfs v3.17

 Data, RAID1: total=3.34TiB, used=3.13TiB
 System, RAID1: total=32.00MiB, used=512.00KiB
 Metadata, RAID1: total=10.00GiB, used=7.31GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
 AuthenticAMD GNU/Linux

 I'm utterly puzzled and clueless at how to dig into this issue.


Re: 3.16 Managed to ENOSPC with 80% used

2014-09-25 Thread Dan Merillat
On Wed, Sep 24, 2014 at 6:23 PM, Holger Hoffstätte
holger.hoffstae...@googlemail.com wrote:

 Basically it's been data allocation happy, since I haven't deleted
 53GB at any point.  Unfortunately, none of the chunks are at 0% usage
 so a balance -dusage=0 finds nothing to drop.

 Also try -musage=0..10, just for fun.

Tried a few of them.  When it's completely wedged, balance with any
usage above zero won't work, because it needs one allocatable group to
move to.   I'm not sure if it was needing a new data chunk to merge
partials into, or if it thought it needed more metadata space to write
out the changes.  (Metadata was also only 75% used).

 Is this recoverable, or do I need to copy to another disk and back?

 Another neat trick that will free up space is to convert to single
 metadata: -mconvert=single -f (to force). A subsequent balance
 with -musage=0..10 will likely free up quite some space.

Deleting files or dropping snapshots is difficult when it's wedged as
well: a lot of disk activity (journal thrash?) and no persistent
progress - a reboot brings the deleted files back.  I eventually
managed to empty a single data chunk and after that it was a trivial
recovery.

 That particular workload seems to cause the block allocator to go
 on a spending spree; you're not the first to see this.

I could see normal-user usage patterns getting ignored, but these are
the usage patterns of the people working on BTRFS.  Maybe they need to
remove their balance cronjobs for a while. :)


3.16 Managed to ENOSPC with 80% used

2014-09-24 Thread Dan Merillat
Any idea how to recover?  I can't cut-and-paste, but it's:
Total devices 1 FS bytes used 176.22GiB
size 233.59GiB used 233.59GiB

Basically it's been data allocation happy, since I haven't deleted
53GB at any point.  Unfortunately, none of the chunks are at 0% usage
so a balance -dusage=0 finds nothing to drop.

Attempting a balance with -dusage=25 instantly dies with ENOSPC, since
100% of space is allocated.

Is this recoverable, or do I need to copy to another disk and back?
This is a really unfortunate failure mode for BTRFS.  Usually I catch
it before I get exactly 100% used and can use a balance to get it back
into shape.

What causes it to keep allocating datablocks when it's got so much
free space?  The workload is pretty standard (for devs, at least): git
and kernel builds, and git and android builds.
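The wedge is visible in the numbers alone: once every byte of the device is allocated to chunks ("size" equals "used" in btrfs fi show), a balance has no unallocated region left to write a relocated chunk into, and fails with ENOSPC no matter how empty the individual chunks are.  A rough sketch of the check (hypothetical helper names, parsing the show-style line quoted above):

```python
import re

def _gib(s: str) -> float:
    """Parse a value like '233.59GiB' into a float number of GiB."""
    return float(re.fullmatch(r"([\d.]+)GiB", s).group(1))

def unallocated_gib(show_line: str) -> float:
    """Given a 'btrfs fi show' device line ('... size X used Y ...'),
    return how much raw space is not yet allocated to any chunk.
    When this reaches zero, balance cannot create the destination
    chunk it relocates data into, and dies with ENOSPC."""
    m = re.search(r"size\s+(\S+)\s+used\s+(\S+)", show_line)
    return _gib(m.group(1)) - _gib(m.group(2))

print(unallocated_gib("devid 1 size 233.59GiB used 233.59GiB path /dev/sda"))
# -> 0.0: fully allocated, so any balance with usage>0 will ENOSPC
```

This is why the usual advice is to run a balance with low usage filters *before* the device is 100% allocated - once unallocated space hits zero, even freeing logical space may not help.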


Rapid memory exhaustion during normal operation

2014-01-25 Thread Dan Merillat
I'm trying to track this down - this started happening without changing
the kernel in use, so it's probably a corrupted filesystem.  The
symptoms are that all memory is suddenly used by no apparent source.
The OOM killer is invoked on every task, and still can't free up enough
memory to continue.

When it goes wrong, it's extremely rapid - the system goes from stable
to dead in less than 30 seconds.

Tested 3.9.0, 3.12.0, 3.12.8.  Limited testing on 3.13 shows what I
think is the same problem, but I need to double-check that it's not a
different issue.  It blows up the exact same way on a real kernel or in
UML.

All sorts of things can trigger it - defrag, random writes to files.
Balance and scrub don't, and a read-only mount doesn't.

I can reproduce this trivially: mount the filesystem read-write and
perform some activity.  It only takes a few minutes.  The other btrfs
filesystems on the same machine don't show similar problems.
Unfortunately, the output of btrfs-image -c9 is 75GB, much more than I
can reasonably share.  I've got a reliable reproducer in UML using
UML-COW to always start with the same base image: defrag a file with
33,000 extents and the system explodes within a minute.

Here's the OOM report; the formatting is a bit off due to being
delivered via netconsole.  Swap was disabled on this run, but it makes
no difference.  I get insta-OOM issues out of the blue with very little
memory swapped out.

[ 1184.871419] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.879873] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.894932] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.898207] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.902116] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.902454] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.90] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.903588] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.904592] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1184.904839] parent transid verify failed on 8049834639360 wanted 1736567 
found 1734749
[ 1192.113082] verify_parent_transid: 16 callbacks suppressed
[ 1192.113166] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.113269] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.176637] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.178119] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.203369] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.203503] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.204112] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.205324] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.814465] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1192.817226] parent transid verify failed on 8049835315200 wanted 1736567 
found 1736533
[ 1219.366168] ntpd invoked oom-killer: gfp_mask=0x201da, order=0, 
oom_score_adj=0
[ 1219.366270] CPU: 1 PID: 5479 Comm: ntpd Not tainted 3.12.8-00848-g97f15f1 #2
[ 1219.366324] Hardware name: Gigabyte Technology Co., Ltd. 
GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F1 06/03/2008
[ 1219.366402]   8800c02339a8 815ccf3b 
3f51a67e
[ 1219.366632]  8800c557ae40 8800c0233a48 815c8551 
0100
[ 1219.366861]  0001 8800c02339e8 815d4f46 
000ef3e4
[ 1219.367086] Call Trace:
[ 1219.367155]  [815ccf3b] dump_stack+0x50/0x85
[ 1219.367262]  [815c8551] dump_header.isra.14+0x6d/0x1b5
[ 1219.367322]  [815d4f46] ? sub_preempt_count+0x33/0x46
[ 1219.367390]  [815d1b9d] ? _raw_spin_unlock_irqrestore+0x2b/0x48
[ 1219.367448]  [8132849a] ? ___ratelimit+0xda/0xf8
[ 1219.367514]  [810cf773] oom_kill_process+0x70/0x303
[ 1219.367614]  [81041930] ? has_capability_noaudit+0x12/0x16
[ 1219.367672]  [810cfe91] out_of_memory+0x314/0x347
[ 1219.367734]  [810d3ee3] __alloc_pages_nodemask+0x629/0x7c8
[ 1219.367798]  [811052db] alloc_pages_current+0xb2/0xbb
[ 1219.367852]  [810cd36e] __page_cache_alloc+0xb/0xd
[ 1219.367915]  [810ceb9a] filemap_fault+0x249/0x362
[ 1219.367973]  [810eb378] __do_fault+0xa7/0x418
[ 1219.368071]  [815d1b9d] ? _raw_spin_unlock_irqrestore+0x2b/0x48
[ 1219.368130]  [810606c4] ? get_parent_ip+0xe/0x3e
[ 1219.368184]  [810eed47] handle_mm_fault+0x2b4/0x907
[ 1219.368239]  [815d1a93] ? _raw_spin_unlock_irq+0x17/0x32
[ 1219.368297]  [815d4dc4] 

filesystem stuck RO after losing a device

2013-04-05 Thread Dan Merillat
first off: this was just junk data, and is all readable in degraded
mode anyway.

Label: 'ROOT'  uuid: cc80d150-af98-4af4-bc68-c8df352bda4f
Total devices 2 FS bytes used 138.00GB
devid1 size 232.79GB used 189.04GB path /dev/sdc2
devid3 size 232.89GB used 14.06GB path /dev/sdb

The filesystem was created in 3.6 or so, and abandoned when I moved to
a SSD as my main root.  Playing around with it, I added a raw disk to
it and did some IO but wanted that disk back.  Due to the automatic
upgrade to 'dup' when adding a second device, I couldn't do a btrfs
dev delete so I ended up just unmounting it and reformatting /dev/sdb
as a backup for my SSD.

Given the 'dup' profile, I should be able to just blow away the stub
of sdb and continue using sdc, but I can't figure out any way to get
it to allow that.

$ uname -a
Linux wolf 3.8.0-rc5-dan #3 SMP PREEMPT Tue Jan 29 00:55:14 EST 2013
x86_64 GNU/Linux
(I forgot to remove extraversion, git is clean 3.8-rc5)

$ sudo mount -o degraded /dev/sdc2 /mnt/t2
mount: wrong fs type, bad option, bad superblock on /dev/sdc2,
$ dmesg | tail
[1648243.075565] device label ROOT devid 1 transid 15051 /dev/sdc2
[1648243.076531] btrfs: allowing degraded mounts
[1648243.076539] btrfs: disk space caching is enabled
[1648243.891735] Btrfs: too many missing devices, writeable mount is not allowed
[1648243.898122] btrfs: open_ctree failed

$ sudo mount -o degraded,ro /dev/sdc2 /mnt/t2
$ dmesg | tail

[1648331.898660] device label ROOT devid 1 transid 15051 /dev/sdc2
[1648331.900371] btrfs: allowing degraded mounts
[1648331.900380] btrfs: disk space caching is enabled

$ sudo btrfs dev del missing /mnt/t2
ERROR: error removing the device 'missing' - Read-only file system
$ sudo btrfs dev add /dev/loop0 /mnt/t2
ERROR: error adding the device '/dev/loop0' - Read-only file system

$ sudo umount /mnt/t2
$ sudo ./btrfsck --repair /dev/sdc2
[sudo] password for harik:
enabling repair mode
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sdb' - Device or resource busy
Check tree block failed, want=211559927808, have=0
Check tree block failed, want=211559927808, have=0
Check tree block failed, want=211644346368, have=3611932269563901032
Check tree block failed, want=211644346368, have=3611932269563901032
Check tree block failed, want=211559563264, have=70368744177680
Check tree block failed, want=211559563264, have=70368744177680
Check tree block failed, want=211641229312, have=2308722807962755443
Check tree block failed, want=211641229312, have=2308722807962755443
Check tree block failed, want=211640909824, have=651398145056990559
Check tree block failed, want=211640909824, have=651398145056990559
Checking filesystem on /dev/sdc2
UUID: cc80d150-af98-4af4-bc68-c8df352bda4f
checking extents
Check tree block failed, want=212375867392, have=3431074926722403215
thousands of these
Check tree block failed, want=211559571456, have=15880152022637367237
checking root refs
btrfsck: extent-tree.c:2553: btrfs_reserve_extent: Assertion `!(ret)' failed.

btrfsck is from btrfs-progs master, g7854c8b66

So I can't mount RW because I only have one active disk, I can't add a
new one, and I can't remove the missing disk.  This seems somewhat
awkward: if you're using a 2-disk BTRFS and a drive dies, how do you
replace & recover?


Re: filesystem stuck RO after losing a device

2013-04-05 Thread Dan Merillat
On Fri, Apr 5, 2013 at 7:43 PM, Dan Merillat dan.meril...@gmail.com wrote:

 first off: this was just junk data, and is all readable in degraded
 mode anyway.

 Label: 'ROOT'  uuid: cc80d150-af98-4af4-bc68-c8df352bda4f
 Total devices 2 FS bytes used 138.00GB
 devid1 size 232.79GB used 189.04GB path /dev/sdc2
 devid3 size 232.89GB used 14.06GB path /dev/sdb

 The filesystem was created in 3.6 or so, and abandoned when I moved to
 a SSD as my main root.  Playing around with it, I added a raw disk to
 it and did some IO but wanted that disk back.  Due to the automatic
 upgrade to 'dup' when adding a second device, I couldn't do a btrfs
 dev delete so I ended up just unmounting it and reformatting /dev/sdb
 as a backup for my SSD.

 Given the 'dup' profile, I should be able to just blow away the stub
 of sdb and continue using sdc, but I can't figure out any way to get
 it to allow that.

$ btrfs fi df /mnt/t2
Data: total=173.01GB, used=136.76GB
System, DUP: total=40.00MB, used=32.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=14.00GB, used=504.24MB
Metadata, DUP: total=1.00GB, used=756.14MB
Metadata: total=8.00MB, used=0.00

So the problem is I ended up with DUP profile instead of RAID1.  Is
there any way to force it to mount RW and update that? (or update
offline?)

That's a usability issue, actually - adding a disk to a single-profile
filesystem makes it so that you can't fail gracefully unless you know
to run a balance with -dconvert=raid1 -mconvert=raid1.  Which I know
now, after looking up why this failed.  Unfortunately, I can't recover
from this.


Segregate metadata to SSD?

2012-09-02 Thread Dan Merillat
Is it possible to weight the allocations of data/system/metadata so
that data goes on large, slow drives while system/metadata goes on a
fast SSD?  I don't have exact numbers, but I'd guess the vast majority
of seeks during operation are lookups of tiny bits of metadata, while
data reads/writes are done in much larger chunks.

Obviously a database load would be a different balance, but for most
systems it would seem to be a rather vast improvement.

Data: total=5625880576k (5.24TB), used=5455806964k (5.08TB)
System, DUP: total=32768k (32.00MB), used=724k (724.00KB)
System: total=4096k (4.00MB), used=0k (0.00)
Metadata, DUP: total=117291008k (111.86GB), used=13509540k (12.88GB)

Out of my nearly 6TB setup I could trivially accelerate the whole
thing with a 128GB SSD.

On a side note, that's nearly a 10:1 metadata overusage, and I've
never had more than 3 snapshots at a given time - current, rollback1,
rollback2.  I think it grew that large during a rebalance.  Aside from
that, I could get away with a tiny 64GB SSD.


pretty_sizes was too granular to use in monitoring scripts, so:
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index b1457de..dc5fea6 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -145,8 +145,9 @@ static int cmd_df(int argc, char **argv)

 		total_bytes = pretty_sizes(sargs->spaces[i].total_bytes);
 		used_bytes = pretty_sizes(sargs->spaces[i].used_bytes);
-		printf("%s: total=%s, used=%s\n", description, total_bytes,
-		       used_bytes);
+		printf("%s: total=%ldk (%s), used=%ldk (%s)\n", description,
+		       sargs->spaces[i].total_bytes/1024, total_bytes,
+		       sargs->spaces[i].used_bytes/1024, used_bytes);
 	}
 	free(sargs);
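The intent of the patch - keep the human-readable size but add an exact, greppable KiB figure - can be sketched outside btrfs-progs as well.  The pretty() below is only a rough stand-in for the real pretty_sizes() helper:

```python
def pretty(n: int) -> str:
    """Human-readable size; rough approximation of pretty_sizes()."""
    f = float(n)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if f < 1024 or unit == "TB":
            return f"{f:.2f}{unit}"
        f /= 1024.0

def df_line(description: str, total: int, used: int) -> str:
    """Emit both raw KiB (easy for monitoring scripts to parse) and the
    pretty form, mirroring what the patch above does in cmd_df()."""
    return (f"{description}: total={total // 1024}k ({pretty(total)}), "
            f"used={used // 1024}k ({pretty(used)})")

# Reproduces the 'System, DUP' line from the df output quoted earlier:
print(df_line("System, DUP", 32768 * 1024, 724 * 1024))
# -> System, DUP: total=32768k (32.00MB), used=724k (724.00KB)
```

The design point is that the raw `...k` field is monotonic and unit-free, so a monitoring script can diff successive samples without re-parsing MB/GB suffixes.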


Re: [GIT PULL] Btrfs pull request

2011-11-08 Thread Dan Merillat
On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason chris.ma...@oracle.com wrote:
 Hi everyone,

 This pull request is pretty beefy, it ended up merging a number of long
 running projects and cleanup queues.  I've got btrfs patches in the new
 kernel.org btrfs repo.  There are two different branches with the same
 changes.  for-linus is against 3.1 and has also been tested against
 Linus' tree as of yesterday.

[91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
[91795.123538] btrfs: open_ctree failed

FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
when I tried to mount on 3.1 (x64) again.  Format change in 3.2, or
32/64-bit compatibility issues?


Re: [GIT PULL] Btrfs pull request

2011-11-08 Thread Dan Merillat
On Tue, Nov 8, 2011 at 3:17 PM, Chris Mason chris.ma...@oracle.com wrote:
 On Tue, Nov 08, 2011 at 01:27:28PM -0500, Chris Mason wrote:
 On Tue, Nov 08, 2011 at 12:55:40PM -0500, Dan Merillat wrote:
  On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason chris.ma...@oracle.com wrote:
   Hi everyone,
  
   This pull request is pretty beefy, it ended up merging a number of long
   running projects and cleanup queues.  I've got btrfs patches in the new
   kernel.org btrfs repo.  There are two different branches with the same
   changes.  for-linus is against 3.1 and has also been tested against
   Linus' tree as of yesterday.
 
  [91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
  [91795.123538] btrfs: open_ctree failed
 
  FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
  when I tried to mount on 3.1 (x64) again.  Format change in 3.2, or
  32/64-bit compatibility issues?

 I'm trying to reproduce right now but I did many bounces between 3.2 and
 3.1 code before releasing.  I didn't try jumping between 32 and 64 bit.

 Are there any other messages in dmesg?  Could you please see what
 btrfs-debug-tree says?

 Ok, so I spun the wheel going between 32 and 64 and 3.1 and 3.2.  I'm
 not having trouble with basic tests.

 So, we'll have to dig in and see why the open is failing.  btrfsck or
 btrfs-debug-tree will help.

This is on a USB device; however, I had used the filesystem quite a
bit on the 64bit machine before moving it to the 32bit 3.2 box.  It's
still mountable on the 32bit box even when I get the open_ctree
failure on 3.1.

[140865.425067] device label ROOT devid 1 transid 3436 /dev/sdi2
[140865.426291] btrfs: open_ctree failed
harik@fileserver:~/src/3.0/3.2-rc1$ sudo btrfsck /dev/sdi2
[sudo] password for harik:
found 3105894400 bytes used err is 0
total csum bytes: 2916272
total tree bytes: 119631872
total fs tree bytes: 109928448
btree space waste bytes: 33045213
file data blocks allocated: 5391962112
 referenced 2984988672
Btrfs Btrfs v0.19

http://dl.dropbox.com/u/1071112/btrfs-debug-tree.sdi2.bz2

Exact kernel that won't mount is Linus 3.1 +

Author: David Sterba dste...@suse.cz
Date:   Wed Aug 3 11:08:02 2011 -0700

    btrfs: allow cross-subvolume file clone

Author: Li Zefan l...@cn.fujitsu.com
Date:   Fri Sep 2 15:56:25 2011 +0800

    Btrfs: fix defragmentation regression


Re: [PATCH 1/4] Btrfs: fix defragmentation regression

2011-10-18 Thread Dan Merillat
On Fri, Sep 2, 2011 at 4:42 AM, Christoph Hellwig h...@infradead.org wrote:
 On Fri, Sep 02, 2011 at 03:56:25PM +0800, Li Zefan wrote:
 There's an off-by-one bug:

   # create a file with lots of 4K file extents
   # btrfs fi defrag /mnt/file
   # sync
   # filefrag -v /mnt/file
   Filesystem type is: 9123683e
   File size of /mnt/file is 1228800 (300 blocks, blocksize 4096)
    ext logical physical expected length flags
      0       0     3372              64
      1      64     3136     3435      1
      2      65     3436     3136     64
      3     129     3201     3499      1
      4     130     3500     3201     64
      5     194     3266     3563      1
      6     195     3564     3266     64
      7     259     3331     3627      1
      8     260     3628     3331     40 eof

 After this patch:

 Can you please create an xfstests testcase for this?

Did this fix get lost?  I don't see it in git, and defragmenting a
file still results in 10x as many fragments as it started with.
(3.1-rc9)


Re: fixing slow sync(2)

2011-10-12 Thread Dan Merillat
On Sat, Oct 8, 2011 at 11:35 AM, Josef Bacik jo...@redhat.com wrote:

 I think I fixed this, try my git tree

 git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git

I wanted to Ack this as well - 3.1-rc4 was completely unusable when
firefox was running (30+ second pauses to read directories, btrfs
threads were constantly running, even the mouse was jerky due to the
load)

Built from your tree (fa5cf66) and everything works like it should
again.  No load, fast response to IO requests.


Re: [PATCH] xfstests 255: add a seek_data/seek_hole tester

2011-08-30 Thread Dan Merillat
On Tue, Aug 30, 2011 at 11:29 PM, Dave Chinner da...@fromorbit.com wrote:
 On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:
 Instead
 we should let the fs weigh the cost of providing accurate information
 with the possible gain in performance.

 Data:
 A range in a file that could contain something other than nulls.
 If in doubt, it is data.

 Hole:
 A range in a file that only contains nulls.

 And that's -exactly- the ambiguous, vague definition that has raised
 all these questions in the first place. I was in doubt about whether
 unwritten extents can be considered a hole, and by your definition
 that means it should be data. But Andreas seems to be in no doubt it
 should be considered a hole.

That's fine, though.  Different filesystems have different abilities
to recognize a data hole - FAT can't do it at all.  Perhaps the
requirements would be better stated in reverse: if the filesystem
knows that a read() will return nulls (for whatever reason, based on
its internal knowledge), it can report a hole.  If it can't guarantee
that, it's data.  It's an absolute requirement that SEEK_DATA never
miss data.  SEEK_HOLE working is a nicety that userspace would
appreciate - remember that the consumer here is cp(1), using it to
skip empty portions of files and create sparse destination files.
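That cp(1) contract is easy to exercise against the interface itself.  A small sketch using Python's os module, which exposes the same lseek whence flags on Linux; on a filesystem with no hole tracking, SEEK_HOLE legitimately reports only the implicit hole at end-of-file, and SEEK_DATA still never skips data:

```python
import os
import tempfile

# Build a sparse file: 4 KiB of data followed by a hole out to 1 MiB.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.truncate(fd, 1 << 20)

data_start = os.lseek(fd, 0, os.SEEK_DATA)  # first data byte at/after offset 0
hole_start = os.lseek(fd, 0, os.SEEK_HOLE)  # first hole at/after offset 0

# SEEK_DATA must never skip real data, so it lands on the written region.
# SEEK_HOLE may report the hole at 4096, or (FAT-style) only the implicit
# hole at EOF - both satisfy "a read() here would return nulls".
assert data_start == 0
assert 4096 <= hole_start <= (1 << 20)

os.close(fd)
os.remove(path)
```

A copier built on this loop alternates SEEK_DATA/SEEK_HOLE to copy only the data ranges, which is exactly how a sparse-aware cp avoids reading gigabytes of zeros.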


Re: processes stuck in llseek

2011-08-19 Thread Dan Merillat
 Here it is.

 http://marc.info/?l=linux-btrfsm=131176036219732w=2

That was it, thanks.   Confirmed fixed.


Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-11 Thread Dan Merillat
On Tue, Aug 9, 2011 at 1:50 PM, David Sterba d...@jikos.cz wrote:
 On Thu, Aug 04, 2011 at 09:19:26AM +0800, Miao Xie wrote:
  the patch has been applied on top of current linus which contains patches 
  from
  both pull requests (ed8f37370d83).

 I think it is because the caller didn't reserve enough space.Could you try to
 apply the following patch? It might fix this bug.

 [PATCH v2] Btrfs: reserve enough space for file clone
 http://marc.info/?l=linux-btrfsm=131192686626576w=2

 Thanks! Yes, it does not crash anymore. Trees reflinked succesfully,
 md5sums verified.

This isn't a cross-subvolume problem, I hit the same bug trying to
reflink a pile of files within the same subvolume.   I applied the
above patch and retried and it worked correctly.