Re: how long should btrfs device delete missing ... take?

2014-09-12 Thread Duncan
Chris Murphy posted on Thu, 11 Sep 2014 20:10:26 -0600 as excerpted:

 Sure. But what's the next step? Given 260+ snapshots might mean well
 more than 350GB of data, depending on how deduplicated the fs is, it
 still probably would be faster to rsync this to a pile of drives in
 linear/concat+XFS than wait a month (?) for device delete to finish.

That was what I was getting at in my other just-finished short reply.  It 
may be time to give up on the btrfs-specific solutions for the moment and 
go with tried and tested traditional ones (tho I'd definitely *NOT* try 
rsync or the like with the delete still going; we know from other reports 
that rsync places its own stresses on btrfs, and one major stressor at a 
time, the delete-triggered rebalance, is bad enough).

 Alternatively, script some way to create 260+ ro snapshots to btrfs
 send/receive to a new btrfs volume; and turn it into a raid1 later.

No confirmation yet, but I strongly suspect most of those subvolumes are 
snapshots.  Assuming that's the case, it's very likely most of them can 
simply be deleted as I originally suggested, a process that /should/ be 
fast, simplifying the situation dramatically.

 I'm curious if a sysrq+s followed by sysrq+u might leave the filesystem
 in a state where it could still be rw mountable. But I'm skeptical of
 anything interrupting the device delete before being fully prepared for
 the fs to be toast for rw mount. If only ro mount is possible, any
 chance of creating ro snapshots is out.

In theory, that is, barring bugs, interrupting the delete with as normal 
a shutdown as possible, then sysrq+s, sysrq+u, should not be a problem.  
The delete is basically a balance, going chunk by chunk, and either a 
chunk has been duplicated to the new device or it hasn't.  In either 
case, the existing chunk on the remaining old device shouldn't be 
affected.

So rebooting that way in order to stop the delete temporarily /should/ 
have no bad effects.  Of course, that's barring bugs.  Btrfs is still not 
fully stabilized, and bugs do happen, so anything's possible.  But I'd 
consider it safe enough to try here, certainly so if I had backups, which 
are still STRONGLY recommended for btrfs at this point, much more so than 
under the routine sysadmin rule that if it's not backed up, by definition 
it's not valuable to you.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: how long should btrfs device delete missing ... take?

2014-09-12 Thread Chris Murphy

On Sep 11, 2014, at 11:19 PM, Russell Coker russ...@coker.com.au wrote:

 It would be nice if a file system mounted ro counted as ro snapshots for 
 btrfs send.
 
 When a file system is so messed up it can't be mounted rw it should be 
 regarded as ro for all operations.

Yes, it's come up before, and there's a question whether mount -o ro is 
reliably ro enough for this. Maybe a force option?

Another one would be a recursive btrfs send to go along with the above: I 
might want them all, or all of the ones under two particular subvolumes, 
etc. And we could even combine the recursive ro snapshot and recursive send 
into a btrfs rescue option that would work even if the volume is mounted 
read-only.


Chris Murphy


how long should btrfs device delete missing ... take?

2014-09-11 Thread Tomasz Chmielewski
After a disk died and was replaced, btrfs device delete missing is 
taking more than 10 days on an otherwise idle server:


# btrfs fi show /home
Label: none  uuid: 84d087aa-3a32-46da-844f-a233237cf04f
Total devices 3 FS bytes used 362.44GiB
devid    2 size 1.71TiB used 365.03GiB path /dev/sdb4
devid    3 size 1.71TiB used 58.00GiB path /dev/sda4
*** Some devices missing

Btrfs v3.16



So far, it has copied 58 GB out of 365 GB - and it took 10 days. At this 
speed, the whole operation will take 2-3 months (assuming that the only 
healthy disk doesn't die in the meantime).

Is this expected time for btrfs RAID-1?

There are no errors in dmesg/smart, performance of both disks is fine:

# hdparm -t /dev/sda /dev/sdb

/dev/sda:
 Timing buffered disk reads: 442 MB in  3.01 seconds = 146.99 MB/sec

/dev/sdb:
 Timing buffered disk reads: 402 MB in  3.39 seconds = 118.47 MB/sec


# btrfs fi df /home
Data, RAID1: total=352.00GiB, used=351.02GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=13.00GiB, used=11.38GiB
unknown, single: total=512.00MiB, used=67.05MiB

# btrfs sub list /home | wc -l
260

# uptime
 17:21:53 up 10 days,  6:01,  2 users,  load average: 3.22, 3.53, 3.55


I tried running this on the latest 3.16.x kernel earlier, but since 
progress was so slow, I rebooted after about a week to see if the 
latest RC would be any faster.



--
Tomasz Chmielewski
http://www.sslrack.com





Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Duncan
Tomasz Chmielewski posted on Thu, 11 Sep 2014 17:22:15 +0200 as excerpted:

 After a disk died and was replaced, btrfs device delete missing is 
 taking more than 10 days on an otherwise idle server:
 
 # btrfs fi show /home
 Label: none  uuid: 84d087aa-3a32-46da-844f-a233237cf04f
  Total devices 3 FS bytes used 362.44GiB
  devid    2 size 1.71TiB used 365.03GiB path /dev/sdb4
  devid    3 size 1.71TiB used 58.00GiB path /dev/sda4
  *** Some devices missing
 
 Btrfs v3.16
 
 So far, it has copied 58 GB out of 365 GB - and it took 10 days. At this 
 speed, the whole operation will take 2-3 months (assuming that the only 
 healthy disk doesn't die in the meantime).
 Is this expected time for btrfs RAID-1?

Device delete definitely takes time.  For the sub-TiB usage shown above,
10 days for 58 GiB out of 365 does seem excessive, but there are extreme
cases where it isn't entirely out of line.  See below.

 There are no errors in dmesg/smart, performance of both disks is fine:

 # btrfs sub list /home | wc -l
 260

 I tried running this on the latest 3.16.x kernel earlier, but since 
 progress was so slow, I rebooted after about a week to see if the 
 latest RC would be any faster.

The good thing is that once a block group is copied over, it should be
fine and won't need to be re-copied if the process is stopped over a
reboot and restarted on a new kernel, etc.

The bad thing is that if I'm interpreting your report correctly, that
likely means 7+10=17 days for that 58 gig. =:^(

Questions:

* Presumably most of those 260 subvolumes are snapshots, correct?
What was your snapshotting schedule and did you have old snapshot
cleanup-deletion scheduled as well?

* Do you run with autodefrag or was the system otherwise regularly
defragged?

* Do you have large (GiB plus) database or virtual machine image files
on that filesystem?  If so, had you properly set the NOCOW file
attribute (chattr +C) on them and were they on dedicated subvolumes?


200+ snapshots is somewhat high and could be part of the issue, tho
it's nothing like the extremes (thousands) we've seen posted in the
past.  Were it me, I'd have tried deleting as many as possible before
the device delete missing, in order to simplify the process and
eliminate as much extra data as possible.

The real issue is going to be fragmentation on spinning-rust drives.
Run filefrag on some of your gig-plus files that get written to
frequently (VM images and database files are the classic cases) and
see how many extents are reported.  (Tho note that filefrag doesn't
understand btrfs compression and won't be accurate in that case, and
also that the btrfs data chunk size of 1 GiB is the maximum extent
size, so multi-gig files will typically show two extents more than the
number of whole gigs: one filling up the current chunk, N whole-gig
chunks, and the file tail.)  The nocow file attribute (which must be
set while the file is zero-sized to be effective, see discussion
elsewhere) can help with that, but snapshotting a nocow file that's
being actively rewritten more or less defeats the purpose of nocow,
since the snapshot locks the existing data in place and the first
rewrite to each block must then be cowed anyway.  But putting those
files on dedicated subvolumes and then not snapshotting those
subvolumes is a workaround.
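
To illustrate (an untested sketch; the path is a placeholder, and the
+C attribute only takes effect if it's set while the file is still
empty):

filefrag /path/to/vm-image.img     # count extents; not accurate if
                                   # the file is btrfs-compressed

# Recreate the image as nocow: set +C on a new empty file (or on its
# directory, so new files inherit it), then copy the data back in.
touch /path/to/vm-image.nocow
chattr +C /path/to/vm-image.nocow
dd if=/path/to/vm-image.img of=/path/to/vm-image.nocow bs=1M
lsattr /path/to/vm-image.nocow     # the 'C' flag should now show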


I wouldn't try defragging now, but it might be worthwhile to stop the
device delete (rebooting to do so since I don't think there's a cancel)
and delete as many snapshots as possible.  That should help matters.
Additionally, if you have recent backups of highly fragmented files
such as the VM-images and DBs I mentioned, you might consider simply
deleting them, thus eliminating that fragment processing from the
device delete.  I don't know that making a backup now would go much
faster than the device delete, however, so I don't know whether to
recommend that or not.
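
If you do go the delete-most-of-the-snapshots route, it scripts easily
enough.  A rough sketch, assuming the snapshots all live under
/home/snapshots (a placeholder; adjust to your layout) and that nothing
you want to keep is in there:

btrfs subvolume list /home          # sanity-check what's there first
for snap in /home/snapshots/*; do
    btrfs subvolume delete "$snap"  # returns quickly; actual cleanup
                                    # happens in the background
done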

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Chris Murphy

On Sep 11, 2014, at 1:31 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 I wouldn't try defragging now, but it might be worthwhile to stop the
 device delete (rebooting to do so since I don't think there's a cancel)

'btrfs replace cancel' does exist, although I haven't tried it.

Something isn't right though, because it's clearly neither reading nor writing 
at anywhere close to 1/2 the drive read throughput. I'm curious what 'iotop 
-d30 -o' shows (during the replace, before cancel), which should be pretty 
consistent by averaging 30 seconds worth of io. And then try 'iotop -d3 -o' and 
see if there are spikes. I'm willing to bet there's a lot of nothing going on, 
with occasional spikes, rather than a constant trickle. 

And then the question is to find out what btrfs is thinking about while nothing 
is reading or writing. Even though it's not 5000+ snapshots, I wonder if the 
balance code (and hence btrfs replace) makes extensive use of fiemap that's 
causing this to go catatonic. 
http://comments.gmane.org/gmane.comp.file-systems.btrfs/35724

Chris Murphy



Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Tomasz Chmielewski

After a disk died and was replaced, btrfs device delete missing is
taking more than 10 days on an otherwise idle server:


Something isn't right though, because it's clearly neither reading nor
writing at anywhere close to 1/2 the drive read throughput. I'm curious
what 'iotop -d30 -o' shows (during the replace, before cancel), which
should be pretty consistent by averaging 30 seconds worth of io. And
then try 'iotop -d3 -o' and see if there are spikes. I'm willing to bet
there's a lot of nothing going on, with occasional spikes, rather than
a constant trickle.


That's more or less what I'm seeing with both. The numbers will go up or 
down slightly, but it's counted in kilobytes per second:


Total DISK READ:   0.00 B/s | Total DISK WRITE: 545.82 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
  940 be/3 root        0.00 B/s  136.46 B/s  0.00 %  0.10 % [jbd2/md2-8]
 4714 be/4 root        0.00 B/s  329.94 K/s  0.00 %  0.00 % [btrfs-transacti]
25534 be/4 root        0.00 B/s  402.97 K/s  0.00 %  0.00 % [kworker/u16:0]



The bottleneck may be here - one CPU core is mostly 100% busy (kworker). 
Not sure what it's really busy with though:


  PID USER  PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
25546 root   20   0     0     0     0 R 93.0  0.0 18:22.94 kworker/u16:7
14473 root   20   0     0     0     0 S  5.0  0.0 25:00.14 kworker/0:0



[912979.063432] SysRq : Show Blocked State
[912979.063485]   task                        PC stack   pid father
[912979.063545] btrfs   D 88083fa515c0 0  4793   4622 0x
[912979.063601]  88061a29b878 0086  88003683e040
[912979.063701]  000115c0 4000 880813e3 88003683e040
[912979.063800]  88061a29b7e8 8105d8e9 88083fa4 88083fa115c0
[912979.063899] Call Trace:
[912979.063951]  [8105d8e9] ? enqueue_task_fair+0x3e5/0x44f
[912979.064006]  [81053484] ? resched_curr+0x47/0x57
[912979.064058]  [81053aed] ? check_preempt_curr+0x3e/0x6d
[912979.064111]  [81053b2e] ? ttwu_do_wakeup+0x12/0x7f
[912979.064164]  [81053c3c] ? ttwu_do_activate.constprop.74+0x57/0x5c
[912979.064220]  [813acc1e] schedule+0x65/0x67
[912979.064272]  [813aed0c] schedule_timeout+0x26/0x198
[912979.064324]  [8105639d] ? wake_up_process+0x31/0x35
[912979.064378]  [81049baf] ? wake_up_worker+0x1f/0x21
[912979.064431]  [81049df6] ? insert_work+0x87/0x94
[912979.064493]  [a02d524b] ? free_block_list+0x1f/0x34 [btrfs]
[912979.064548]  [813ad443] wait_for_common+0x10d/0x13e
[912979.064600]  [8105635d] ? try_to_wake_up+0x251/0x251
[912979.064653]  [813ad48c] wait_for_completion+0x18/0x1a
[912979.064710]  [a0283a01] btrfs_async_run_delayed_refs+0xc1/0xe4 [btrfs]
[912979.064814]  [a02983c5] __btrfs_end_transaction+0x2bb/0x2e1 [btrfs]
[912979.064916]  [a02983f9] btrfs_end_transaction_throttle+0xe/0x10 [btrfs]
[912979.065020]  [a02d973d] relocate_block_group+0x2ad/0x4de [btrfs]
[912979.065079]  [a02d9ac6] btrfs_relocate_block_group+0x158/0x278 [btrfs]
[912979.065184]  [a02b66f0] btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs]
[912979.065286]  [a02c58d7] ? btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs]
[912979.065387]  [a0276b04] ? btrfs_set_path_blocking+0x23/0x54 [btrfs]
[912979.065486]  [a027b517] ? btrfs_search_slot+0x7bc/0x816 [btrfs]
[912979.065546]  [a02b2bd5] ? free_extent_buffer+0x6f/0x7c [btrfs]
[912979.065605]  [a02b89e9] btrfs_shrink_device+0x23c/0x3a5 [btrfs]
[912979.065679]  [a02bb2c7] btrfs_rm_device+0x2a1/0x759 [btrfs]
[912979.065747]  [a02c3ab3] btrfs_ioctl+0xa52/0x227f [btrfs]
[912979.065811]  [81107182] ? putname+0x23/0x2c
[912979.065863]  [8110b3cb] ? user_path_at_empty+0x60/0x90
[912979.065918]  [81173b1a] ? avc_has_perm+0x2e/0xf7
[912979.065978]  [810d7ad5] ? __vm_enough_memory+0x25/0x13c
[912979.066032]  [8110d3c1] do_vfs_ioctl+0x3f2/0x43c
[912979.066084]  [811026fd] ? vfs_stat+0x16/0x18
[912979.066136]  [8110d459] SyS_ioctl+0x4e/0x7d
[912979.066188]  [81030a71] ? do_page_fault+0xc/0xf
[912979.066240]  [813afd92] system_call_fastpath+0x16/0x1b
[912979.066296] Sched Debug Version: v0.11, 3.17.0-rc3 #1
[912979.066347] ktime   : 913460840.666210
[912979.066401] sched_clk   : 912979066.295474
[912979.066454] cpu_clk : 912979066.295485

[912979.066507] jiffies : 4386283381
[912979.066560] sched_clock_stable(): 1
[912979.066610]
[912979.066656] sysctl_sched
[912979.066703]   .sysctl_sched_latency: 24.00
[912979.066756]   .sysctl_sched_min_granularity 

Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Duncan
Chris Murphy posted on Thu, 11 Sep 2014 15:25:51 -0600 as excerpted:

 On Sep 11, 2014, at 1:31 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 I wouldn't try defragging now, but it might be worthwhile to stop the
 device delete (rebooting to do so since I don't think there's a cancel)
 
 'btrfs replace cancel' does exist, although I haven't tried it.

Btrfs replace cancel exists, yes, but does it work for btrfs device 
delete, which is what the OP used?

 Something isn't right though, because it's clearly neither reading nor
 writing at anywhere close to 1/2 the drive read throughput. I'm curious
 what 'iotop -d30 -o' shows (during the replace, before cancel), which
 should be pretty consistent by averaging 30 seconds worth of io. And
 then try 'iotop -d3 -o' and see if there are spikes. I'm willing to bet
 there's a lot of nothing going on, with occasional spikes, rather than a
 constant trickle.
 
 And then the question is to find out what btrfs is thinking about while
 nothing is reading or writing. Even though it's not 5000+ snapshots, I
 wonder if the balance code (and hence btrfs replace) makes extensive use
 of fiemap that's causing this to go catatonic.
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/35724

Not sure (some of that stuff's beyond me), but one thing we /do/ know is 
that btrfs has so far been focused mostly on features and debugging, not 
on optimization beyond the worst-cases, which themselves remain a big 
enough problem, tho it's slowly getting better.


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Chris Murphy

On Sep 11, 2014, at 5:51 PM, Duncan 1i5t5.dun...@cox.net wrote:

 Chris Murphy posted on Thu, 11 Sep 2014 15:25:51 -0600 as excerpted:
 
 On Sep 11, 2014, at 1:31 PM, Duncan 1i5t5.dun...@cox.net wrote:
 
 I wouldn't try defragging now, but it might be worthwhile to stop the
 device delete (rebooting to do so since I don't think there's a cancel)
 
 'btrfs replace cancel' does exist, although I haven't tried it.
 
 Btrfs replace cancel exists, yes, but does it work for btrfs device 
 delete, which is what the OP used?

Oops, right! I'm not sure what can do this safely.

And then when I think about just creating a new fs and using btrfs send/receive, 
the snapshots need to be ro first. So if there's any uncertainty about safely 
canceling the 'device delete', those ro snapshots need to be taken first, in the 
event that only an ro mount is possible afterward. And then there's some 
uncertainty about how long 260+ ro snapshots will take (should be fast, but) and 
how much worse they make the current situation. But it's probably worth the risk 
to take the snapshots and just wait a while before trying something like umount 
or sysrq+s followed by sysrq+u.



 
 Something isn't right though, because it's clearly neither reading nor
 writing at anywhere close to 1/2 the drive read throughput. I'm curious
 what 'iotop -d30 -o' shows (during the replace, before cancel), which
 should be pretty consistent by averaging 30 seconds worth of io. And
 then try 'iotop -d3 -o' and see if there are spikes. I'm willing to bet
 there's a lot of nothing going on, with occasional spikes, rather than a
 constant trickle.
 
 And then the question is to find out what btrfs is thinking about while
 nothing is reading or writing. Even though it's not 5000+ snapshots, I
 wonder if the balance code (and hence btrfs replace) makes extensive use
 of fiemap that's causing this to go catatonic.
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/35724
 
 Not sure (some of that stuff's beyond me), but one thing we /do/ know is 
 that btrfs has so far been focused mostly on features and debugging, not 
 on optimization beyond the worst-cases, which themselves remain a big 
 enough problem, tho it's slowly getting better.

Sure. But what's the next step? Given 260+ snapshots might mean well more than 
350GB of data, depending on how deduplicated the fs is, it still probably would 
be faster to rsync this to a pile of drives in linear/concat+XFS than wait a 
month (?) for device delete to finish.
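
A sketch of what I mean (untested; the destination is a placeholder for wherever 
the linear/concat+XFS volume gets mounted):

# -a archive mode; -H/-A/-X preserve hardlinks, ACLs and xattrs; -x stays on
# this one filesystem
rsync -aHAXx --numeric-ids /home/ /mnt/xfs-backup/home/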

Alternatively, script some way to create 260+ ro snapshots to btrfs 
send/receive to a new btrfs volume; and turn it into a raid1 later.
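
Roughly like this (untested; assumes /home is the mounted top-level subvolume, 
the new volume is mounted at /mnt/new, and no subvolume path contains spaces):

btrfs subvolume list -o /home | awk '{ print $NF }' | while read sub; do
    name=$(basename "$sub")
    # send requires a read-only snapshot, hence -r
    btrfs subvolume snapshot -r "/home/$sub" "/home/ro-$name"
    btrfs send "/home/ro-$name" | btrfs receive /mnt/new/
done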

I'm curious if a sysrq+s followed by sysrq+u might leave the filesystem in a 
state where it could still be rw mountable. But I'm skeptical of anything 
interrupting the device delete before being fully prepared for the fs to be 
toast for rw mount. If only ro mount is possible, any chance of creating ro 
snapshots is out.
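
For reference, if it came to that, the same sequence can be triggered from a 
shell, assuming sysrq is enabled (/proc/sys/kernel/sysrq):

echo s > /proc/sysrq-trigger   # emergency sync
echo u > /proc/sysrq-trigger   # emergency remount of all filesystems read-only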


Chris Murphy


Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Russell Coker
It would be nice if a file system mounted ro counted as ro snapshots for btrfs 
send.

When a file system is so messed up it can't be mounted rw it should be regarded 
as ro for all operations.
-- 
Sent from my Samsung Galaxy Note 2 with K-9 Mail.


Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Duncan
Russell Coker posted on Fri, 12 Sep 2014 15:19:04 +1000 as excerpted:

 It would be nice if a file system mounted ro counted as ro snapshots for
 btrfs send.
 
 When a file system is so messed up it can't be mounted rw it should be
 regarded as ro for all operations.

Indeed, and that has been suggested before, but unfortunately it's not 
current behavior.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: how long should btrfs device delete missing ... take?

2014-09-11 Thread Duncan
Chris Murphy posted on Thu, 11 Sep 2014 20:10:26 -0600 as excerpted:

 And then when I think about just creating a new fs, using btrfs
 send/receive, the snapshots need to be ro first.

FWIW, at this point I'd forget about send/receive and create the backup 
(assuming one doesn't exist already) using more conventional methods, at 
least if an initial full send/receive hasn't been done yet, since it 
would be copying off all the data anyway.  Perhaps mount selected 
snapshots and back them up too (after the current state is backed up), 
but throw away most of the snapshots.
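
For the selected snapshots, something along these lines (untested; the 
subvolume name and destination are placeholders for this setup):

mount -o ro,subvol=snapshots/home-20140901 /dev/sdb4 /mnt/snap
rsync -aHAX --numeric-ids /mnt/snap/ /mnt/backup/home-20140901/
umount /mnt/snap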

Of course if there's an existing relatively current sent/received base to 
build on, and no indication that send/receive is broken, definitely try 
that first as the amount of data to sync in that case should be MUCH 
lower, but if not...

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
