1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread TM
Hi,

I have a raid10 with 4x 3TB disks on a microserver
http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8 GB RAM

Recently one disk started to fail (smart errors), so I replaced it
Mounted as degraded, added new disk, removed old
Started yesterday
I am monitoring /var/log/messages and it seems it will take a long time
Started at about 8010631739392
And 20 hours later I am at 6910631739392 
btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!!
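For what it's worth, the week figure checks out against the two logged addresses, if one assumes relocation walks the block-group address space roughly linearly (a crude back-of-the-envelope model, not how btrfs necessarily progresses):

```shell
# crude ETA from the two "relocating block group" addresses above
start=8010631739392      # address when the rebuild started
now=6910631739392        # address 20 hours later
hours=20
rate=$(( (start - now) / hours ))    # address space covered per hour
remaining_h=$(( now / rate ))
echo "about $(( remaining_h / 24 )) days to go"   # ~5 days, i.e. a week-ish
```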

Furthermore it seems that the operation is getting slower and slower
When the rebuild started I had a new message every half a minute; now it's
getting to one and a half minutes.
Most files are small files like flac/jpeg

One week for a raid10 rebuild 4x3TB drives is a very long time.
Any thoughts?
Can you share any statistics from your RAID10 rebuilds?

If I shut down the system before the rebuild completes, what is the proper
procedure to remount it? Again degraded? Or normally? Can the process of
rebuilding the raid continue after a reboot? Will it survive, and continue rebuilding?

Thanks in advance
TM


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ENOSPC errors during balance

2014-07-20 Thread Marc Joliet
On Sat, 19 Jul 2014 19:11:00 -0600,
Chris Murphy li...@colorremedies.com wrote:

 I'm seeing this also in the 2nd dmesg:
 
 [  249.893310] BTRFS error (device sdg2): free space inode generation (0) did 
 not match free space cache generation (26286)
 
 
 So you could try unmounting the volume and doing a one-time mount with the 
 clear_cache mount option. Give it some time to rebuild the space cache.
 
 After that you could umount again, and mount with enospc_debug and try to 
 reproduce the enospc with another balance and see if dmesg contains more 
 information this time.
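Sketching that suggestion as commands (the device name is an example; both options are one-time mount options, not persistent settings):

```shell
# rebuild the free space cache with a one-time clear_cache mount
umount /mnt
mount -o clear_cache /dev/sdg2 /mnt   # give it time to rewrite the cache
# then remount with extra ENOSPC debugging and retry the balance
umount /mnt
mount -o enospc_debug /dev/sdg2 /mnt
btrfs balance start /mnt
```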

OK, I did that, and the new dmesg is attached. Also, some outputs again, first
filesystem df (that total surge at the end sure is consistent):

# btrfs filesystem df /mnt   
Data, single: total=237.00GiB, used=229.67GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.50GiB, used=3.49GiB
unknown, single: total=512.00MiB, used=0.00

And here what I described in my initial post, the output of balance status
immediately after the error (turns out my memory was correct):

btrfs filesystem balance status /mnt
Balance on '/mnt' is running
0 out of about 0 chunks balanced (0 considered), -nan% left

(Also, this is with Gentoo kernel 3.15.6 now.)

-- 
Marc Joliet
--
People who think they know everything really annoy those of us who know we
don't - Bjarne Stroustrup


dmesg4.log.xz
Description: application/xz




Re: ENOSPC errors during balance

2014-07-20 Thread Marc Joliet
On Sat, 19 Jul 2014 18:53:03 -0600,
Chris Murphy li...@colorremedies.com wrote:

 
 On Jul 19, 2014, at 2:58 PM, Marc Joliet mar...@gmx.de wrote:
 
  On Sat, 19 Jul 2014 22:10:51 +0200,
  Marc Joliet mar...@gmx.de wrote:
  
  [...]
  Another random idea:  the number of errors decreased the second time I ran
  balance (from 4 to 2), I could run another full balance and see if it keeps
  decreasing.
  
   Well, this time there were still 2 ENOSPC errors.  But I can show the df
   output after such an ENOSPC error, to illustrate what I meant with the
   sudden surge in total usage:
  
  # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
  Data, single: total=236.00GiB, used=229.04GiB
  System, DUP: total=32.00MiB, used=36.00KiB
  Metadata, DUP: total=4.00GiB, used=3.20GiB
  unknown, single: total=512.00MiB, used=0.00
  
  And then after running a balance and (almost) immediately cancelling:
  
  # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP 
  Data, single: total=230.00GiB, used=229.04GiB
  System, DUP: total=32.00MiB, used=36.00KiB
  Metadata, DUP: total=4.00GiB, used=3.20GiB
  unknown, single: total=512.00MiB, used=0.00
 
 I think it's a bit weird. Two options: a. Keep using the file system, with 
 judicious backups; if a dev wants more info they'll reply to the thread; b. 
 Migrate the data to a new file system, first capture the file system with 
 btrfs-image in case a dev wants more info and you've since blown away the 
 filesystem, and then move it to a new btrfs fs. I'd use send/receive for this 
 to preserve subvolumes and snapshots.
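Option b might look roughly like this (device and paths are placeholders; send requires a read-only snapshot):

```shell
# capture a metadata-only image first, in case a dev wants it later
btrfs-image -c9 /dev/sdg2 /safe/place/backup-meta.img
# then migrate each subvolume to the new filesystem via send/receive
btrfs subvolume snapshot -r /mnt/old/backup /mnt/old/backup.ro
btrfs send /mnt/old/backup.ro | btrfs receive /mnt/new/
```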

OK, I'll keep that in mind.  I'll keep running the file system for now, just in
case it's a run-time error (i.e., a bug in the balance code, and not a problem
with the file system itself).  If it gets trashed on its own, or I move to a new
file system, I'll be sure to follow the steps you outlined.

 Chris Murphy

Thanks
-- 
Marc Joliet
--
People who think they know everything really annoy those of us who know we
don't - Bjarne Stroustrup




Re: ENOSPC errors during balance

2014-07-20 Thread Marc Joliet
On Sun, 20 Jul 2014 02:39:27 +0000 (UTC),
Duncan 1i5t5.dun...@cox.net wrote:

 Chris Murphy posted on Sat, 19 Jul 2014 11:38:08 -0600 as excerpted:
 
  I'm not sure of the reason for the BTRFS info (device sdg2): 2 enospc
  errors during balance but it seems informational rather than either a
 warning or problem. I'd treat ext4->btrfs converted file systems as
 something of an odd duck, in that it's uncommon and therefore isn't getting
 as much testing, and extra caution is a good idea. Make frequent backups.
 
 Expanding on that a bit...
 
 Balance simply rewrites chunks, combining where possible and possibly 
 converting to a different layout (single/dup/raid0/1/10/5/6[1]) in the 
 process.  The most common reason for enospc during balance is of course 
 all space allocated to chunks, with various workarounds for that if it 
 happens, but that doesn't seem to be what was happening to you
 (Mark J./OP).
 
 Based on very similar issues reported by another ext4 -> btrfs converter 
 and the discussion on that thread, here's what I think happened:
 
 First a critical question for you as it's a critical piece of this 
 scenario that you didn't mention in your summary.  The wiki page on
 ext4 -> btrfs conversion suggests deleting the ext2_saved subvolume and 
 then doing a full defrag and rebalance.  You're attempting a full 
 rebalance, but have you yet deleted ext2_saved and did you do the defrag 
 before attempting the rebalance?
 
 I'm guessing not, as was the case with the other user that reported this 
 issue.  Here's what apparently happened in his case and how we fixed it:

Ah, I actually did, in fact.  I only implicitly said it, though.  Here's what I
wrote:

After converting the backup partition about a week ago, following the wiki
entry on ext4 conversion, I eventually ran a full balance [...]

The wiki says to run a full balance (and defragment before that, but that was
slow, so I didn't do it), *after* deleting the ext4 file system image.
So the full balance was right after doing that :) .

 The problem is that btrfs data chunks are 1 GiB each.  Thus, the maximum 
 size of a btrfs extent is 1 GiB.  But ext4 doesn't have an arbitrary 
 limitation on extent size, and for files over a GiB in size, ext4 extents 
 can /also/ be over a GiB in size.
 
 That results in two potential issues at balance time.  First, btrfs 
 treats the ext2_saved subvolume as a read-only snapshot and won't touch 
 it, thus keeping the ext* data intact in case the user wishes to rollback 
 to ext*.  I don't think btrfs touches that data during a balance either, 
 as it really couldn't do so /safely/ without incorporating all of the 
 ext* code into btrfs.  I'm not sure how it expresses that situation, but 
 it's quite possible that btrfs treats it as enospc.
 
 Second, for files that had ext4 extents greater than a GiB, balance will 
 naturally enospc, because even the biggest possible btrfs extent, a full 
 1 GiB data chunk, is too small to hold the existing file extent.  Of 
 course this only happens on filesystems converted from ext*, because 
 natively btrfs has no way to make an extent larger than a GiB, so it 
 won't run into the problem if it was created natively instead of 
 converted from ext*.
 
 Once the ext2_saved subvolume/snapshot is deleted, defragging should cure 
 the problem as it rewrites those files to btrfs-native chunks, normally 
 defragging but in this case fragging to the 1 GiB btrfs-native data-chunk-
 size extent size.
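As a toy illustration of that limit (shell arithmetic, not btrfs code): an oversized former-ext4 extent has to be re-expressed as pieces no larger than one 1 GiB data chunk.

```shell
# split a hypothetical 2.5 GiB ext4 extent into btrfs-sized (<= 1 GiB) pieces
GIB=$(( 1 << 30 ))
len=$(( 5 * GIB / 2 ))   # 2.5 GiB
pieces=""
while [ "$len" -gt 0 ]; do
    p=$(( len < GIB ? len : GIB ))          # at most one data chunk at a time
    pieces="${pieces:+$pieces }$(( p >> 20 ))MiB"
    len=$(( len - p ))
done
echo "$pieces"   # 1024MiB 1024MiB 512MiB
```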

Hmm, well, I didn't defragment because it would have taken *forever* to go
through all those hardlinks, plus my experience is that ext* doesn't fragment
much at all, so I skipped that step.  But I certainly have files over 1GB in
size.

On the other hand, the wiki [0] says that defragmentation (and balancing) is
optional, and the only reason stated for doing either is that they will have
an impact on performance.

 Alternatively, and this is what the other guy did, one can find all the 
 files from the original ext*fs over a GiB in size, and move them off-
 filesystem and back.  AFAIK he had several gigs of spare RAM and no files 
 larger than that, so he used tmpfs as the temporary storage location, 
 which is memory so the only I/O is that on the btrfs in question.  By 
 doing that he deleted the existing files on btrfs and recreated them, 
 naturally splitting the extents on data-chunk-boundaries as btrfs 
 normally does, in the recreation.
 
 If you had deleted the ext2_saved subvolume/snapshot and done the defrag 
 already, that explanation doesn't work as-is, but I'd still consider it 
 an artifact from the conversion, and try the alternative move-off-
 filesystem-temporarily method.
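The move-off-and-back workaround is just a copy cycle; with placeholder paths (and tmpfs only viable if the file fits in RAM):

```shell
# rewrite a formerly-ext4 file so it gets native, chunk-sized btrfs extents
mkdir -p /dev/shm/stash                  # tmpfs staging area
mv /mnt/path/to/bigfile /dev/shm/stash/
sync
mv /dev/shm/stash/bigfile /mnt/path/to/
```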

I'll try this and see, but I think I have more files >1 GB than would account
for this error (which comes towards the end of the balance when only a few
chunks are left).  I'll see what find /mnt -type f -size +1G finds :) .

 If you don't have any files over a GiB in size, then I don't know... 

Re: ENOSPC errors during balance

2014-07-20 Thread Marc Joliet
On Sun, 20 Jul 2014 12:22:33 +0200,
Marc Joliet mar...@gmx.de wrote:

[...]
 I'll try this and see, but I think I have more files >1 GB than would account
 for this error (which comes towards the end of the balance when only a few
 chunks are left).  I'll see what find /mnt -type f -size +1G finds :) .

Now that I think about it, though, it sounds like it could explain the sudden
surge in total data size: for one very big file, several chunks/extents are
created, but the data cannot be copied from the original ext4 extent.

So far, the above find command has only found a handful of files (plus all
the reflinks in the snapshots), much to my surprise. It still has one subvolume
to go through, though.

And just for completeness, that same find command didn't find any files on /,
which I also converted from ext4, and for which a full balance completed
successfully.  So maybe this is in the right direction, but I'll wait and see
what Chris Murphy (or anyone else) might find in my latest dmesg output.

-- 
Marc Joliet
--
People who think they know everything really annoy those of us who know we
don't - Bjarne Stroustrup




Re: ENOSPC errors during balance

2014-07-20 Thread Duncan
Marc Joliet posted on Sun, 20 Jul 2014 12:22:33 +0200 as excerpted:

 On the other hand, the wiki [0] says that defragmentation (and
 balancing) is optional, and the only reason stated for doing either is
 that they will have an impact on performance.

Yes.  That's what threw off the other guy as well.  He decided to skip it 
for the same reason.

If I had a wiki account I'd change it, but for whatever reason I tend to 
be far more comfortable writing list replies, sometimes repeatedly, than 
writing anything on the web, which I tend to treat as read-only.  So I've 
never gotten a wiki account and thus haven't changed it, and apparently 
the other guy with the problem (and anyone else who knows) hasn't changed 
it either.  So the conversion page continues to underemphasize the 
importance of completing the conversion steps, including the defrag, in 
the proper order.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



[GIT PULL] Btrfs

2014-07-20 Thread Chris Mason

Hi Linus,

We have two more fixes in my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

I was hoping to also include a fix for a btrfs deadlock with compression
enabled, but we're still nailing that one down.

Liu Bo (1) commits (+11/-0):
Btrfs: fix abnormal long waiting in fsync

Eric Sandeen (1) commits (+4/-4):
btrfs: test for valid bdev before kobj removal in btrfs_rm_device

Total: (2) commits (+15/-4)

 fs/btrfs/ordered-data.c | 11 +++
 fs/btrfs/volumes.c  |  8 
 2 files changed, 15 insertions(+), 4 deletions(-)


Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread Austin S Hemmelgarn
On 07/20/2014 10:00 AM, Tomasz Torcz wrote:
 On Sun, Jul 20, 2014 at 01:53:34PM +, Duncan wrote:
 TM posted on Sun, 20 Jul 2014 08:45:51 + as excerpted:

 One week for a raid10 rebuild 4x3TB drives is a very long time.
 Any thoughts?
 Can you share any statistics from your RAID10 rebuilds?


 At a week, that's nearly 5 MiB per second, which isn't great, but isn't 
 entirely out of the realm of reason either, given all the processing it's 
 doing.  A day would be 33.11+ MiB/s, reasonable thruput for a straight copy, 
 and a raid rebuild is rather more complex than a straight copy, so...
 
   Uhm, sorry, but 5MBps is _entirely_ unreasonable.  It is order-of-magnitude
 unreasonable.  And all the processing shouldn't even show as a blip
 on modern CPUs.
  This speed is indefensible.
 
I wholly agree that it's indefensible, but I can tell you why it is so
slow: it's not 'all the processing' (which is maybe a few hundred
instructions on x86 for each block), it's because BTRFS still serializes
writes to devices instead of queuing all of them in parallel (that is,
when there are four devices that need to be written to, it writes to each
one in sequence, waiting for the previous write to finish before
dispatching the next write).  Personally, I would love to see this
behavior improved, but I really don't have any time to work on it myself.
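As a toy model of the difference, with sleep standing in for a synchronous device write (an illustration of the dispatch pattern only, not btrfs's actual I/O path):

```shell
# four "device writes", dispatched serially vs. in parallel
t0=$(date +%s)
for dev in sda sdb sdc sdd; do sleep 1; done   # serialized: ~4s total
serial=$(( $(date +%s) - t0 ))

t0=$(date +%s)
for dev in sda sdb sdc sdd; do sleep 1 & done  # dispatch all four at once
wait                                           # parallel: ~1s total
parallel=$(( $(date +%s) - t0 ))
echo "serial=${serial}s parallel=${parallel}s"
```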





Re: Questions on incremental backups

2014-07-20 Thread Sam Bull
Thanks everyone for the responses. I'll start setting up my backup
strategy in 2 or 3 weeks. I'll give the diff and unionFS tips a go, and
report back on any progress.




Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread Bob Marley

On 20/07/2014 10:45, TM wrote:

Hi,

I have a raid10 with 4x 3TB disks on a microserver
http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8 GB RAM

Recently one disk started to fail (smart errors), so I replaced it
Mounted as degraded, added new disk, removed old
Started yesterday
I am monitoring /var/log/messages and it seems it will take a long time
Started at about 8010631739392
And 20 hours later I am at 6910631739392
btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!!

Furthermore it seems that the operation is getting slower and slower
When the rebuild started I had a new message every half a minute; now it's
getting to one and a half minutes.
Most files are small files like flac/jpeg



Hi TM, are you doing other significant filesystem activity during this 
rebuild, especially random accesses?

This can reduce performance a lot on HDDs.
E.g. if you were doing strenuous multithreaded random writes in the 
meanwhile, I could expect even less than 5MB/sec overall...




Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread Roman Mamedov
On Sun, 20 Jul 2014 21:15:31 +0200
Bob Marley bobmar...@shiftmail.org wrote:

 Hi TM, are you doing other significant filesystem activity during this 
 rebuild, especially random accesses?
 This can reduce performance a lot on HDDs.
 E.g. if you were doing strenuous multithreaded random writes in the 
 meanwhile, I could expect even less than 5MB/sec overall...

I believe the problem here might be that a Btrfs rebuild *is* a strenuous
random read (+ random-ish write) just by itself.

Mdadm-based RAID would rebuild the array reading/writing disks in a completely
linear manner, and it would finish an order of magnitude faster.

-- 
With respect,
Roman




Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread ashford
This is the cause of the slow reconstruct.

 I believe the problem here might be that a Btrfs rebuild *is* a strenuous
 random read (+ random-ish write) just by itself.

If you assume a 12ms average seek time (normal for 7200RPM SATA drives),
an 8.3ms rotational latency (half a rotation), an average 64KB write and a
100MB/S streaming write speed, each write comes in at ~21ms, which gives
us ~47 IOPS.  With the 64KB write size, this comes out to ~3MB/S, DISK
LIMITED.
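Restating that arithmetic (same assumptions: 12 ms seek, 8.3 ms rotational latency, 64 KB writes, 100 MB/s streaming; rounding lands at 48 IOPS rather than 47, same ballpark):

```shell
# per-write service time and resulting throughput, via awk for the math
per_write=$(awk 'BEGIN { printf "%.1f", 12 + 8.3 + 64/1024/100*1000 }')  # ms
iops=$(awk -v t="$per_write" 'BEGIN { printf "%.0f", 1000/t }')
mbs=$(awk -v i="$iops" 'BEGIN { printf "%.1f", i * 64/1024 }')
echo "${per_write} ms/write -> ${iops} IOPS -> ${mbs} MB/s"  # ~21ms, ~48, ~3
```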

The on-disk cache helps a bit during the startup, but once the cache is
full, it's back to writes at disk speed, with some small gains if the
on-disk controller can schedule the writes efficiently.

Based on the single-threaded I/O that BTRFS uses during a reconstruct, I
expect that the average write size is somewhere around 200KB. 
Multi-threading the reconstruct disk I/O (possibly adding look-ahead)
would double the reconstruct speed for this array, but that's not a
trivial task.

The 5MB/S that TM is seeing is fine, considering the small files he says
he has.

Peter Ashford



Re: Blocked tasks on 3.15.1

2014-07-20 Thread Matt
[ deadlocks during rsync in 3.15 with compression enabled ]

Hi everyone,

I still haven't been able to reproduce this one here, but I'm going
through a series of tests with lzo compression forced and every
operation forced to ordered.  Hopefully it'll kick it out soon.

While I'm hammering away, could you please try this patch.  If this is
the bug you're hitting, the deadlock will go away and you'll see this
printk in the log.

thanks!

-chris

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ab56df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
 	spin_unlock(&root->fs_info->ordered_root_lock);
 	}

+	spin_lock(&root->fs_info->ordered_root_lock);
+	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
+		list_del_init(&BTRFS_I(inode)->ordered_operations);
+		printk(KERN_CRIT "racing inode deletion with ordered operations!!!\n");
+	}
+	spin_unlock(&root->fs_info->ordered_root_lock);
+
 	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 		     &BTRFS_I(inode)->runtime_flags)) {
 		btrfs_info(root->fs_info, "inode %llu still on the orphan list",
--



Hi Chris,

just had that hang during rsync from /home (ZFS, mirrored) to /bak
(Btrfs w. lzo compression) again with that patch applied; it doesn't
seem to be related to that issue (or patch) - only applicable to my
case, obviously - since a search for that string (e.g. "racing") doesn't
show anything in the log:

[16028.169347] INFO: task kworker/u16:2:11956 blocked for more than 180 seconds.
[16028.169349] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169351] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[16028.169351] kworker/u16:2 D 88081ec13540 0 11956 2 0x0008
[16028.169356] Workqueue: btrfs-delalloc normal_work_helper
[16028.169358] 8806180ab8e0 0046 
0004
[16028.169359] a000 8806210f16b0 8806180abfd8
81e11500
[16028.169360] 8806210f16b0 0206 8113e6cc
88081ec135c0
[16028.169362] Call Trace:
[16028.169367] [8113e6cc] ? delayacct_end+0x7c/0x90
[16028.169370] [811689d0] ? wait_on_page_read+0x60/0x60
[16028.169374] [819cfc78] ? io_schedule+0x88/0xe0
[16028.169375] [811689d5] ? sleep_on_page+0x5/0x10
[16028.169377] [819cfffc] ? __wait_on_bit_lock+0x3c/0x90
[16028.169378] [81168ac5] ? __lock_page+0x65/0x70
[16028.169382] [810f5580] ? autoremove_wake_function+0x30/0x30
[16028.169384] [81169854] ? __find_lock_page+0x44/0x70
[16028.169385] [811698ca] ? find_or_create_page+0x2a/0xa0
[16028.169388] [8145a1cf] ? io_ctl_prepare_pages+0x4f/0x150
[16028.169390] [8145bd45] ? __load_free_space_cache+0x195/0x5d0
[16028.169392] [8145c26b] ? load_free_space_cache+0xeb/0x1b0
[16028.169395] [813fd6a1] ? cache_block_group+0x191/0x390
[16028.169396] [810f5550] ? prepare_to_wait_event+0xf0/0xf0
[16028.169398] [814085ea] ? find_free_extent+0x95a/0xdb0
[16028.169400] [81408bf9] ? btrfs_reserve_extent+0x69/0x150
[16028.169403] [81421116] ? cow_file_range+0x136/0x420
[16028.169404] [81422493] ? submit_compressed_extents+0x1f3/0x480
[16028.169406] [81422720] ? submit_compressed_extents+0x480/0x480
[16028.169407] [8144896b] ? normal_work_helper+0x1ab/0x330
[16028.169410] [810df26d] ? process_one_work+0x16d/0x490
[16028.169411] [810dff8b] ? worker_thread+0x12b/0x410
[16028.169412] [810dfe60] ? manage_workers.isra.28+0x2c0/0x2c0
[16028.169414] [810e579a] ? kthread+0xca/0xe0
[16028.169415] [810e56d0] ? kthread_create_on_node+0x180/0x180
[16028.169417] [819d3c7c] ? ret_from_fork+0x7c/0xb0
[16028.169418] [810e56d0] ? kthread_create_on_node+0x180/0x180
[16028.169422] INFO: task btrfs-transacti:12042 blocked for more than
180 seconds.
[16028.169422] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169423] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[16028.169423] btrfs-transacti D 88081ec13540 0 12042 2 0x0008
[16028.169425] 88009c7adb20 0046 
88040d84ca68
[16028.169426] a000 88061f284ba0 88009c7adfd8
81e11500
[16028.169427] 88061f284ba0 88061a21dea8 811b8c2d
8805fc919e00
[16028.169428] Call Trace:
[16028.169431] [811b8c2d] ? kmem_cache_alloc_trace+0x14d/0x160
[16028.169433] [813fd632] ? cache_block_group+0x122/0x390
[16028.169434] [810f5550] ? prepare_to_wait_event+0xf0/0xf0
[16028.169436] [814085ea] ? find_free_extent+0x95a/0xdb0
[16028.169437] [81408bf9] ? btrfs_reserve_extent+0x69/0x150
[16028.169439] [81422fa8] ? __btrfs_prealloc_file_range+0xe8/0x380
[16028.169441] [8140b6f2] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16028.169442] [819cb00c] ? 

Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread George Mitchell

On 07/20/2014 02:28 PM, Bob Marley wrote:

On 20/07/2014 21:36, Roman Mamedov wrote:

On Sun, 20 Jul 2014 21:15:31 +0200
Bob Marley bobmar...@shiftmail.org wrote:


Hi TM, are you doing other significant filesystem activity during this
rebuild, especially random accesses?
This can reduce performance a lot on HDDs.
E.g. if you were doing strenuous multithreaded random writes in the
meanwhile, I could expect even less than 5MB/sec overall...

I believe the problem here might be that a Btrfs rebuild *is* a strenuous
random read (+ random-ish write) just by itself.

Mdadm-based RAID would rebuild the array reading/writing disks in a
completely linear manner, and it would finish an order of magnitude faster.


Now this explains a lot!
So they would just need to be sorted?
Sorting the files of a disk from lowest to highest block number prior 
to starting reconstruction seems feasible. Maybe not all of them 
together, because there will be millions, but sorting them in chunks of 
1000 files would still produce a very significant speedup!



As I understand the problem, it has to do with where btrfs is in the 
overall development process.  There are a LOT of opportunities for 
optimization, but optimization cannot begin until btrfs is feature 
complete, because any work done beforehand would be wasted effort in 
that it would likely have to be repeated after being broken by feature 
enhancements.  So now it is a waiting game for completion of all the 
major features (like additional RAID levels and possible n-way options, 
etc) before optimization efforts can begin.  Once that happens we will 
likely see HUGE gains in efficiency and speed, but until then we are 
kind of stuck in this position where it works but leaves somewhat to 
be desired.  I think this is one reason developers often caution users 
not to expect too much from btrfs at this point.  Its just not there yet 
and it will still be some time yet before it is.



Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread Wang Shilong

Hi,

On 07/20/2014 04:45 PM, TM wrote:

Hi,

I have a raid10 with 4x 3TB disks on a microserver
http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8 GB RAM

Recently one disk started to fail (smart errors), so I replaced it
Mounted as degraded, added new disk, removed old
Started yesterday
I am monitoring /var/log/messages and it seems it will take a long time
Started at about 8010631739392
And 20 hours later I am at 6910631739392
btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild!!!

Just my two cents:

Since 'btrfs replace' supports RAID10, I suppose using the replace
operation is better than 'device removal and add'.

Another question is related to btrfs snapshot-aware balance:
how many snapshots do you have in your system?

Of course, during balance/resize/device removal operations
you can still snapshot, but fewer snapshots should speed things up!

Anyway, 'btrfs replace' is implemented more efficiently than
'device removal and add'. :-)

Thanks,
Wang
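For reference, the replace flow Wang recommends looks like this (device names are examples; -r avoids reading the failing source device wherever a healthy mirror exists):

```shell
# replace the failing disk in place, instead of device add + delete
btrfs replace start -r /dev/sdb /dev/sde /mnt
btrfs replace status /mnt    # shows progress as a percentage
```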


Furthermore it seems that the operation is getting slower and slower
When the rebuild started I had a new message every half a minute; now it's
getting to one and a half minutes.
Most files are small files like flac/jpeg

One week for a raid10 rebuild 4x3TB drives is a very long time.
Any thoughts?
Can you share any statistics from your RAID10 rebuilds?

If I shut down the system before the rebuild completes, what is the proper
procedure to remount it? Again degraded? Or normally? Can the process of
rebuilding the raid continue after a reboot? Will it survive, and continue rebuilding?

Thanks in advance
TM




Re: ENOSPC errors during balance

2014-07-20 Thread Duncan
Marc Joliet posted on Sun, 20 Jul 2014 21:44:40 +0200 as excerpted:

 On Sun, 20 Jul 2014 13:40:54 +0200, Marc Joliet mar...@gmx.de wrote:
 
  On Sun, 20 Jul 2014 12:22:33 +0200, Marc Joliet mar...@gmx.de wrote:
 
 [...]
  I'll try this and see, but I think I have more files >1 GB than would
  account for this error (which comes towards the end of the balance
  when only a few chunks are left).  I'll see what find /mnt -type f
  -size +1G finds :) .

Note that it's extents over 1 GiB on the converted former ext4, not 
necessarily files over 1 GiB.  You may have files over a GiB that were 
already broken into extents that are all less than a GiB, and btrfs would 
be able to deal with them fine.  It's only when a single extent ended up 
larger than a GiB on the former ext4 that btrfs can't deal with it.
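A way to check extent sizes directly (rather than just file sizes) is filefrag; the path is a placeholder, and extent lengths are reported in filesystem blocks (4 KiB here, so anything over 262144 blocks is a >1 GiB extent):

```shell
# dump the per-extent layout of one suspect file; scan the "length"
# column for any single extent over 262144 blocks (262144 * 4 KiB = 1 GiB)
filefrag -v /mnt/path/to/bigfile
```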

 Now that I think about it, though, it sounds like it could explain the
 sudden surge in total data size: for one very big file, several
 chunks/extents are created, but the data cannot be copied from the
 original ext4 extent.

I hadn't thought about that effect, but good deductive reasoning. =:^)

 Well, turns out that was it!
 
 What I did:
 
 - delete the single largest file on the file system, a 12 GB VM image,
 along with all subvolumes that contained it
 - rsync it over again
 - start a full balance
 
 This time, the balance finished successfully :-) .

Good to read!

We're now two for two on this technique working around this problem! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: 1 week to rebuild 4x 3TB raid10 is a long time!

2014-07-20 Thread Duncan
ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:

 If you assume a 12ms average seek time (normal for 7200RPM SATA drives),
 an 8.3ms rotational latency (half a rotation), an average 64KB write and
 a 100MB/S streaming write speed, each write comes in at ~21ms, which
 gives us ~47 IOPS.  With the 64KB write size, this comes out to ~3MB/S,
 DISK LIMITED.

 The 5MB/S that TM is seeing is fine, considering the small files he says
 he has.

Thanks for the additional numbers supporting my point. =:^)

I had run some of the numbers but not to the extent you just did, so I 
didn't know where 5 MiB/s fit in, only that it wasn't entirely out of the 
range of expectation for spinning rust, given the current state of 
optimization... or more accurately the lack thereof, due to the focus 
still being on features.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
