1 week to rebuild 4x 3TB raid10 is a long time!
Hi,

I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8 GB RAM. Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday.

I am monitoring /var/log/messages and it seems it will take a long time. It started at about 8010631739392, and 20 hours later I am at 6910631739392:

btrfs: relocating block group 6910631739392 flags 65

At this rate it will take a week to complete the raid rebuild! Furthermore, the operation seems to be getting slower and slower: when the rebuild started I had a new message every half a minute, now it's getting to one and a half minutes. Most files are small files like flac/jpeg.

One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds?

If I shut down the system before the rebuild finishes, what is the proper procedure to remount it? Again degraded? Or normally? Can the process of rebuilding the raid continue after a reboot? Will it survive, and continue rebuilding?

Thanks in advance
TM
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ENOSPC errors during balance
On Sat, 19 Jul 2014 19:11:00 -0600, Chris Murphy <li...@colorremedies.com> wrote:

> I'm seeing this also in the 2nd dmesg:
>
> [ 249.893310] BTRFS error (device sdg2): free space inode generation (0) did not match free space cache generation (26286)
>
> So you could try umounting the volume, and doing a one-time mount with the clear_cache mount option. Give it some time to rebuild the space cache. After that you could umount again, and mount with enospc_debug and try to reproduce the enospc with another balance, and see if dmesg contains more information this time.

OK, I did that, and the new dmesg is attached. Also, some outputs again, first filesystem df (that total surge at the end sure is consistent):

# btrfs filesystem df /mnt
Data, single: total=237.00GiB, used=229.67GiB
System, DUP: total=32.00MiB, used=36.00KiB
Metadata, DUP: total=4.50GiB, used=3.49GiB
unknown, single: total=512.00MiB, used=0.00

And here is what I described in my initial post, the output of balance status immediately after the error (turns out my memory was correct):

# btrfs filesystem balance status /mnt
Balance on '/mnt' is running
0 out of about 0 chunks balanced (0 considered), -nan% left

(Also, this is with Gentoo kernel 3.15.6 now.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
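For reference, the remount sequence Chris suggests might look roughly like this (a sketch; /dev/sdg2 and /mnt are placeholders for the actual device and mountpoint):

```shell
# One-time mount with clear_cache to rebuild the free space cache
umount /mnt
mount -o clear_cache /dev/sdg2 /mnt
# ...give it some time to rebuild the cache, then remount with debugging:
umount /mnt
mount -o enospc_debug /dev/sdg2 /mnt
# Try to reproduce the ENOSPC, then check dmesg for extra detail:
btrfs balance start /mnt
dmesg | tail
```

clear_cache only needs to be passed once; it is not a persistent option.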
Re: ENOSPC errors during balance
On Sat, 19 Jul 2014 18:53:03 -0600, Chris Murphy <li...@colorremedies.com> wrote:

> On Jul 19, 2014, at 2:58 PM, Marc Joliet <mar...@gmx.de> wrote:
>> On Sat, 19 Jul 2014 22:10:51 +0200, Marc Joliet <mar...@gmx.de> wrote:
>> [...]
>>> Another random idea: the number of errors decreased the second time I ran balance (from 4 to 2), I could run another full balance and see if it keeps decreasing.
>>
>> Well, this time there were still 2 ENOSPC errors. But I can show the df output after such an ENOSPC error, to illustrate what I meant with the sudden surge in total usage:
>>
>> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP
>> Data, single: total=236.00GiB, used=229.04GiB
>> System, DUP: total=32.00MiB, used=36.00KiB
>> Metadata, DUP: total=4.00GiB, used=3.20GiB
>> unknown, single: total=512.00MiB, used=0.00
>>
>> And then after running a balance and (almost) immediately cancelling:
>>
>> # btrfs filesystem df /run/media/marcec/MARCEC_BACKUP
>> Data, single: total=230.00GiB, used=229.04GiB
>> System, DUP: total=32.00MiB, used=36.00KiB
>> Metadata, DUP: total=4.00GiB, used=3.20GiB
>> unknown, single: total=512.00MiB, used=0.00
>
> I think it's a bit weird. Two options:
>
> a. Keep using the file system, with judicious backups; if a dev wants more info they'll reply to the thread.
> b. Migrate the data to a new file system: first capture the file system with btrfs-image in case a dev wants more info and you've since blown away the filesystem, and then move it to a new btrfs fs. I'd use send/receive for this to preserve subvolumes and snapshots.

OK, I'll keep that in mind. I'll keep running the file system for now, just in case it's a run-time error (i.e., a bug in the balance code, and not a problem with the file system itself). If it gets trashed on its own, or I move to a new file system, I'll be sure to follow the steps you outlined.

> Chris Murphy

Thanks
-- 
Marc Joliet
Re: ENOSPC errors during balance
On Sun, 20 Jul 2014 02:39:27 +0000 (UTC), Duncan <1i5t5.dun...@cox.net> wrote:

> Chris Murphy posted on Sat, 19 Jul 2014 11:38:08 -0600 as excerpted:
>
>> I'm not sure of the reason for the "BTRFS info (device sdg2): 2 enospc errors during balance", but it seems informational rather than either a warning or problem. I'd treat ext4-to-btrfs converted file systems as something of an odd duck, in that conversion is uncommon and therefore isn't getting as much testing, so extra caution is a good idea. Make frequent backups.
>
> Expanding on that a bit...
>
> Balance simply rewrites chunks, combining where possible and possibly converting to a different layout (single/dup/raid0/1/10/5/6[1]) in the process. The most common reason for enospc during balance is of course all space allocated to chunks, with various workarounds for that if it happens, but that doesn't seem to be what was happening to you (Marc J./OP).
>
> Based on very similar issues reported by another ext4-to-btrfs converter and the discussion on that thread, here's what I think happened.
>
> First, a critical question for you, as it's a critical piece of this scenario that you didn't mention in your summary. The wiki page on ext4-to-btrfs conversion suggests deleting the ext2_saved subvolume and then doing a full defrag and rebalance. You're attempting a full rebalance, but have you yet deleted ext2_saved, and did you do the defrag before attempting the rebalance? I'm guessing not, as was the case with the other user that reported this issue. Here's what apparently happened in his case and how we fixed it:

Ah, I actually did, in fact. I only implicitly said it, though. Here's what I wrote:

> After converting the backup partition about a week ago, following the wiki entry on ext4 conversion, I eventually ran a full balance [...]

The wiki says to run a full balance (and defragment before that, but that was slow, so I didn't do it), *after* deleting the ext4 file system image. So the full balance was right after doing that :) .

> The problem is that btrfs data chunks are 1 GiB each. Thus, the maximum size of a btrfs extent is 1 GiB. But ext4 doesn't have an arbitrary limitation on extent size, and for files over a GiB in size, ext4 extents can /also/ be over a GiB in size. That results in two potential issues at balance time.
>
> First, btrfs treats the ext2_saved subvolume as a read-only snapshot and won't touch it, thus keeping the ext* data intact in case the user wishes to roll back to ext*. I don't think btrfs touches that data during a balance either, as it really couldn't do so /safely/ without incorporating all of the ext* code into btrfs. I'm not sure how it expresses that situation, but it's quite possible that btrfs treats it as enospc.
>
> Second, for files that had ext4 extents greater than a GiB, balance will naturally enospc, because even the biggest possible btrfs extent, a full 1 GiB data chunk, is too small to hold the existing file extent. Of course this only happens on filesystems converted from ext*, because natively btrfs has no way to make an extent larger than a GiB, so it won't run into the problem if the filesystem was created natively instead of converted from ext*.
>
> Once the ext2_saved subvolume/snapshot is deleted, defragging should cure the problem, as it rewrites those files to btrfs-native chunks, normally defragging but in this case fragging to the 1 GiB btrfs-native data-chunk-size extent size.

Hmm, well, I didn't defragment because it would have taken *forever* to go through all those hardlinks, plus my experience is that ext* doesn't fragment much at all, so I skipped that step. But I certainly have files over 1 GB in size. On the other hand, the wiki [0] says that defragmentation (and balancing) is optional, and the only reason stated for doing either is that they have an impact on performance.

> Alternatively, and this is what the other guy did, one can find all the files from the original ext*fs over a GiB in size, and move them off-filesystem and back. AFAIK he had several gigs of spare RAM and no files larger than that, so he used tmpfs as the temporary storage location, which is memory, so the only I/O is that on the btrfs in question. By doing that he deleted the existing files on btrfs and recreated them, naturally splitting the extents on data-chunk boundaries as btrfs normally does, in the recreation.
>
> If you had deleted the ext2_saved subvolume/snapshot and done the defrag already, that explanation doesn't work as-is, but I'd still consider it an artifact from the conversion, and try the alternative move-off-filesystem-temporarily method.

I'll try this and see, but I think I have more files >1 GB than would account for this error (which comes towards the end of the balance when only a few chunks are left). I'll see what `find /mnt -type f -size +1G` finds :) .

> If you don't have any files over a GiB in size, then I don't know...
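The move-off-filesystem-and-back workaround Duncan describes could be scripted roughly like this (a sketch, not a tested tool; MNT and STASH are placeholders, and STASH must live on another filesystem, e.g. a tmpfs, with room for the largest file):

```shell
# Rewrite every file over 1 GiB by moving it off the filesystem and back,
# so btrfs recreates it with native (<= 1 GiB data chunk) extents.
MNT=/mnt            # placeholder: the converted btrfs filesystem
STASH=/tmp/stash    # placeholder: temp location on a different filesystem
mkdir -p "$STASH"
find "$MNT" -xdev -type f -size +1G | while IFS= read -r f; do
    mv "$f" "$STASH/tmpfile"    # deletes the original (oversized) extents
    mv "$STASH/tmpfile" "$f"    # recreates the file natively
done
```

Note that, as discussed later in the thread, reflinked copies in snapshots still reference the old extents, so snapshots containing such files would need separate handling.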
Re: ENOSPC errors during balance
On Sun, 20 Jul 2014 12:22:33 +0200, Marc Joliet <mar...@gmx.de> wrote:
> [...]
> I'll try this and see, but I think I have more files >1 GB than would account for this error (which comes towards the end of the balance when only a few chunks are left). I'll see what `find /mnt -type f -size +1G` finds :) .

Now that I think about it, though, it sounds like it could explain the sudden surge in total data size: for one very big file, several chunks/extents are created, but the data cannot be copied from the original ext4 extent.

So far, the above find command has only found a handful of files (plus all the reflinks in the snapshots), much to my surprise. It still has one subvolume to go through, though.

And just for completeness, that same find command didn't find any files on /, which I also converted from ext4, and for which a full balance completed successfully.

So maybe this is in the right direction, but I'll wait and see what Chris Murphy (or anyone else) might find in my latest dmesg output.

-- 
Marc Joliet
Re: ENOSPC errors during balance
Marc Joliet posted on Sun, 20 Jul 2014 12:22:33 +0200 as excerpted:

> On the other hand, the wiki [0] says that defragmentation (and balancing) is optional, and the only reason stated for doing either is that they have an impact on performance.

Yes. That's what threw off the other guy as well. He decided to skip it for the same reason.

If I had a wiki account I'd change it, but for whatever reason I tend to be far more comfortable writing list replies, sometimes repeatedly, than writing anything on the web, which I tend to treat as read-only. So I've never gotten a wiki account and thus haven't changed it, and apparently the other guy with the problem, and anyone else who knows, hasn't changed it either. So the conversion page still continues to underemphasize the importance of completing the conversion steps, including the defrag, in the proper order.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." - Richard Stallman
[GIT PULL] Btrfs
Hi Linus,

We have two more fixes in my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

I was hoping to also include a fix for a btrfs deadlock with compression enabled, but we're still nailing that one down.

Liu Bo (1) commits (+11/-0):
    Btrfs: fix abnormal long waiting in fsync

Eric Sandeen (1) commits (+4/-4):
    btrfs: test for valid bdev before kobj removal in btrfs_rm_device

Total: (2) commits (+15/-4)

 fs/btrfs/ordered-data.c | 11 +++++++++++
 fs/btrfs/volumes.c      |  8 ++++----
 2 files changed, 15 insertions(+), 4 deletions(-)
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On 07/20/2014 10:00 AM, Tomasz Torcz wrote:
> On Sun, Jul 20, 2014 at 01:53:34PM +0000, Duncan wrote:
>> TM posted on Sun, 20 Jul 2014 08:45:51 +0000 as excerpted:
>>> One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds?
>>
>> At a week, that's nearly 5 MiB per second, which isn't great, but isn't entirely out of the realm of reason either, given all the processing it's doing. A day would be 33.11+ MiB/s, reasonable thruput for a straight copy, and a raid rebuild is rather more complex than a straight copy, so...
>
> Uhm, sorry, but 5 MB/s is _entirely_ unreasonable. It is order-of-magnitude unreasonable. And all the processing shouldn't even show as a blip on modern CPUs. This speed is indefensible.

I wholly agree that it's indefensible, but I can tell you why it is so slow. It's not "all the processing" (which is maybe a few hundred instructions on x86 for each block); it's because BTRFS still serializes writes to devices instead of queuing all of them in parallel. That is, when there are four devices that need to be written to, it writes to each one in sequence, waiting for the previous write to finish before dispatching the next. Personally, I would love to see this behavior improved, but I really don't have any time to work on it myself.
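The dispatch pattern being described can be illustrated in shell terms (a sketch of the scheduling difference only, not of the actual kernel code; sdb..sde and chunk.img are placeholders):

```shell
# Serialized dispatch (the behaviour described above): each write
# waits for the previous one to complete before being issued.
for dev in sdb sdc sdd sde; do
    dd if=chunk.img of=/dev/"$dev" bs=1M oflag=direct
done

# Parallel dispatch: queue all four writes at once, then wait for all
# of them to finish. Total wall time approaches that of one device.
for dev in sdb sdc sdd sde; do
    dd if=chunk.img of=/dev/"$dev" bs=1M oflag=direct &
done
wait
```

With four identical devices, the serialized loop takes roughly four times as long as the parallel one, which is the gap being complained about here.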
Re: Questions on incremental backups
Thanks everyone for the responses. I'll start setting up my backup strategy in 2 or 3 weeks. I'll give the diff and unionFS tips a go, and report back on any progress.
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On 20/07/2014 10:45, TM wrote:
> Hi, I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8 GB RAM. Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday. I am monitoring /var/log/messages and it seems it will take a long time. Started at about 8010631739392, and 20 hours later I am at 6910631739392: "btrfs: relocating block group 6910631739392 flags 65". At this rate it will take a week to complete the raid rebuild! Furthermore it seems that the operation is getting slower and slower: when the rebuild started I had a new message every half a minute, now it's getting to one and a half minutes. Most files are small files like flac/jpeg.

Hi TM,
are you doing other significant filesystem activity during this rebuild, especially random accesses? This can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5 MB/sec overall...
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On Sun, 20 Jul 2014 21:15:31 +0200, Bob Marley <bobmar...@shiftmail.org> wrote:
> Hi TM, are you doing other significant filesystem activity during this rebuild, especially random accesses? This can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5 MB/sec overall...

I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself. Mdadm-based RAID would rebuild the array reading/writing the disks in a completely linear manner, and it would finish an order of magnitude faster.

-- 
With respect,
Roman
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
> I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself.

This is the cause of the slow reconstruct. If you assume a 12 ms average seek time (normal for 7200 RPM SATA drives), an 8.3 ms rotational latency (half a rotation), an average 64 KiB write and a 100 MB/s streaming write speed, each write comes in at ~21 ms, which gives us ~47 IOPS. With the 64 KiB write size, this comes out to ~3 MB/s, DISK LIMITED.

The on-disk cache helps a bit during startup, but once the cache is full, it's back to writes at disk speed, with some small gains if the on-disk controller can schedule the writes efficiently. Based on the single-threaded I/O that BTRFS uses during a reconstruct, I expect that the average write size is somewhere around 200 KB.

Multi-threading the reconstruct disk I/O (possibly adding look-ahead) would double the reconstruct speed for this array, but that's not a trivial task.

The 5 MB/s that TM is seeing is fine, considering the small files he says he has.

Peter Ashford
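Peter's arithmetic can be checked with a quick awk one-liner (same assumptions: 12 ms average seek, 8.3 ms rotational latency, 64 KiB writes, 100 MB/s streaming):

```shell
awk 'BEGIN {
    seek = 12.0                       # ms, average seek time
    rot  = 8.3                        # ms, half-rotation latency at 7200 RPM
    xfer = 64 / 1024 / 100 * 1000     # ms to stream 64 KiB at 100 MB/s
    t    = seek + rot + xfer          # total service time per write
    iops = 1000 / t                   # writes per second
    printf "%.0f ms/write, ~%d IOPS, ~%.1f MB/s\n", t, iops, iops * 64 / 1024
}'
# → 21 ms/write, ~47 IOPS, ~3.0 MB/s
```

The seek and rotational latency dominate the 0.6 ms of actual data transfer, which is exactly why small random writes are so expensive on spinning disks.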
Re: Blocked tasks on 3.15.1
[ deadlocks during rsync in 3.15 with compression enabled ]

Hi everyone,

I still haven't been able to reproduce this one here, but I'm going through a series of tests with lzo compression forced and every operation forced to ordered. Hopefully it'll kick it out soon.

While I'm hammering away, could you please try this patch. If this is the bug you're hitting, the deadlock will go away and you'll see this printk in the log.

thanks!
-chris

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ab56df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_root_lock);
 	}
 
+	spin_lock(&root->fs_info->ordered_root_lock);
+	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
+		list_del_init(&BTRFS_I(inode)->ordered_operations);
+		printk(KERN_CRIT "racing inode deletion with ordered operations!!!\n");
+	}
+	spin_unlock(&root->fs_info->ordered_root_lock);
+
 	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 		     &BTRFS_I(inode)->runtime_flags)) {
 		btrfs_info(root->fs_info, "inode %llu still on the orphan list",

--

Hi Chris,

just had that hang during rsync from /home (ZFS, mirrored) to /bak (Btrfs w. lzo compression) again with that patch applied. It doesn't seem to be related to that issue (or patch) - only applicable to my case, obviously - since a search for that string (e.g. "racing") doesn't show anything in the log:

[16028.169347] INFO: task kworker/u16:2:11956 blocked for more than 180 seconds.
[16028.169349]       Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169350] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16028.169351] kworker/u16:2   D 88081ec13540     0 11956      2 0x0008
[16028.169356] Workqueue: btrfs-delalloc normal_work_helper
[16028.169358]  8806180ab8e0 0046 0004
[16028.169359]  a000 8806210f16b0 8806180abfd8 81e11500
[16028.169360]  8806210f16b0 0206 8113e6cc 88081ec135c0
[16028.169362] Call Trace:
[16028.169367]  [8113e6cc] ? delayacct_end+0x7c/0x90
[16028.169370]  [811689d0] ? wait_on_page_read+0x60/0x60
[16028.169374]  [819cfc78] ? io_schedule+0x88/0xe0
[16028.169375]  [811689d5] ? sleep_on_page+0x5/0x10
[16028.169377]  [819cfffc] ? __wait_on_bit_lock+0x3c/0x90
[16028.169378]  [81168ac5] ? __lock_page+0x65/0x70
[16028.169382]  [810f5580] ? autoremove_wake_function+0x30/0x30
[16028.169384]  [81169854] ? __find_lock_page+0x44/0x70
[16028.169385]  [811698ca] ? find_or_create_page+0x2a/0xa0
[16028.169388]  [8145a1cf] ? io_ctl_prepare_pages+0x4f/0x150
[16028.169390]  [8145bd45] ? __load_free_space_cache+0x195/0x5d0
[16028.169392]  [8145c26b] ? load_free_space_cache+0xeb/0x1b0
[16028.169395]  [813fd6a1] ? cache_block_group+0x191/0x390
[16028.169396]  [810f5550] ? prepare_to_wait_event+0xf0/0xf0
[16028.169398]  [814085ea] ? find_free_extent+0x95a/0xdb0
[16028.169400]  [81408bf9] ? btrfs_reserve_extent+0x69/0x150
[16028.169403]  [81421116] ? cow_file_range+0x136/0x420
[16028.169404]  [81422493] ? submit_compressed_extents+0x1f3/0x480
[16028.169406]  [81422720] ? submit_compressed_extents+0x480/0x480
[16028.169407]  [8144896b] ? normal_work_helper+0x1ab/0x330
[16028.169410]  [810df26d] ? process_one_work+0x16d/0x490
[16028.169411]  [810dff8b] ? worker_thread+0x12b/0x410
[16028.169412]  [810dfe60] ? manage_workers.isra.28+0x2c0/0x2c0
[16028.169414]  [810e579a] ? kthread+0xca/0xe0
[16028.169415]  [810e56d0] ? kthread_create_on_node+0x180/0x180
[16028.169417]  [819d3c7c] ? ret_from_fork+0x7c/0xb0
[16028.169418]  [810e56d0] ? kthread_create_on_node+0x180/0x180
[16028.169422] INFO: task btrfs-transacti:12042 blocked for more than 180 seconds.
[16028.169422]       Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16028.169423] btrfs-transacti D 88081ec13540     0 12042      2 0x0008
[16028.169425]  88009c7adb20 0046 88040d84ca68
[16028.169426]  a000 88061f284ba0 88009c7adfd8 81e11500
[16028.169427]  88061f284ba0 88061a21dea8 811b8c2d 8805fc919e00
[16028.169428] Call Trace:
[16028.169431]  [811b8c2d] ? kmem_cache_alloc_trace+0x14d/0x160
[16028.169433]  [813fd632] ? cache_block_group+0x122/0x390
[16028.169434]  [810f5550] ? prepare_to_wait_event+0xf0/0xf0
[16028.169436]  [814085ea] ? find_free_extent+0x95a/0xdb0
[16028.169437]  [81408bf9] ? btrfs_reserve_extent+0x69/0x150
[16028.169439]  [81422fa8] ? __btrfs_prealloc_file_range+0xe8/0x380
[16028.169441]  [8140b6f2] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16028.169442]  [819cb00c] ?
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
On 07/20/2014 02:28 PM, Bob Marley wrote:
> On 20/07/2014 21:36, Roman Mamedov wrote:
>> On Sun, 20 Jul 2014 21:15:31 +0200, Bob Marley <bobmar...@shiftmail.org> wrote:
>>> Hi TM, are you doing other significant filesystem activity during this rebuild, especially random accesses? This can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5 MB/sec overall...
>>
>> I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself. Mdadm-based RAID would rebuild the array reading/writing the disks in a completely linear manner, and it would finish an order of magnitude faster.
>
> Now this explains a lot! So they would just need to be sorted? Sorting the files of a disk from lowest to highest block number prior to starting reconstruction seems feasible. Maybe not all of them together, because there will be millions, but sorting them in chunks of 1000 files would still produce a very significant speedup!

As I understand the problem, it has to do with where btrfs is in the overall development process. There are a LOT of opportunities for optimization, but optimization cannot begin until btrfs is feature complete, because any work done beforehand would be wasted effort, in that it would likely have to be repeated after being broken by feature enhancements. So now it is a waiting game for completion of all the major features (like additional RAID levels and possible n-way options, etc.) before optimization efforts can begin. Once that happens we will likely see HUGE gains in efficiency and speed, but until then we are kind of stuck in this position where it works but leaves somewhat to be desired.

I think this is one reason developers often caution users not to expect too much from btrfs at this point. It's just not there yet, and it will still be some time before it is.
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
Hi,

On 07/20/2014 04:45 PM, TM wrote:
> Hi, I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8 GB RAM. Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday. I am monitoring /var/log/messages and it seems it will take a long time. Started at about 8010631739392, and 20 hours later I am at 6910631739392: "btrfs: relocating block group 6910631739392 flags 65". At this rate it will take a week to complete the raid rebuild!

Just my two cents:

Since 'btrfs replace' supports RAID10, I suppose using the replace operation is better than 'device removal and add'.

Another question is related to btrfs snapshot-aware balance. How many snapshots do you have in your system? Of course, during balance/resize/device removal operations you can still snapshot, but fewer snapshots should speed things up!

Anyway, 'btrfs replace' is implemented more efficiently than 'device removal and add'. :-)

Thanks,
Wang

> Furthermore it seems that the operation is getting slower and slower: when the rebuild started I had a new message every half a minute, now it's getting to one and a half minutes. Most files are small files like flac/jpeg. One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds? If I shut down the system before the rebuild finishes, what is the proper procedure to remount it? Again degraded? Or normally? Can the process of rebuilding the raid continue after a reboot? Will it survive, and continue rebuilding? Thanks in advance, TM
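For reference, the replace-based procedure Wang recommends would look roughly like this (device names and mountpoint are placeholders for the actual setup):

```shell
# Replace a failing device in-place instead of 'device add' + 'device delete'.
# /dev/sdold, /dev/sdnew and /mnt are placeholders.
btrfs replace start /dev/sdold /dev/sdnew /mnt
btrfs replace status /mnt      # monitor rebuild progress
```

If the old device is already dead or unreadable, `-r` tells replace to read from the other mirrors instead of the source device.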
Re: ENOSPC errors during balance
Marc Joliet posted on Sun, 20 Jul 2014 21:44:40 +0200 as excerpted:

> On Sun, 20 Jul 2014 13:40:54 +0200, Marc Joliet <mar...@gmx.de> wrote:
>> On Sun, 20 Jul 2014 12:22:33 +0200, Marc Joliet <mar...@gmx.de> wrote:
>> [...]
>>> I'll try this and see, but I think I have more files >1 GB than would account for this error (which comes towards the end of the balance when only a few chunks are left). I'll see what `find /mnt -type f -size +1G` finds :) .

Note that it's extents over 1 GiB on the converted former ext4, not necessarily files over 1 GiB. You may have files over a GiB that were already broken into extents that are all less than a GiB, and btrfs would be able to deal with them fine. It's only when a single extent ended up larger than a GiB on the former ext4 that btrfs can't deal with it.

>> Now that I think about it, though, it sounds like it could explain the sudden surge in total data size: for one very big file, several chunks/extents are created, but the data cannot be copied from the original ext4 extent.

I hadn't thought about that effect, but good deductive reasoning. =:^)

> Well, turns out that was it! What I did:
>
> - delete the single largest file on the file system, a 12 GB VM image, along with all subvolumes that contained it
> - rsync it over again
> - start a full balance
>
> This time, the balance finished successfully :-) .

Good to read! We're now two for two on this technique working around this problem! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
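Marc's three steps, as a shell sketch (all paths, subvolume names, and the rsync source are placeholders; which snapshots to delete depends on which ones still reference the file):

```shell
# 1. Delete the oversized file, plus the snapshots/subvolumes that still
#    hold reflinks to its old (pre-conversion) extents.
rm /mnt/images/big-vm.img                      # placeholder path
btrfs subvolume delete /mnt/snapshots/2014-07-19   # placeholder; repeat per snapshot

# 2. Copy the file back in, recreating it with native <= 1 GiB extents.
rsync -a /backup/big-vm.img /mnt/images/       # placeholder source

# 3. Re-run the full balance.
btrfs balance start /mnt
```

Deleting the snapshots is what actually frees the oversized extents; without that, the reflinked copies keep them alive.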
Re: 1 week to rebuild 4x 3TB raid10 is a long time!
ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:

> If you assume a 12 ms average seek time (normal for 7200 RPM SATA drives), an 8.3 ms rotational latency (half a rotation), an average 64 KiB write and a 100 MB/s streaming write speed, each write comes in at ~21 ms, which gives us ~47 IOPS. With the 64 KiB write size, this comes out to ~3 MB/s, DISK LIMITED.
>
> The 5 MB/s that TM is seeing is fine, considering the small files he says he has.

Thanks for the additional numbers supporting my point. =:^) I had run some of the numbers, but not to the extent you just did, so I didn't know where 5 MiB/s fit in, only that it wasn't entirely out of the range of expectation for spinning rust, given the current state of optimization... or more accurately the lack thereof, due to the focus still being on features.

-- 
Duncan - List replies preferred. No HTML msgs.