Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Jul 22, 2014, at 11:13 AM, Chris Murphy li...@colorremedies.com wrote:
> It's been a while since I did a rebuild on HDDs,

So I did this yesterday and the day before with an SSD and an HDD in raid1, and made the HDD do the rebuild. Baseline for this hard drive:

    hdparm -t: 35.68 MB/sec
    dd if=/dev/zero of=/dev/rdisk2s1 bs=256k
    13508091392 bytes transferred in 521.244920 secs (25915056 bytes/sec)

I don't know why hdparm gets such good reads while dd writes are 75% of that, but the 26MB/s write speed is realistic (this is a FireWire 400 external device) and is what I typically get with long sequential writes. This is probably an interface limit (mode S200) rather than a drive limitation, since on a SATA Rev 2 or 3 interface I get 100+MB/s transfers.

During the rebuild, iotop reported actual writes averaging in the 24MB/s range, and the total data to restore divided by the total time for the replace command comes out to 23MB/s. The source data is a Fedora 21 install with no meaningful user data (cache files and such), so mostly a bunch of libraries, programs, and documentation. It's therefore not exclusively small files, yet the iotop rate was very stable throughout the 4-minute rebuild.

So I still think 5MB/s for a SATA-connected (?) drive is unexpected.

Chris Murphy
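For anyone wanting to reproduce that kind of baseline on a Linux box, a minimal sketch follows; the device name is an example, and note the dd write test destroys whatever is on the target device:

    # Buffered sequential read test:
    hdparm -t /dev/sdb

    # Sequential write test: 4 GiB of zeros in 256 KiB blocks, bypassing
    # the page cache so the result reflects the disk, not RAM:
    dd if=/dev/zero of=/dev/sdb bs=256k count=16384 oflag=direct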
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

Wang Shilong wangsl.fnst at cn.fujitsu.com writes:
> The latest btrfs-progs include a man page for btrfs-replace. Actually, you could use it something like:
>     btrfs replace start srcdev|devid targetdev mnt
> You could use 'btrfs file show' to see the missing device id, and then run btrfs replace.
> Thanks,
> Wang

Hi Wang,

I physically removed the drive before the rebuild; having a failing device as a source is not a good idea anyway. Without the device in place, the device name does not show up, since the missing device is not under /dev/sdXX or anything else. That is why I asked whether the special parameter 'missing' may be used in a replace. I can't say if it is supported, but I guess not, since I found no documentation on this matter.

So I guess replace is not aimed at fault tolerance / rebuilding. It's just a convenient way to, let's say, replace the disks with larger disks to extend your array. A convenience tool, not an emergency tool.

TM
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Tue, 22 Jul 2014 14:43:45 +0000 (UTC), TM wrote:
> I physically removed the drive before the rebuild; having a failing device as a source is not a good idea anyway. Without the device in place, the device name does not show up, since the missing device is not under /dev/sdXX or anything else. That is why I asked whether the special parameter 'missing' may be used in a replace. I can't say if it is supported, but I guess not, since I found no documentation on this matter.
> So I guess replace is not aimed at fault tolerance / rebuilding. It's just a convenient way to, let's say, replace the disks with larger disks to extend your array. A convenience tool, not an emergency tool.
> TM

TM,

Just read the man page. You could have used the replace tool after physically removing the failing device. Quoting the man page:

"If the source device is not available anymore, or if the -r option is set, the data is built only using the RAID redundancy mechanisms."

"Options: -r  only read from srcdev if no other zero-defect mirror exists (enable this if your drive has lots of read errors, the access would be very slow)"

Concerning the rebuild performance: the access to the disk is linear for both reading and writing. I measured above 75 MByte/s at that time with regular 7200 RPM disks, which would be less than 10 hours to replace a 3TB disk (in the worst case, if it is completely filled up). Unused/unallocated areas are skipped, which additionally improves the rebuild speed.

For missing disks, unfortunately the command invocation does not use the term 'missing'; you pass the numerical device id instead of the device name. 'missing' _is_ implemented in the kernel part of the replace code, but was simply forgotten in the user-mode part, or at least it was forgotten in the man page.
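Putting Stefan's recipe together, a rebuild with the failed disk already removed might look like the sketch below. Device names, mount point, and the devid are examples, not taken from this thread:

    mount -o degraded /dev/sdb /mnt       # mount the surviving members
    btrfs filesystem show /mnt            # note the devid of the missing disk
    btrfs replace start 2 /dev/sde /mnt   # rebuild devid 2 onto the new disk
    btrfs replace status /mnt             # poll progress

With the source device gone, the data is rebuilt from the RAID redundancy automatically, per the man page text quoted above.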
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Jul 21, 2014, at 8:51 PM, Duncan 1i5t5.dun...@cox.net wrote:
>> It does not matter at all what the average file size is.
> ... and the filesize /does/ matter.

I'm not sure how. A rebuild is replicating chunks, not doing the equivalent of cp or rsync on files. Copying chunks (or strips of chunks in the case of raid10) should be a rather sequential operation. So I'm not sure where the random-write behavior would come from that could drop the write performance to ~5MB/s on drives that can read/write ~100MB/s.

>> Thus it is perfectly reasonable to expect ~50MByte/second, per spindle, when doing a raid rebuild.
> ... And perfectly reasonable, at least at this point, to expect ~5 MiB/sec total throughput, one spindle at a time, for btrfs.

It's been a while since I did a rebuild on HDDs, but on SSDs the rebuilds have maxed out the replacement drive. Obviously the significant difference is rotational latency. If everyone with spinning disks and many small files is getting 5MB/s rebuilds, it suggests a rotational latency penalty, if that performance is to be expected. I'm just not sure where it would be coming from. Random I/O would incur the effect of rotational latency, but the rebuild shouldn't be random I/O; it should be sequential.

Chris Murphy
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

Stefan Behrens sbehrens at giantdisaster.de writes:
> Just read the man page. You could have used the replace tool after physically removing the failing device. [...]
> For missing disks, unfortunately the command invocation does not use the term 'missing'; you pass the numerical device id instead of the device name.

Hi Stefan,

Thank you very much for the comprehensive info; I will opt to use replace next time.

Breaking news :-) -- the rebuild ran from

    Jul 19 14:41:36 microserver kernel: [ 1134.244007] btrfs: relocating block group 8974430633984 flags 68

to

    Jul 22 16:54:54 microserver kernel: [268419.463433] btrfs: relocating block group 2991474081792 flags 65

and ended before counting down any further. So flight time was 3 days, and I see no more messages or btrfs processes utilizing CPU, so the rebuild seems done.

Just a few hours ago another disk showed some early trouble: an accumulating Current_Pending_Sector count, but no Reallocated_Sector_Ct yet.

TM
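As a sanity check, the two log lines above are enough for a back-of-the-envelope rate estimate. A rough sketch; note the block-group numbers are virtual byte offsets, so unallocated gaps in the address space make this an upper bound on data actually moved:

    start=8974430633984                          # first logged block group
    end=2991474081792                            # last logged block group
    secs=$(( 3*86400 + 2*3600 + 13*60 + 18 ))    # Jul 19 14:41:36 -> Jul 22 16:54:54
    echo "$(( (start - end) / secs / 1000000 )) MB/s"   # prints 22

That 22 MB/s is address space per second, not necessarily data; if the real data rate was nearer the ~5MB/s discussed earlier in the thread, the difference would be address ranges the countdown skipped.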
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

Wang Shilong wangsl.fnst at cn.fujitsu.com writes:
> Just my two cents: since 'btrfs replace' supports RAID10, I suppose using the replace operation is better than 'device removal and add'.
> Another question is related to btrfs snapshot-aware balance: how many snapshots did you have in your system? Of course, during balance/resize/device removal operations you could still snapshot, but fewer snapshots should speed things up!
> Anyway, 'btrfs replace' is implemented more efficiently than 'device removal and add'.

Hi Wang,

Just one subvolume; no snapshots or anything else.

Device replace: to tell you the truth, I have not used it in the past. Most of my testing was done two years ago, so on this 'kind of production' system I did not try it. But if I had known it was faster, perhaps I would have used it. Does anyone have statistics for such a replace and the time it takes?

Also, can replace be used when one device is missing? I can't find documentation, e.g.:

    btrfs replace start missing /dev/sdXX

TM
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Sun, Jul 20, 2014 at 7:48 PM, Duncan 1i5t5.dun...@cox.net wrote:
> ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:
>> If you assume a 12ms average seek time (normal for 7200RPM SATA drives), an 8.3ms rotational latency (half a rotation), an average 64KB write and a 100MB/s streaming write speed, each write comes in at ~21ms, which gives us ~47 IOPS. With the 64KB write size, this comes out to ~3MB/s, DISK LIMITED. The 5MB/s that TM is seeing is fine, considering the small files he says he has.
> Thanks for the additional numbers supporting my point. =:^)
> I had run some of the numbers but not to the extent you just did, so I didn't know where 5 MiB/s fit in, only that it wasn't entirely out of the range of expectation for spinning rust, given the current state of optimization... or more accurately the lack thereof, due to the focus still being on features.

That is actually nonsense. RAID rebuild operates on the block/stripe layer, not on the filesystem layer; it does not matter at all what the average file size is.

A RAID rebuild is really only limited by disk I/O speed when performing a linear read of the whole spindle using huge I/O sizes, or, if you have multiple spindles on the same bus, by the bus saturation speed. Thus it is perfectly reasonable to expect ~50MByte/second, per spindle, when doing a raid rebuild. That is for the naive rebuild that rebuilds every single stripe; a smarter rebuild that knows which stripes are unused can skip them and thus become even faster.

That the rebuild is off by an order of magnitude comes down to the current design and should be fixed at some stage, but with the current state of btrfs it is probably better to focus on other, more urgent areas first.
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Jul 21, 2014, at 10:46 AM, ronnie sahlberg ronniesahlb...@gmail.com wrote:
> That is actually nonsense. RAID rebuild operates on the block/stripe layer, not on the filesystem layer.

Not on Btrfs. It is on the filesystem layer. However, a rebuild is about replicating metadata chunks (up to 256MB) and data chunks (up to 1GB). For raid10, those are further broken down into 64KB strips, so the smallest unit of replication during a rebuild on Btrfs would be 64KB.

Anyway, 5MB/s seems really low to me, so I'm suspicious something else is going on. I haven't done a rebuild in a couple of months, but my recollection is that it's always been as fast as the write performance of a single device in the btrfs volume. I'd look in dmesg for any of the physical drives being reset or having read or write errors, and I'd do some individual drive testing to see whether the problem can be isolated. And if that's not helpful -- it's really tedious and produces verbose amounts of information, but it might reveal some issue -- capture the actual commands going to the physical devices:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg34886.html

My expectation (i.e. I'm guessing), based on previous testing, is that whether raid1 or raid10, the actual read/write commands will each be 256KB in size. A Btrfs rebuild is basically designed to be a sequential operation. This could maybe fall apart if there were somehow many minimally full chunks, which is probably unlikely.

Chris Murphy
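For the "capture actual commands" step, one way to watch request sizes hitting a member disk is blktrace; a sketch, assuming the blktrace package is installed and with the device name as an example:

    # btrace is the blktrace | blkparse wrapper; field 6 is the action
    # (D = dispatched to the device), field 7 the R/W flags, and the
    # "+ N" in each line is the request size in 512-byte sectors.
    btrace /dev/sdb | awk '$6 == "D" && $7 ~ /W/'

If the dispatched writes really are ~256KB and back-to-back, the rebuild is sequential as described; lots of small, scattered writes would point at the random-I/O theory instead.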
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On 07/21/2014 10:00 PM, TM wrote:
> Hi Wang,
> Just one subvolume; no snapshots or anything else.
> Device replace: to tell you the truth, I have not used it in the past. Most of my testing was done two years ago, so on this 'kind of production' system I did not try it. But if I had known it was faster, perhaps I would have used it. Does anyone have statistics for such a replace and the time it takes?

I don't have specific statistics about this. The conclusion comes from the implementation differences between replace and 'device removal'.

> Also, can replace be used when one device is missing? I can't find documentation, e.g.:
>     btrfs replace start missing /dev/sdXX

The latest btrfs-progs include a man page for btrfs-replace. Actually, you could use it something like:

    btrfs replace start srcdev|devid targetdev mnt

You could use 'btrfs file show' to see the missing device id, and then run btrfs replace.

Thanks,
Wang
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

ronnie sahlberg posted on Mon, 21 Jul 2014 09:46:07 -0700 as excerpted:
> On Sun, Jul 20, 2014 at 7:48 PM, Duncan 1i5t5.dun...@cox.net wrote:
>> ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:
>>> If you assume a 12ms average seek time (normal for 7200RPM SATA drives), an 8.3ms rotational latency (half a rotation), an average 64KB write and a 100MB/s streaming write speed, each write comes in at ~21ms, which gives us ~47 IOPS. With the 64KB write size, this comes out to ~3MB/s, DISK LIMITED. The 5MB/s that TM is seeing is fine, considering the small files he says he has.
> That is actually nonsense. RAID rebuild operates on the block/stripe layer, not on the filesystem layer.

If we were talking about a normal raid, yes. But we're talking about btrFS -- note the FS, for filesystem -- so indeed it *IS* the filesystem layer. This particular filesystem happens to have raid properties as well, but it's definitely filesystem-level...

> It does not matter at all what the average file size is.

... and the filesize /does/ matter.

> RAID rebuild is really only limited by disk I/O speed when performing a linear read of the whole spindle using huge I/O sizes, or, if you have multiple spindles on the same bus, by the bus saturation speed.

Makes sense... if you're dealing at the raid level. If we were talking about dmraid or mdraid... and they're both much more mature and optimized as well, so 50 MiB/sec, per spindle in parallel, would indeed be a reasonable expectation for them. But (barring bugs, which will and do happen at this stage of development) btrfs both makes far better data-validity guarantees and does a lot more complex processing, what with COW and snapshotting, etc., in addition to the normal filesystem-level stuff AND the raid-level stuff it does.

> Thus it is perfectly reasonable to expect ~50MByte/second, per spindle, when doing a raid rebuild.

... And perfectly reasonable, at least at this point, to expect ~5 MiB/sec total throughput, one spindle at a time, for btrfs.

> That is for the naive rebuild that rebuilds every single stripe. A smarter rebuild that knows which stripes are unused can skip the unused stripes and thus become even faster than that.
> Now, that the rebuild is off by an order of magnitude comes down to the current design and should be fixed at some stage, but with the current state of btrfs it is probably better to focus on other, more urgent areas first.

Because of all the extra work it does, btrfs may never get to full streaming speed across all spindles at once. But it can and certainly will get much better than it is, once the focus moves to optimization. *AND*, because it /does/ know which areas of the device are actually in use, once btrfs is optimized it's quite likely that, despite the slower raw speed, rebuild times will match or beat raid-layer-only technologies, which must rebuild the entire device because they do /not/ know which areas are unused -- at least on the typically 20-60% unused filesystems most people run.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
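Duncan's point about skipping unused areas can be checked per filesystem; a short sketch (mount point is an example) shows how much is actually allocated, which is roughly what a replace has to copy:

    btrfs filesystem show /mnt   # per-device totals vs. 'used'
    btrfs filesystem df /mnt     # allocated chunks by type (Data/Metadata/System)

The gap between device size and allocated chunks is the part a chunk-aware rebuild never has to touch.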
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On 07/20/2014 10:00 AM, Tomasz Torcz wrote:
> On Sun, Jul 20, 2014 at 01:53:34PM +0000, Duncan wrote:
>> TM posted on Sun, 20 Jul 2014 08:45:51 +0000 as excerpted:
>>> One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds?
>> At a week, that's nearly 5 MiB per second, which isn't great, but isn't entirely out of the realm of reason either, given all the processing it's doing. A day would be 33.11+ MiB/s, reasonable throughput for a straight copy, and a raid rebuild is rather more complex than a straight copy, so...
> Uhm, sorry, but 5MBps is _entirely_ unreasonable. It is order-of-magnitude unreasonable. And all the processing shouldn't even show as a blip on modern CPUs. This speed is indefensible.

I wholly agree that it's indefensible, but I can tell you why it is so slow. It's not 'all the processing' (which is maybe a few hundred instructions on x86 for each block); it's because BTRFS still serializes writes to devices instead of queuing them all in parallel. That is, when there are four devices that need to be written to, it writes to each one in sequence, waiting for the previous write to finish before dispatching the next.

Personally, I would love to see this behavior improved, but I really don't have any time to work on it myself.
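The serialization penalty described above is easy to see in isolation. A toy comparison (paths are hypothetical, each on a different physical disk):

    # Serial: the second write starts only after the first finishes.
    time ( dd if=/dev/zero of=/mnt/disk1/t bs=1M count=1024 oflag=direct ;
           dd if=/dev/zero of=/mnt/disk2/t bs=1M count=1024 oflag=direct )

    # Parallel: both disks stream at once; wall-clock time roughly halves.
    time ( dd if=/dev/zero of=/mnt/disk1/t bs=1M count=1024 oflag=direct &
           dd if=/dev/zero of=/mnt/disk2/t bs=1M count=1024 oflag=direct &
           wait )

Scaled to four members, dispatching in sequence costs up to 4x in wall-clock time, which matches the gap being complained about here.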
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On 20/07/2014 10:45, TM wrote:
> Hi,
> I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8GB RAM. Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday.
> I am monitoring /var/log/messages and it seems it will take a long time. It started at about 8010631739392, and 20 hours later I am at 6910631739392:
>     btrfs: relocating block group 6910631739392 flags 65
> At this rate it will take a week to complete the raid rebuild!!! Furthermore, it seems that the operation is getting slower and slower: when the rebuild started I had a new message every half a minute, now it's getting to one and a half minutes. Most files are small files like flac/jpeg.

Hi TM,

Are you doing other significant filesystem activity during this rebuild, especially random accesses? This can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5MB/sec overall...
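A quick way to check for competing I/O of that kind, using iostat from the sysstat package (5-second samples):

    iostat -x 5

High await and %util on the raid members combined with read/write traffic from processes other than the rebuild would point at contention rather than the rebuild itself.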
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On Sun, 20 Jul 2014 21:15:31 +0200, Bob Marley bobmar...@shiftmail.org wrote:
> Hi TM,
> Are you doing other significant filesystem activity during this rebuild, especially random accesses? This can reduce performance a lot on HDDs. E.g. if you were doing strenuous multithreaded random writes in the meanwhile, I could expect even less than 5MB/sec overall...

I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself. An mdadm-based RAID would rebuild the array reading/writing the disks in a completely linear manner, and it would finish an order of magnitude faster.

-- 
With respect,
Roman
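For comparison, the linear mdadm resync Roman mentions is directly observable and tunable:

    cat /proc/mdstat                                           # resync progress and current speed
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # rebuild rate bounds, in KB/s

On idle 7200 RPM disks an md resync typically runs at the sequential speed of the slowest member, which is the ~50-100MB/s baseline the rest of this thread is measuring btrfs against.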
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

> I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself.

This is the cause of the slow reconstruct.

If you assume a 12ms average seek time (normal for 7200RPM SATA drives), an 8.3ms rotational latency (half a rotation), an average 64KB write and a 100MB/s streaming write speed, each write comes in at ~21ms, which gives us ~47 IOPS. With the 64KB write size, this comes out to ~3MB/s, DISK LIMITED.

The on-disk cache helps a bit during startup, but once the cache is full, it's back to writes at disk speed, with some small gains if the on-disk controller can schedule the writes efficiently. Based on the single-threaded I/O that BTRFS uses during a reconstruct, I expect that the average write size is somewhere around 200KB.

Multi-threading the reconstruct disk I/O (possibly adding look-ahead) would double the reconstruct speed for this array, but that's not a trivial task.

The 5MB/s that TM is seeing is fine, considering the small files he says he has.

Peter Ashford
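Peter's arithmetic, spelled out with awk as a calculator (the 64KB-per-write figure is his assumption, not a measured value):

    awk 'BEGIN {
        seek = 0.012                       # average seek time, seconds
        rot  = 0.0083                      # half a rotation at 7200 RPM
        xfer = 65536 / 100e6               # 64 KB at 100 MB/s streaming
        t = seek + rot + xfer              # ~0.021 s per random write
        printf "%.0f IOPS, %.1f MB/s\n", 1/t, 65536 / t / 1e6
    }'
    # prints: 48 IOPS, 3.1 MB/s (Peter's ~47 IOPS / ~3MB/s, with rounding)

The same formula also shows why larger writes help so much: at his estimated ~200KB average write size, the seek and rotation costs are amortized over three times as much data, giving roughly 9MB/s.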
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

On 07/20/2014 02:28 PM, Bob Marley wrote:
> On 20/07/2014 21:36, Roman Mamedov wrote:
>> I believe the problem here might be that a Btrfs rebuild *is* a strenuous random read (+ random-ish write) just by itself. An mdadm-based RAID would rebuild the array reading/writing the disks in a completely linear manner, and it would finish an order of magnitude faster.
> Now this explains a lot! So the writes would just need to be sorted? Sorting the files of a disk from lowest to highest block number prior to starting reconstruction seems feasible. Maybe not all of them together, because there will be millions, but sorting them in chunks of 1000 files would still produce a very significant speedup!

As I understand the problem, it has to do with where btrfs is in the overall development process. There are a LOT of opportunities for optimization, but optimization cannot begin until btrfs is feature-complete, because any work done beforehand would likely be wasted effort, in that it would have to be repeated after being broken by feature enhancements.

So now it is a waiting game for completion of all the major features (like additional RAID levels and possible n-way options, etc.) before optimization efforts can begin. Once that happens we will likely see HUGE gains in efficiency and speed, but until then we are kind of stuck in this position where it works but leaves somewhat to be desired. I think this is one reason developers often caution users not to expect too much from btrfs at this point. It's just not there yet, and it will still be some time before it is.
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

Hi,

On 07/20/2014 04:45 PM, TM wrote:
> Hi,
> I have a raid10 with 4x 3TB disks on a microserver (http://n40l.wikia.com/wiki/Base_Hardware_N54L), 8GB RAM. Recently one disk started to fail (SMART errors), so I replaced it: mounted as degraded, added the new disk, removed the old one. Started yesterday.
> I am monitoring /var/log/messages and it seems it will take a long time. It started at about 8010631739392, and 20 hours later I am at 6910631739392:
>     btrfs: relocating block group 6910631739392 flags 65
> At this rate it will take a week to complete the raid rebuild!!!

Just my two cents: since 'btrfs replace' supports RAID10, I suppose using the replace operation is better than 'device removal and add'.

Another question is related to btrfs snapshot-aware balance: how many snapshots did you have in your system? Of course, during balance/resize/device removal operations you could still snapshot, but fewer snapshots should speed things up!

Anyway, 'btrfs replace' is implemented more efficiently than 'device removal and add'. :-)

Thanks,
Wang

> Furthermore, it seems that the operation is getting slower and slower: when the rebuild started I had a new message every half a minute, now it's getting to one and a half minutes. Most files are small files like flac/jpeg.
> One week for a raid10 rebuild of 4x 3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds?
> If I shut down the system before the rebuild finishes, what is the proper procedure to remount it? Again degraded? Or normally? Can the process of rebuilding the raid continue after a reboot? Will it survive and continue rebuilding?
> Thanks in advance
> TM
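On TM's remount question, which goes unanswered above -- the following is a sketch based on general btrfs behavior, not confirmed anywhere in this thread: as long as a member device is missing, the volume will only mount with the degraded option, the same way it was first mounted for the repair (device and mount point are examples):

    mount -o degraded /dev/sda /mnt
    # An interrupted 'btrfs device delete missing' is not resumed
    # automatically; it would need to be issued again after remounting.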
Re: 1 week to rebuild 4x 3TB raid10 is a long time!

ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:
> If you assume a 12ms average seek time (normal for 7200RPM SATA drives), an 8.3ms rotational latency (half a rotation), an average 64KB write and a 100MB/s streaming write speed, each write comes in at ~21ms, which gives us ~47 IOPS. With the 64KB write size, this comes out to ~3MB/s, DISK LIMITED.
> The 5MB/s that TM is seeing is fine, considering the small files he says he has.

Thanks for the additional numbers supporting my point. =:^)

I had run some of the numbers, but not to the extent you just did, so I didn't know where 5 MiB/s fit in -- only that it wasn't entirely out of the range of expectation for spinning rust, given the current state of optimization... or more accurately the lack thereof, due to the focus still being on features.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman