Re: [OpenIndiana-discuss] Zfs stability Scrubs
On Mon, Oct 15, 2012 at 5:02 PM, Richard Elling richard.ell...@richardelling.com wrote:

There is some interesting research that shows how scrubs for RAID-5 systems can contaminate otherwise good data. The reason is that if a RAID-5 parity mismatch occurs, how do you know where the data corruption is when the disks themselves do not fail? In those cases, scrubs are evil. ZFS does not suffer from this problem because the checksums are stored in the parent's metadata.

A similar problem happens for traditional RAID-1 mirrors. If mirror verification shows the two disks differ, there's no way of knowing which is correct.

--
David Brodbeck
System Administrator, Linguistics
University of Washington
Re: [OpenIndiana-discuss] Zfs stability Scrubs
On Mon, Oct 15, 2012 at 6:21 PM, Jason Matthews ja...@broken.net wrote:

From: heinrich.vanr...@gmail.com [mailto:heinrich.vanr...@gmail.com]

My point is most high end storage units have some form of data verification process that is active all the time.

As does ZFS. The blocks are checksummed on each read. Assuming you have mirrors or parity redundancy, the misbehaving block is corrected, reallocated, etc.

Right, I understand ZFS checks data on each read; my point is checking the disk or data periodically. In my opinion scrubs should be considered depending on the importance of the data, and the frequency based on what type of raidz, change rates and disk type are used.

One point of scrubs is to verify the data that you don't normally read. Otherwise, the errors would be found in real time upon the next read.

Understood; if full backups are executed weekly/monthly, no scrub is required.

Perhaps in future ZFS will have the ability to limit resource allocation when scrubbing, like with BV where it can be set. Rebuild priority can also be set.

There are tunables for this.

Thanks, did not know, will research; it had a fairly heavy impact the other day replacing a disk.

Also some high end controllers have port verify for each disk (media read) when using their integrated raid that runs periodically. Since in the world of ZFS it is recommended to use JBOD, I see it as more than just the filesystem. I have never deployed a system containing mission critical data using filesystem raid protection other than with ZFS, since there is no protection in them and I would much rather bank on the controller.

Unfortunately my parser was unable to grok this. Seems like you would prefer a raid controller.

Sorry, it boils down to this: if ZFS is not an option I use a raid controller if the data is important. In fact I do not like to be tied to a specific controller; ZFS gives me the freedom to change at any point.
Re: [OpenIndiana-discuss] Zfs stability Scrubs
On Oct 15, 2012, at 3:00 PM, heinrich.vanr...@gmail.com wrote:

Most of my storage background is with EMC CX and VNX and that is used in a vast amount of datacenters. They run a process called sniffer that runs in the background and requests a read of all blocks on each disk individually for a specific LUN; if there is an unrecoverable read error a Background Verify (BV) is requested by the process to check for data consistency. The unit will also conduct a proactive copy to a hotspare, I believe once data has been verified, from the disk where the error(s) were seen. A BV is also requested when there is a LUN failover, enclosure path failure or a storage processor failure. My point is most high end storage units have some form of data verification process that is active all the time.

Don't assume BV is data verification. On most midrange systems these scrubbers just check for disks to report errors. While this should catch most media errors, it does not catch phantom writes or other corruption in the datapath. On systems with SATA disks, there is no way to add any additional checksums to the sector, so they are SOL if there is data corruption that does not also cause a disk failure. For SAS or FC disks, some vendors use larger sectors and include per-sector checksums that can help catch some phantom write or datapath corruption.

There is some interesting research that shows how scrubs for RAID-5 systems can contaminate otherwise good data. The reason is that if a RAID-5 parity mismatch occurs, how do you know where the data corruption is when the disks themselves do not fail? In those cases, scrubs are evil. ZFS does not suffer from this problem because the checksums are stored in the parent's metadata.

In my opinion scrubs should be considered depending on the importance of the data, and the frequency based on what type of raidz, change rates and disk type are used. Perhaps in future ZFS will have the ability to limit resource allocation when scrubbing, like with BV where it can be set. Rebuild priority can also be set.

Throttling exists today, but most people don't consider mdb as a suitable method for setting :-( Scrub priority is already the lowest priority; I don't see much need to increase it.
-- richard

Also some high end controllers have port verify for each disk (media read) when using their integrated raid that runs periodically. Since in the world of ZFS it is recommended to use JBOD, I see it as more than just the filesystem. I have never deployed a system containing mission critical data using filesystem raid protection other than with ZFS, since there is no protection in them and I would much rather bank on the controller.

My few cents on scrubs. Thanks
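To make Richard's mdb remark concrete: on illumos-era kernels the scan/scrub throttle is exposed as kernel variables that can be read and poked with mdb. This is only a hedged sketch - the tunable names (zfs_scrub_delay, zfs_scan_idle, zfs_top_maxinflight) are taken from the illumos dsl_scan code of that period and may differ or be absent on other builds, so verify them on your system before writing anything:

# Read the current values (32-bit decimal):
echo "zfs_scrub_delay/D" | mdb -k
echo "zfs_scan_idle/D" | mdb -k
echo "zfs_top_maxinflight/D" | mdb -k

# Throttle scrubs harder while production I/O is heavy: delay each scrub I/O
# by 8 ticks instead of the usual default of 4 (W = write a 32-bit word, 0t = decimal):
echo "zfs_scrub_delay/W0t8" | mdb -kw

# The same tunable can be made persistent across reboots in /etc/system:
#   set zfs:zfs_scrub_delay = 8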
Re: [OpenIndiana-discuss] Zfs stability Scrubs
2012-10-16 3:57, Heinrich van Riel wrote:

Understood; if full backups are executed weekly/monthly, no scrub is required.

I'd argue that this is not a completely true statement. It might hold for raidzN backing storage with single-copy blocks, but if mirrors and/or two or three copies are involved (i.e. for metadata blocks), or ditto blocks on deduped pools, you have, say, a 50/50 or 33/67 chance of reading any particular copy of a block during the backup procedure, and if errors hide in the other copies - you'll miss them. That's where scrub should shine, by enforcing reads of all copies of all blocks while walking the block pointer tree of the pool.

Hope I'm correct ;)
//Jim
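A minimal sketch of Jim's point, assuming a hypothetical dataset named tank/important: a backup only needs one readable copy of each block, while a scrub deliberately reads them all.

# Keep two copies of every data block (applies to data written after the property is set):
zfs set copies=2 tank/important

# A full backup reads each block once, from whichever copy ZFS happens to pick,
# so a latent error in the *other* copy goes unnoticed:
tar cf /backup/important.tar /tank/important

# A scrub walks the block-pointer tree and verifies every copy on every vdev;
# repaired copies show up in the per-device CKSUM counters:
zpool scrub tank
zpool status -v tank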
Re: [OpenIndiana-discuss] Zfs stability Scrubs
Thank you all for the good answers! So if I put it all together:

1. ZFS is, in mirror and RAID configs, the best currently available option for reliable data
2. Without scrubs, data is checked for integrity on every read
3. Unread data will not be checked for integrity
4. Scrubs will solve point 3
5. Real servers with good hardware (HCL), ECC memory and server-grade harddisks have a very low chance of data loss/corruption when used with ZFS
6. Large modern drives with lots of storage, like any 750 GB hd, have a higher chance of corruption
7. Real SAS and SCSI drives offer the best option for reliable data
8. So-called near-line SAS drives can give problems when combined with ZFS because they haven't been tested very long
9. Checking your logs for hardware messages should be a daily job (a minimal check sketch follows right after this list)

Kind regards,
The out-side
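For point 9, a tiny dashboard-style check of the kind that can be wired into cron, nagios or zabbix. It leans on "zpool status -x" printing "all pools are healthy" when nothing is wrong; that message text is an assumption based on common illumos builds, so check what yours prints first.

#!/bin/sh
# Exit 0 (green) when every imported pool is healthy, exit 2 (alert) otherwise.
STATUS=`/usr/sbin/zpool status -x`
if [ "$STATUS" = "all pools are healthy" ]; then
    echo "OK: $STATUS"
    exit 0
else
    echo "CRITICAL: $STATUS"
    exit 2
fi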
Re: [OpenIndiana-discuss] Zfs stability Scrubs
10. If SUN had listened to the engineers instead of the financials, it would now be the market leader in the server market ;-(
Re: [OpenIndiana-discuss] Zfs stability Scrubs
2012-10-13 2:06, Jan Owoc wrote:

All scrubbing does is put stress on drives and verify that data can still be read from them. If a hard drive ever fails on you and you need to replace it (how often does that happen?), then you know hey, just last week all the other hard drives were able to read their data under stress, so are less likely to fail on me.

Also note that there are different types of media that are differently impacted by IOs. CDs/DVDs and tape can get more scratches upon reads, SSDs wear out upon writes, while HDDs in stable conditions (good heat, power and vibration) don't care about doing IOs in terms of their media, though the mechanics of the head movement can wear out - thus, see the disk's ratings (i.e. 24x7 or not) and vendor-assumed lifetime.

I heard a statement, which I am ready to accept but cannot vouch for the validity of, that having the magnetic head read the bits from the platter can actually help the media hold its data, by aligning the magnetic domains to one of their two valid positions. Due to Brownian movement and other factors, these miniature crystals can turn around in their little beds and spell zeroes or ones with less and less exactness. Applying oriented magnetic fields can push them back into one of the stable positions.

Well, whether that was crap or not - I'm not ready to say, but one thing that is more likely true is that HDDs have ECC on their sectors. If a read produces repairable bad data, the HDD itself can try to repair the sector in-place or by relocation to the spare area, perhaps by applying stronger fields to discern the bits better, and if it succeeds - it would return no error to the HBA and return the fixed data. If the repair result was wrong, ZFS would detect the incorrect data and issue its own repairs, using other copies or raidzN permutations. Also note that this self-repair takes time while the HDD does nothing else, and *that* IO timeout can cause grief for RAID systems, HBA reset storms and so on (hence the RAID editions of drives, TLER and so on).

On the other hand, if you're putting regular stress on the disks and see some error counters (monitoring!) go high, you can preemptively order and replace aging disks, instead of trying to recover from a pool with reduced redundancy a few days or months later.

HTH,
//Jim Klimov
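On the "watch the error counters" point, a couple of stock OpenIndiana/Solaris commands do the job without extra tooling; a hedged sketch:

# Per-device soft/hard/transport error totals (rising numbers on one disk = order a spare):
iostat -En | grep Errors

# Raw FMA error telemetry, and anything FMA has actually diagnosed as faulty:
fmdump -e
fmadm faulty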
Re: [OpenIndiana-discuss] Zfs stability Scrubs
A few more comments: 2012-10-13 11:56, Roel_D wrote: Thank you all for the good answers! So if i put it all together : 1. ZFS is, in mirror and RAID configs, the best currently available option for reliable data Yes, though even it is not replacement for backups, because data loss can be caused by reasons outside ZFS control, including admin errors, datacenter fires, code bugs and so on. 2. Without scrubs data is checked on every read for integrity With normal reads, this check only takes place for the one semi-randomly chosen copy of the block. If this copy is not valid, other copies are consulted. 3. Unread data will not be checked for integrity 4. Scrubs will solve point 3. Yes, because they enforce reads and checks of all copies. 5. Real servers with good hardware (HCL), ECC memory and servergrade harddisks have a very low chance of dataloss/corruption when used with ZFS. Put otherwise, cheaper hardware tends to cause problems of various nature, that can not be detected and fixed by this hardware and corrupted data is propagated to ZFS and it trustily saves trash to disks. Few programs do verify-on-write to test the saved results... 6. Large modern drives with large storage like any 750 GB hd have a higher chance for corruption The bit-error rates are somewhat the same for disks of the past decade, being roughly one bit per 10Tb of IOs. With disk sizes and overall throughputs growing, the chance of hitting an error on a particular large disk increases. 7. Real SAS and SCSi drives offer the best option for reliable data 8. So called near-line SAS drives can give problems when combined with ZFS because they haven't been tested very long There are also some architectural things and lessons learned, like don't use SATA disks with SAS expanders, while direct attachment of SATA disks to individual HBA ports works without problems (i.e. Sun Thumpers are built like this - with six eight-port HBAs on board to drive the 48 disks in the box). 9. Checking your logs for hardware messages should be a daily job Better yet, some monitoring system (nagios, zabbix, whatever) should check these logs so you have one dashboard for all your computers with a big green light on it, meaning no problems detected anywhere. You can worry if the light goes not-green ;) You should manually check the system with drills too, to test that it itself monitors stuff correctly, though - but that can be a non-daily routine. //Jim ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
Re: [OpenIndiana-discuss] Zfs stability Scrubs
2012-10-13 7:26, Michael Stapleton wrote:

The VAST majority of data centers are not storing data in storage that does checksums to verify data, that is just the reality. Regular backups and site replication rule.

And this actually concerns me... we help maintain some deployments built by customers, including professional arrays like Sun Storagetek 6140 serving a few LUNs to directly attached servers (so it happens). The arrays are black boxes to us - we don't know if they use something block-checksummed similar to ZFS inside, or can only protect against whole-disk failures, when a device just stops responding.

We still have little idea - in what config would the data be safer to hold a ZFS pool, and which should give more performance:
* if we use the array with its internal RAID6, and the client computer makes a pool over the single LUN
* a couple of RAID6 array boxes in a mirror provided by the arrays' firmware (independently of client computers, who see a MPxIO target LUN), and the computer makes a pool over the single multi-pathed LUN
* a couple of RAID6 array boxes in a mirror provided by ZFS (two independent LUNs mirrored by the computer)
* serve LUNs from each disk in JBOD manner from the one or two arrays, and have ZFS construct pools over that.

Having expensive hardware RAIDs (anyway available on the customer's site) serving as JBODs is kind of overkill - any well-built JBOD costing a fraction of this array could suffice. But regarding data integrity known to be provided by ZFS and unknown to be really provided by black-box appliances, downgrading the arrays to JBODs might be better. Who knows?.. (We don't, advice welcome).

There are several more things to think about:

1) Redundant configs without knowledge of which side of the mirror is good, or what permutation of RAID blocks yields the correct answer, are basically useless, and they can propagate errors by overwriting an unknownly-good copy of the data with an unknownly-corrupted one.

For example, take a root mirror. You find that your OS can't boot. You can try to split the mirror into two separate disks, fsck each of them and if one is still correct, recreate the mirror using it as base (first half). Even if both disks give some errors, these might be in different parts of the data, so you have a chance of reconstructing the data using these two halves and/or backups. However, if your simplistic RAID just copies data from disk1 to disk2 in case of any discrepancies and unclean shutdowns, you're roughly 50% likely to corrupt a good disk2 with bad data from disk1.

This setup assumed that bit-rot never occurred or was too rare, bus/RAM errors never happened or were ruled out by CRC/ECC, and instead disks died altogether, instantly becoming bricks (which could be quite true in the old days, and can still be probable with expensive enterprise hardware). Basically, this assumed that data written from a process was the same data that hit the disk platters and the same data that was returned upon reads (unless an IO error/deviceMissing was reported) - in that case old RAIDs could indeed propagate assumed-good data onto replacement disk(s) during reconstruction of the array.

2) Backups and replicas without means to verify them (checksums or at least three-way comparisons at some level) are also tainted, because you don't really know if what you read from them ever matches what you wrote to them (perhaps several years ago, counting from the moment the data was written onto the RAID originally).

My few cents,
//Jim
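Of the four layouts Jim lists, the dividing line is whether ZFS itself holds redundancy: over a single array LUN it can detect corruption but has nothing to repair it from (metadata and copies=2 data aside), while over two LUNs it can self-heal. A sketch with hypothetical MPxIO device names:

# Options 1/2: one LUN, the array does the RAID6/mirroring internally -
# ZFS detects bad blocks via checksums but can only report them, not rewrite them:
zpool create tank c3t0d0

# Option 3: one LUN from each array, mirrored by ZFS - detection AND repair:
zpool create tank mirror c3t0d0 c4t0d0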
Re: [OpenIndiana-discuss] Zfs stability Scrubs
Nice list. You could add:

10. Dedup comes with a price.

Mike
Re: [OpenIndiana-discuss] Zfs stability - our scrub script
2012-10-13 0:41, Doug Hughes wrote:

yes, you should do a scrub and no, there isn't very much risk to this. This will scan your disks for bits that have gone stale or the like. You should do it. We do a scrub once per week.

Just in case this helps anyone, here's the script we use to initiate scrubbing from cron (i.e. once a week on fridays). Just add a line to crontab and receive emails ;)

There's some config-initialization and include cruft at the start (we have a large package of admin-scripts); I hope the absence of the config files (which can be used to override hardcoded defaults) and libraries won't preclude the script from running on systems without our package:

# cat /opt/COSas/bin/zpool-scrub.sh
-
#!/bin/bash

# $Id: zpool-scrub.sh,v 1.6 2010/11/15 14:32:19 jim Exp $
# this script will go through all pools and scrub them one at a time
#
# Use like this in crontab:
# 0 22 * * 5 [ -x /opt/COSas/bin/zpool-scrub.sh ] && /opt/COSas/bin/zpool-scrub.sh
#
# (C) 2007 nic...@aspiringsysadmin.com and commenters
# http://aspiringsysadmin.com/blog/2007/06/07/scrub-your-zfs-file-systems-regularly/
# (C) 2009 Jim Klimov, cosmetic mods and logging; 2010 - locking

#[ x$MAILRECIPIENT = x ] && MAILRECIPIENT=ad...@domain.com
[ x$MAILRECIPIENT = x ] && MAILRECIPIENT=root

[ x$ZPOOL = x ] && ZPOOL=/usr/sbin/zpool
[ x$TMPFILE = x ] && TMPFILE=/tmp/scrub.sh.$$.$RANDOM
[ x$LOCK = x ] && LOCK=/tmp/`basename $0`.`dirname $0 | sed 's/\//_/g'`.lock

COSAS_BINDIR=`dirname $0`
if [ x$COSAS_BINDIR = x./ -o x$COSAS_BINDIR = x. ]; then
    COSAS_BINDIR=`pwd`
fi

# Source optional config files
[ x$COSAS_CFGDIR = x ] && COSAS_CFGDIR="$COSAS_BINDIR/../etc"
if [ -d "$COSAS_CFGDIR" ]; then
    [ -f "$COSAS_CFGDIR/COSas.conf" ] && \
        . "$COSAS_CFGDIR/COSas.conf"
    [ -f "$COSAS_CFGDIR/`basename $0`.conf" ] && \
        . "$COSAS_CFGDIR/`basename $0`.conf"
fi

[ ! -x "$ZPOOL" ] && exit 1

### Include this after config files, in case of RUNLEVEL_NOKICK mask override
RUN_CHECKLEVEL=""
[ -s "$COSAS_BINDIR/runlevel_check.include" ] && \
    . "$COSAS_BINDIR/runlevel_check.include" && \
    block_runlevel

# Check LOCKfile
if [ -f "$LOCK" ]; then
    OLDPID=`head -n 1 "$LOCK"`
    BN=`basename $0`
    TRYOLDPID=`ps -ef | grep "$BN" | grep -v grep | awk '{ print $2 }' | grep "$OLDPID"`
    if [ x"$TRYOLDPID" != x ]; then
        LF=`cat "$LOCK"`
        echo "= ZPoolScrub wrapper aborted because another copy is running - lockfile found: $LF
Aborting..." | wall
        exit 1
    fi
fi
echo $$ > "$LOCK"

scrub_in_progress() {
    ### Check that we're not yet shutting down
    if [ x"$RUN_CHECKLEVEL" != x ]; then
        if [ x"`check_runlevel`" != x ]; then
            echo "INFO: System is shutting down. Aborting scrub of pool '$1'!" >&2
            zpool scrub -s "$1"
            return 1
        fi
    fi

    if $ZPOOL status "$1" | grep "scrub in progress" > /dev/null; then
        return 0
    else
        return 1
    fi
}

RESULT=0
for pool in `$ZPOOL list -H -o name`; do
    echo "=== `TZ=UTC date` @ `hostname`: $ZPOOL scrub $pool started..."
    $ZPOOL scrub "$pool"

    while scrub_in_progress "$pool"; do sleep 60; done
    echo "=== `TZ=UTC date` @ `hostname`: $ZPOOL scrub $pool completed"

    if ! $ZPOOL status "$pool" | grep "with 0 errors" > /dev/null; then
        $ZPOOL status "$pool" | tee -a "$TMPFILE"
        RESULT=$(($RESULT+1))
    fi
done

if [ -s "$TMPFILE" ]; then
    cat "$TMPFILE" | mailx -s "zpool scrub on `hostname` generated errors" "$MAILRECIPIENT"
fi
rm -f "$TMPFILE"

# Be nice, clean up
rm -f "$LOCK"

exit $RESULT
-

HTH,
//Jim Klimov
Re: [OpenIndiana-discuss] Zfs stability Scrubs
Some basic thoughts:

The one advantage of using a storage array instead of a JBOD is the write cache when doing random writes. But the cost is that you lose the data integrity features if the ZFS pool is not configured with redundancy. ZFS works best when it has multiple direct paths to multiple physical devices configured with mirrored VDevs. So the bottom line for ZFS is that JBODs are almost always the best choice, as long as the quality of the devices and device drivers is similar.

SANs provide centralized administration and maintenance, which is their main feature. If you could map actual hard drives from the SAN to ZFS, everyone could be happy.

Backup done while services are running all too often results in unhappy people.

There are few easy answers when it comes to performance. And the actual answer to most questions is "It Depends".

Mike
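A sketch of the JBOD-plus-mirrored-vdevs layout Mike describes, with each mirror paired across two HBAs/paths; the pool and controller/target names are made up:

zpool create tank \
    mirror c1t0d0 c2t0d0 \
    mirror c1t1d0 c2t1d0 \
    mirror c1t2d0 c2t2d0

zpool status tank   # confirm the layout before loading data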
[OpenIndiana-discuss] Zfs stability
Being on the list and reading all the ZFS problem and question posts makes me a little scared. I have 4 Sun X4140 servers running in the field for 4 years now and they all have ZFS mirrors (2x HD). They are running Solaris 10, and one is running Solaris 11. I also have some other servers running OI, also with ZFS.

The Solaris servers N E V E R had any ZFS scrub. I didn't even know such a thing existed ;-) Since it all worked flawlessly for years, I am a huge Solaris/OI fan.

But how stable are things nowadays? Does one need to do a scrub? Or a resilver? How come I see so much ZFS trouble?

Kind regards,
The out-side
Re: [OpenIndiana-discuss] Zfs stability
yes, you should do a scrub and no, there isn't very much risk to this. This will scan your disks for bits that have gone stale or the like. You should do it. We do a scrub once per week.
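For a once-per-week scrub like Doug's, a single crontab line is enough, since "zpool scrub" starts the scan and returns immediately; the pool name here is hypothetical, and the fuller wrapper script posted elsewhere in this thread adds locking and error mail on top of this:

# crontab entry: scrub at 03:00 every Sunday
0 3 * * 0 /usr/sbin/zpool scrub tank

# check progress or results later with:
#   zpool status tank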
Re: [OpenIndiana-discuss] Zfs stability
Also, the reason there's so much talk about broken ZFS is because nobody complains when their pools aren't broken.

--
Seconds to the drop, but it seems like hours.
http://www.openmedia.ca https://robbiecrash.me
Re: [OpenIndiana-discuss] Zfs stability Scrubs
It is easy to understand that zfs scrubs can be useful. But how often do we scrub, or do the equivalent, on any other file system? UFS? VXFS? NTFS? ...

ZFS has scrubs as a feature, but is it a need? I do not think so. Other file systems accept the risk, mostly because they can not really do anything if there were errors.

It does no harm to do periodic scrubs, but I would not recommend doing them often, or even at all if scrubs get in the way of production. What is the real risk of not doing scrubs?

Risk can not be eliminated, and we have to accept some risk. For example, data deduplication uses digests on data to detect duplication. Most dedup systems assume that if the digest is the same for two pieces of data, then the data must be the same. This assumption is not actually true. Two differing pieces of data can have the same digest, but the chance of this happening is so low that the risk is accepted.

I'm only writing this because I get the feeling some people think scrubs are a need. Maybe people associate doing scrubs with something like doing NTFS defrags?

Just my 2 cents!

Mike
Re: [OpenIndiana-discuss] Zfs stability
+1. What the previous poster is missing is this: it's entirely possible for sectors on a disk to go bad and if you haven't read them in awhile, you might not notice. Then, say, the other disk (in a mirror for example) dies entirely. You are dismayed to realize your redundant disk configuration has lost data for you anyway.
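When that second disk does die, recovery leans entirely on the surviving side being readable - which is exactly what regular scrubs have been verifying. A hedged sketch with made-up pool and device names:

zpool status -x                     # pool shows DEGRADED, with the dead device flagged
zpool replace tank c0t1d0 c0t2d0    # swap the failed disk for a new one
zpool status tank                   # watch the resilver; latent bad sectors on the surviving
                                    # disk would surface here as READ/CKSUM errors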
Re: [OpenIndiana-discuss] Zfs stability Scrubs
Maybe people associate doing scrubs with something like doing NTFS defrags?

Well, having read all the posts, and because I installed napp-it on my home server, which has a scrub scheduler, I was almost at the point of assuming such.

I recently bought a secondhand X4140 just because it performs so well. Until recently I had a mysql cluster running on an old HP G3 with Solaris 10. It served a lot of data with heavy writes every 15 minutes. The whole cluster was running in zones based on ZFS storage. Worked like a charm, without scrubs, for 3 years. It had 4 SCSI 73GB drives. Had to stop it because I moved everything to the X4140.

ZFS saved me so much trouble and is so fast that I am afraid that new OI users will get scared when they read all the bad news.

Kind regards,
The out-side
Re: [OpenIndiana-discuss] Zfs stability Scrubs
--- On Fri, 10/12/12, Michael Stapleton michael.staple...@techsologic.com wrote: I'm only writing this because I get the feeling some people think scrubs are a need. Maybe people associate doing scrubs with something like doing NTFS defrags? I normally do scrubs when I think about it. Which has been a long time between scrubs in most cases. I got more interested in doing it regularly when I encountered SMART errors for excessive sector remapping after a reboot. I don't know if a scrub would detect that or not. The admin skills in this list vary from very high to very low. High skill admins take any threat to system integrity seriously and try to reduce it. At a job I worked many years ago, the admins were replacing several failed disks every week in the RAID arrays. If you have lots of disks, you will have lots of failures. There are a lot of companies w/ many petabytes of data on disk. Even w/ 4 TB drives, that's still a lot of drives. And you're always stuck running disks which are several years old and failing more often. Have Fun! Reg ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
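A scrub only notices a remapped sector indirectly (when a read fails outright or returns data that no longer matches the checksum); the remap counters themselves come from SMART. A sketch using smartmontools, which is a separate package on OpenIndiana - the device path, and whether you need "-d sat" or "-d scsi", depend on your controller:

smartctl -d sat -a /dev/rdsk/c0t1d0s0

# attributes worth trending over time:
#   5   Reallocated_Sector_Ct
# 197   Current_Pending_Sector
# 198   Offline_Uncorrectable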
Re: [OpenIndiana-discuss] Zfs stability Scrubs
The problem is when people are overly paranoid because the feature exists and end up causing problems by doing scrubs when they should not because they feel they need to.

Skilled admins also understand SLAs.

Mike
Re: [OpenIndiana-discuss] Zfs stability Scrubs
On Fri, Oct 12, 2012 at 3:07 PM, Michael Stapleton michael.staple...@techsologic.com wrote: It is easy to understand that zfs srubs can be useful, But, How often do we scrub or the equivalent of any other file system? UFS? VXFS? NTFS? ... If your data has checksums, it is standard practice to periodically verify your checksums and correct if necessary. ECC memory does do a scrub every once in a while :-). The FS you named don't have checksums, so scrubbing would do no good. For example, data deduplication uses digests on data to detect duplication. Most dedup systems assume that if the digest is the same for two pieces of data, then the data must be the same. This assumption is not actually true. Two differing pieces of data can have the same digest, but the chance of this happening is so low that the risk is accepted. So low is an understatement. Have you ever taken 2 to the power of 256? (ZFS currently requires sha256 checksums if you want to do dedup.) Chances of a block being different but having a duplicate sha256 is 1 in 115792089237316195423570985008687907853269984665640564039457584007913129639936. Just for fun, let's see what those odds give you. Say you were writing all human information ever produced (2.56e+20 bytes) [1] on one ZFS filesystem (with 1-byte blocksize). Let's say you were writing this much data every second for the age of the known universe (4.3e+17 s). Your odds of having one false positive with this amount of data are 1 in 1e+39. [1] http://www.wired.co.uk/news/archive/2011-02/14/256-exabytes-of-human-information I'm only writing this because I get the feeling some people think scrubs are a need. Maybe people associate doing scrubs with something like doing NTFS defrags? All scrubbing does is put stress on drives and verify that data can still be read from them. If a hard drive ever fails on you and you need to replace it (how often does that happen?), then you know hey, just last week all the other hard drives were able to read their data under stress, so are less likely to fail on me. Jan ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
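For anyone still uneasy about hash collisions, ZFS can be told to byte-compare candidate blocks before deduplicating them, which removes the collision risk entirely at the cost of an extra read per match; the pool name is hypothetical:

# default behaviour: trust a sha256 match
zfs set dedup=on tank

# paranoid alternative: byte-for-byte comparison whenever checksums collide
zfs set dedup=sha256,verify tank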
Re: [OpenIndiana-discuss] Zfs stability Scrubs
But that's the deal with mailing lists everywhere, be they OI or whatever else. Be it some problem someone is having, or some way to enhance a product, or to get it to do something it was never intended to do - support mailing lists and forums wouldn't exist if people didn't have problems that they needed support overcoming.

Jerry

On 10/12/12 04:34 PM, Roel_D wrote:

ZFS saved me so much trouble and is so fast that i am afraid that new OI users will get scared when they read all the bad news.
Re: [OpenIndiana-discuss] Zfs stability
On 10/12/12 16:45, Robbie Crash wrote: Also, the reason there's so much talk about broken ZFS is because nobody complains when their pools aren't broken. On Fri, Oct 12, 2012 at 3:55 PM, Roel_D openindi...@out-side.nl wrote: How come i see so much ZFS trouble? I suspect there's more to it than that. ZFS, unlike most file systems, has a built-in checksum feature that checks block integrity. If you have problems on the drive, in the controller, in the DMA mechanism, or in memory itself, you're liable to trip over ZFS checksum errors, which ZFS will then try hard to repair from a mirror or RAID-Z reconstruction. Because most other file systems don't have this capability, they just don't notice. Unless the drive itself flags the data as bad with an uncorrectable low-level read error, the OS happily believes almost any garbage it happens to read from the disk. Thus, I believe that at least some of the people complaining about ZFS stability problems here are actually getting a wonderful canary-in-a-coal-mine warning out of ZFS about the reliability of the hardware they own. Whether those folks take that warning to heart or simply wish it away by changing OSes, well, I guess that's up to them. -- James Carlson 42.703N 71.076W carls...@workingcode.com ___ OpenIndiana-discuss mailing list OpenIndiana-discuss@openindiana.org http://openindiana.org/mailman/listinfo/openindiana-discuss
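When the canary does sing, a quick way (a sketch, with a hypothetical pool name) to see whether the checksum errors point at a disk or at the path/memory in front of it:

zpool status -v tank   # which device's CKSUM column is climbing, and which files were hit
fmdump -eV | less      # raw ereports: zfs checksum events vs. sd/transport-level disk errors;
                       # checksum errors with no matching disk errors hint at the datapath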
Re: [OpenIndiana-discuss] Zfs stability Scrubs
So, a lot of people have already answered this in various ways. I'm going to provide a little bit of direct answer and focus on some of those other answers (and emphasis).

On 10/12/2012 5:07 PM, Michael Stapleton wrote:

It is easy to understand that zfs scrubs can be useful, But, How often do we scrub or the equivalent of any other file system? UFS? VXFS? NTFS? ... ZFS has scrubs as a feature, but is it a need? I do not think so. Other file systems accept the risk, mostly because they can not really do anything if there were errors.

That's right. They cannot do anything. Why is that a good thing? If you have a corruption on your filesystem because a block or even a single bit went wrong, wouldn't you want to know? Wouldn't you want to fix it? What if a number in an important financial document changed? Seems unlikely, but we've discovered at least 5 instances of spontaneous disk data corruption over the course of a couple of years. zfs corrected them transparently. No data lost, automatic, clean, and transparent. The more data that we make, the more that possibility of spontaneous data corruption becomes reality.

It does no harm to do periodic scrubs, but I would not recommend doing them often or even at all if scrubs get in the way of production. What is the real risk of not doing scrubs?

data changing without you knowing it. Maybe this doesn't matter on an image file (though a jpeg could end up looking nasty or destroyed, and mpeg4 could be permanently damaged, but in a TIFF or other uncompressed format, you'd probably never know)

Risk can not be eliminated, and we have to accept some risk. For example, data deduplication uses digests on data to detect duplication. Most dedup systems assume that if the digest is the same for two pieces of data, then the data must be the same. This assumption is not actually true. Two differing pieces of data can have the same digest, but the chance of this happening is so low that the risk is accepted.

but, the risk of data being flipped once you have TBs of data is way above 0%. You can also do your own erasure coding if you like. That would be one way to achieve the same effect outside of ZFS.

I'm only writing this because I get the feeling some people think scrubs are a need. Maybe people associate doing scrubs with something like doing NTFS defrags?

NTFS defrag would only help with performance. scrub helps with integrity. Totally different things.
Re: [OpenIndiana-discuss] Zfs stability Scrubs
I'm not a mathematician, but can anyone calculate the chance of the Same 8K datablock on Both submirrors Going bad on terabyte drives, before the data is ever read and fixed automatically during normal read operations? And if you are not doing mirroring, you have already accepted a much larger margin of error for the sake of $.

The VAST majority of data centers are not storing data in storage that does checksums to verify data, that is just the reality. Regular backups and site replication rule.

I am Not saying scrubs are a bad thing, just that they are being over-emphasized, and some people who do not really understand are getting the wrong impression that doing scrubs very often will somehow make them a lot safer. Scrubs help. But a lot of people who are worrying about scrubs are not even doing proper backups or regular DR testing.

Mike