Re: [Bacula-users] Bacula and 16 bay JBOD
Chris Hoogendyk wrote:
> After that, I convinced management to pay for mirrored drives.

How much was the overtime bill? ;)
Re: [Bacula-users] Bacula and 16 bay JBOD
On 3/23/11 12:51 PM, Alan Brown wrote:
> Mehma Sarja wrote:
>> Since drives ONLY fail on Friday afternoons local time, an effective
>> remedy is to check for SMART messages before the weekend. Foolish as
>> that is, I am surprised how many times it has held true for me.
> For similar reasons we only perform work on critical infrastructure on
> Tuesday (or Thursday if a followup is needed).
>
> Mondays are for picking up any pieces from the weekend and Fridays are
> best left alone.
>
> Equipment tends to fail the evening or day after it was last worked on...

Worst case scenario: a number of years ago, I had a critical central proxy
server whose drive failed on Christmas Day. Okay, campus was closed and
everyone was on vacation, but lots of people doing research from home could
not access resources. I had no replacement drive on hand. I had to recover
onto a completely different machine, set up the proxy services, change DNS
entries to point to it, set up virtual interfaces for the proxy connections,
etc. I spent the better part of Christmas Day and evening alone in the
server room sweating out the details, after I had tracked down the cause of
the difficulties.

After that, I convinced management to pay for mirrored drives.

--
Chris Hoogendyk
Systems Administrator, Biology & Geology Departments
140 Morrill Science Center, University of Massachusetts, Amherst
Re: [Bacula-users] Bacula and 16 bay JBOD
Mehma Sarja wrote:
> Since drives ONLY fail on Friday afternoons local time, an effective
> remedy is to check for SMART messages before the weekend. Foolish as
> that is, I am surprised how many times it has held true for me.

For similar reasons we only perform work on critical infrastructure on
Tuesday (or Thursday if a followup is needed).

Mondays are for picking up any pieces from the weekend and Fridays are
best left alone.

Equipment tends to fail the evening or day after it was last worked on...
Re: [Bacula-users] Bacula and 16 bay JBOD
On 3/23/11 7:28 AM, Alan Brown wrote:
> Phil Stracchino wrote:
>> Well, a good start is to use something like SMART monitoring set up to
>> alert you when any drive enters what it considers a pre-fail state.
> FWIW: Nexsan, Xyratex and Infortrend all have SMART tracking disabled on
> their hardware arrays because they claim it usually only says a drive is
> on its way out a few hours after it died.
> [...]

Since drives ONLY fail on Friday afternoons local time, an effective remedy
is to check for SMART messages before the weekend. Foolish as that is, I am
surprised how many times it has held true for me.

Mehma
Re: [Bacula-users] Bacula and 16 bay JBOD
John Drescher wrote:
>> I haven't had as many die as you have (Do your users kick their computers
>> around the room?) but my experience matches yours when looking at changes
>> in the raw data. [...]
>
> I have between 100 and 200 drives at work.

That's less than half the number I have in the server room alone (approx
450 there), plus about 150 managed PCs in various offices - even the user
machines have overall disk failure rates well below 1% per 3 years.

We have a firm policy of replacing server drives at 5 years and handling
them "with kid gloves" at all times, so that may be one of the reasons we
don't see as many failures.

AB
Re: [Bacula-users] Bacula and 16 bay JBOD
> I haven't had as many die as you have (Do your users kick their computers
> around the room?) but my experience matches yours when looking at changes
> in the raw data. The problem is I haven't had enough die to put 100%
> certainty on it so I tend to rely on smartd's output.

I have between 100 and 200 drives at work.

John
Re: [Bacula-users] Bacula and 16 bay JBOD
John Drescher wrote:
> I would say this is true for smart PASS / FAIL but if you look at the
> raw SMART data you can use this to predict failure before it totally
> fails.

I agree, but they don't do that.

> At least I have been able to predict this for the 10 to 20
> drives that have died here at work since 2009.

I haven't had as many die as you have (do your users kick their computers
around the room?) but my experience matches yours when looking at changes
in the raw data. The problem is I haven't had enough failures to put 100%
certainty on it, so I tend to rely on smartd's output.
Re: [Bacula-users] Bacula and 16 bay JBOD
>> Well, a good start is to use something like SMART monitoring set up to
>> alert you when any drive enters what it considers a pre-fail state.
>> (Which can be simple age, increasing numbers of hard errors, increasing
>> variation in spindle speed, increasing slow starts, etc, etc...)
>
> FWIW: Nexsan, Xyratex and Infortrend all have SMART tracking disabled on
> their hardware arrays because they claim it usually only says a drive is
> on its way out a few hours after it died.

I would say this is true for SMART PASS / FAIL, but if you look at the raw
SMART data you can use it to predict failure before the drive totally
fails. At least I have been able to predict this for the 10 to 20 drives
that have died here at work since 2009. I usually know a week or so before
a drive is going to die, and I pull the drive from the RAID for further
testing. By further testing I mean a 4-pass badblocks read/write test,
looking at the SMART raw data before and after.

John
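A sketch of that test cycle, assuming smartmontools and e2fsprogs are
installed and /dev/sdX is a placeholder for the suspect drive, already
pulled from the array:

# Snapshot the raw SMART attributes before the stress test
smartctl -A /dev/sdX > smart-before.txt

# Destructive 4-pass read/write surface test: badblocks -w writes the
# patterns 0xaa, 0x55, 0xff, 0x00 and verifies each pass. This erases the
# drive, so only run it on a disk that is out of service.
badblocks -wsv /dev/sdX > badblocks.log 2>&1

# Snapshot again and compare; growth in attributes such as
# Reallocated_Sector_Ct or Current_Pending_Sector is a bad sign
smartctl -A /dev/sdX > smart-after.txt
diff smart-before.txt smart-after.txt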
Re: [Bacula-users] Bacula and 16 bay JBOD
Phil Stracchino wrote:
> Well, a good start is to use something like SMART monitoring set up to
> alert you when any drive enters what it considers a pre-fail state.
> (Which can be simple age, increasing numbers of hard errors, increasing
> variation in spindle speed, increasing slow starts, etc, etc...)

FWIW: Nexsan, Xyratex and Infortrend all have SMART tracking disabled on
their hardware arrays because they claim it usually only says a drive is
on its way out a few hours after it died.

(Personally, I use it and find that it does predict imminent drive
failures, but usually with less than 24 hours to go. That's still better
than no warning at all.)
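For reference, a minimal smartd.conf entry along the lines Phil describes;
the device path and mail address are placeholders:

# /etc/smartd.conf -- monitor one drive and mail on trouble
# -a      : monitor all SMART attributes, health status and error logs
# -o on   : enable automatic offline data collection
# -S on   : enable attribute autosave
# -s ...  : short self-test daily at 02:00, long test Saturdays at 03:00
# -m ...  : where to send warning mail
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost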
Re: [Bacula-users] Bacula and 16 bay JBOD
Mehma Sarja wrote:
> There is one more thing to think about and that is cumulative aging.
> Starting with all new disks is a false sense of security because as they
> age, and if they are in any sort of RAID/performance configuration, they
> will age and wear evenly.

Expanding on that: it is generally a bad idea to use 100% drives from the
same manufacturing batch, because their eventual failures are likely to be
very close together.

Similarly, it is a good idea to use a mix of drives from different
manufacturers, in order to try to spread any failures out over a longer
period of time.

I know of very few hardware suppliers who mix batches in their RAID arrays,
and of none who mix manufacturers. So far this hasn't been a problem for
our hardware arrays, but I prefer to mix things up when building software
arrays.

AB
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/18/11 21:00, Mehma Sarja wrote:
> I can only think of staggering drive age and maintenance. Here's hoping
> that someone on the list can come up with more creative
> solutions/practices.

Try to avoid buying a large number of drives from the same batch. This is
frequently easily accomplished by spreading purchases across several
vendors. Four drives here, four there...

--
Phil Stracchino
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/18/11 19:41, Marcello Romani wrote:
> Il 18/03/2011 19:01, Mehma Sarja ha scritto:
>> There is one more thing to think about and that is cumulative aging.
>> Starting with all new disks is a false sense of security because as they
>> age, and if they are in any sort of RAID/performance configuration, they
>> will age and wear evenly. Which means they will all start to fail
>> together. It is OK to design a system and assume one or two simultaneous
>> drive failures - when the drives are relatively young. After 3 years of
>> sustained use, like email storage, you are at higher risk no matter
>> which RAID scheme you have used.
>
> This is an interesting point. But what parameter should one take into
> account to decide when it's time to replace an aged (but still good)
> disk with a fresh one ?

Well, a good start is to use something like SMART monitoring set up to
alert you when any drive enters what it considers a pre-fail state. (Which
can be simple age, increasing numbers of hard errors, increasing variation
in spindle speed, increasing slow starts, etc, etc...)

--
Phil Stracchino
Re: [Bacula-users] Bacula and 16 bay JBOD
Il 18/03/2011 19:01, Mehma Sarja ha scritto:
> [...]
> There is one more thing to think about and that is cumulative aging.
> Starting with all new disks is a false sense of security because as they
> age, and if they are in any sort of RAID/performance configuration, they
> will age and wear evenly. Which means they will all start to fail
> together. It is OK to design a system and assume one or two simultaneous
> drive failures - when the drives are relatively young. After 3 years of
> sustained use, like email storage, you are at higher risk no matter
> which RAID scheme you have used.

This is an interesting point. But what parameter should one take into
account to decide when it's time to replace an aged (but still good) disk
with a fresh one ?

Marcello
Re: [Bacula-users] Bacula and 16 bay JBOD
On 3/17/11 4:57 PM, Phil Stracchino wrote:
> With RAID10 with sixteen drives, you can survive any one drive failure
> with minimal performance degradation. There is a 1 in 15 chance that a
> second failure will be the other drive of that pair, and bring the array
> down.
> [...]
> What it comes down to is, you have to decide for yourself what your
> priorities are - redundancy, performance, space efficiency - and how
> much of each you're willing to give up to get as much as you want of
> the others.

There is one more thing to think about and that is cumulative aging.
Starting with all new disks is a false sense of security because as they
age, and if they are in any sort of RAID/performance configuration, they
will age and wear evenly. Which means they will all start to fail together.
It is OK to design a system and assume one or two simultaneous drive
failures - when the drives are relatively young. After 3 years of sustained
use, like email storage, you are at higher risk no matter which RAID scheme
you have used.

Mehma
Re: [Bacula-users] Bacula and 16 bay JBOD
Not really: RAID6+0 only requires 8 drives minimum - you can create two
RAID6 groups of 4 drives each and stripe them together. This has a benefit,
as layering a stripe over parity-based RAID groups increases random-write
IOPS performance.

But the main issue is array integrity, mainly with large-capacity drives:
they have unrecoverable bit-error rates (UBER) in the 10^14 or 10^15 range,
and when you consider their capacity, this dominates the availability
calculations for an array well over any "hard" failure calculations.

Generally, with 1TB+ drives at 10^14 UBER I would be hard pressed to use
more than 6 drives in a RAID group (4D+2P); at 10^15 you may get by with
8D+2P, but you have to choose your own risk level. Personally, I don't like
building arrays where the probability of failing to read a sector in a
single sub-array is greater than 5% (ideally it should be less than 1%, but
you're not going to get that unless you're talking 10^16 and small drives
(<500GB)).

This still doesn't address the silent errors that happen, which are the
main thrust behind file systems like ZFS and/or T10 DIF (fat sectors),
which do checking to make sure the sector you're requesting is the sector
you're getting.

On 2011-03-18 08:22, Alan Brown wrote:
> Phil Stracchino wrote:
>> With RAID6, you can survive any one or two disk failures, in degraded
>> mode. [...]
> There's always RAID60, but that requires a lot of drives.
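As a sanity check on the 5% threshold above, a back-of-the-envelope sketch
using the standard approximation P = 1 - exp(-bits_read * UBER):

# Chance of hitting at least one unrecoverable read error when reading an
# entire 6-drive (4D+2P) group of 1TB drives, at UBER = 1e-15
awk -v tb=6 -v uber=1e-15 'BEGIN {
    bits = tb * 1e12 * 8;            # terabytes -> bits read
    p = 1 - exp(-bits * uber);       # Poisson approximation
    printf "P(>=1 URE over %g TB at UBER %g) = %.1f%%\n", tb, uber, p * 100;
}'
# prints roughly 4.7% -- right at the 5% comfort threshold; the same group
# built from 1e-14 drives comes out near 38%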
Re: [Bacula-users] Bacula and 16 bay JBOD
Phil Stracchino wrote:
> With RAID6, you can survive any one or two disk failures, in degraded
> mode. You'll have a larger working set than RAID10, but performance
> will be slower because of the overhead of parity calculations. A third
> failure will bring the array down and you will lose the data.

There's always RAID60, but that requires a lot of drives.
Re: [Bacula-users] Bacula and 16 bay JBOD
Il 18/03/2011 00:57, Phil Stracchino ha scritto:
> With RAID6, you can survive any one or two disk failures, in degraded
> mode. [...]
> What it comes down to is, you have to decide for yourself what your
> priorities are - redundancy, performance, space efficiency - and how
> much of each you're willing to give up to get as much as you want of
> the others.

Phil, that was an interesting read. Thanks for your detailed response.
(Your last paragraph is of course the definitive word on the subject.)

Now that I think about it, I realize I didn't fully take into account the
high number of drives we're talking about. Probably if using RAID6 a spare
drive is to be considered. Or, better yet, a mirror machine... But then
we're back to "it depends", I guess :-)

Oh, and BTW, maybe it's time for me to move past these old limited RAID
levels and investigate ZFS and those intriguing RAIDZx arrays...

Marcello
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/17/11 18:46, Marcello Romani wrote:
> Il 16/03/2011 18:38, Phil Stracchino ha scritto:
>> On 03/16/11 13:08, Mike Hobbs wrote:
>>> Hello, I'm currently testing bacula v5.0.3 and so far so good. One
>>> of my issues though, I have a 16 bay Promise Technologies VessJBOD.
>>> How do I get bacula to use all the disks for writing volumes to?
>>> [...]
>>
>> That scheme sounds like a bad and overly complex idea, honestly.
>> Depending on your data load, I'd use software RAID to make them into a
>> single RAID5 or RAID10 volume. RAID10 would be faster and, if set up
>> correctly[1], more redundant; RAID5 is more space-efficient, but slower.
>>
>> [1] There's a right and a wrong way to set up RAID10. The wrong way is
>> to set up two five-disk stripes, then mirror them; lose one disk from
>> each stripe, and you're dead in the water. The right way is to set up
>> five mirrored pairs, then stripe the pairs; this will survive multiple
>> disk failures as long as you don't lose both disks of any single pair.
>
> Hi Phil,
> that last sentence sounds a little scary to me: "this will survive
> multiple disk failures *as long as you don't lose both disks of any
> single pair*".
> Isn't RAID6 a safer bet ?

That depends.

With RAID6, you can survive any one or two disk failures, in degraded mode.
You'll have a larger working set than RAID10, but performance will be
slower because of the overhead of parity calculations. A third failure will
bring the array down and you will lose the data.

With RAID10 with sixteen drives, you can survive any one drive failure with
minimal performance degradation. There is a 1 in 15 chance that a second
failure will be the other drive of that pair, and bring the array down. If
not, then there is a 1 in 7 chance that a third drive failure will be on
the same pair as one of the two drives already failed. If not, the array
will still continue to operate, with some read performance degradation, and
there is now a just-less-than 1 in 4 chance (3/13) that if a fourth drive
fails, it will be on the same pair as one of the three already failed. ...
And so on. There is a cumulative 39% chance that four random failures will
fail the entire array, which rises to 59% with five failures, and 78% with
six. (91% at seven, 98% at eight, and no matter how many leprechauns live
in your back yard, at nine failures you're screwed of course. It's like the
joke about the two men in the airliner.)

But if the array was RAID6, it already went down for the count when the
third drive failed.

Now, granted, multiple failures like that are rare. But ... I had a cascade
failure of three drives out of a twelve-drive RAIDZ2 array between 4am and
8am one morning. Each drive that failed pushed the load on the remaining
drives higher, and after a couple of hours of that, the next weakest drive
failed, which pushed the load still higher. And when the third drive
failed, the entire array went down. It can happen.

But ... I'm running RAIDZ3 right now, and as soon as I can replace the rest
of the drives with new drives, I'll be going back to RAIDZ2. Because RAIDZ3
is a bit too much of a performance hit on my server, and - with drives that
aren't dying of old age - RAIDZ2 is redundant *enough* for me. There is no
data on the array that is crucial *AND* irreplaceable *AND* not also stored
somewhere else.

What it comes down to is, you have to decide for yourself what your
priorities are - redundancy, performance, space efficiency - and how much
of each you're willing to give up to get as much as you want of the others.

--
Phil Stracchino
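Those cumulative percentages follow from a simple chain of conditional
odds; a quick sketch for 16 drives in 8 mirrored pairs (awk is used only
for the arithmetic):

# Chance that k random drive failures kill a 16-drive RAID10 (8 pairs).
# Before the k-th failure, k-1 pairs each have exactly one dead drive and
# 17-k drives remain, so the k-th failure is fatal with odds (k-1)/(17-k).
awk 'BEGIN {
    p = 1.0;                          # probability the array is still alive
    for (k = 2; k <= 9; k++) {
        p *= 1 - (k - 1) / (17 - k);
        printf "%d failures: %5.1f%% chance the array is dead\n", k, (1 - p) * 100;
    }
}'
# prints 6.7, 20.0, 38.5, 59.0, 77.6, 91.0, 98.0, 100.0 -- the 39% quoted
# above for four failures is 38.5% rounded up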
Re: [Bacula-users] Bacula and 16 bay JBOD
Il 16/03/2011 18:38, Phil Stracchino ha scritto:
> On 03/16/11 13:08, Mike Hobbs wrote:
>> Hello, I'm currently testing bacula v5.0.3 and so far so good. One
>> of my issues though, I have a 16 bay Promise Technologies VessJBOD.
>> How do I get bacula to use all the disks for writing volumes to?
>> [...]
>
> That scheme sounds like a bad and overly complex idea, honestly.
> Depending on your data load, I'd use software RAID to make them into a
> single RAID5 or RAID10 volume. RAID10 would be faster and, if set up
> correctly[1], more redundant; RAID5 is more space-efficient, but slower.
>
> [1] There's a right and a wrong way to set up RAID10. The wrong way is
> to set up two five-disk stripes, then mirror them; lose one disk from
> each stripe, and you're dead in the water. The right way is to set up
> five mirrored pairs, then stripe the pairs; this will survive multiple
> disk failures as long as you don't lose both disks of any single pair.

Hi Phil,
that last sentence sounds a little scary to me: "this will survive multiple
disk failures *as long as you don't lose both disks of any single pair*".
Isn't RAID6 a safer bet ?

Thanks.

Marcello
Re: [Bacula-users] Bacula and 16 bay JBOD
Il 16/03/2011 18:08, Mike Hobbs ha scritto:
> Hello, I'm currently testing bacula v5.0.3 and so far so good. One
> of my issues though, I have a 16 bay Promise Technologies VessJBOD. How
> do I get bacula to use all the disks for writing volumes to?
>
> I guess the way I envision it working would be, 50gb volumes would be
> used and when disk1 fills up, bacula switches over to disk2 and starts
> writing out volumes until that disk is filled, then on to disk3, etc..
> eventually coming back around and recycling the volumes on disk 1.
> [...]

Hi,
I think it could be useful for you to look into the "Maximum Volume Bytes"
and "Maximum Volumes" Pool directives.

For example, I have a 1.4TB RAID array where I want to use at most 1TB for
Bacula storage. I have created a "File" pool like this:

# File Pool definition
Pool {
  Name = File
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 30 days
  # 40 x 25G = 1TB
  Maximum Volume Bytes = 25G
  Maximum Volumes = 40
}

Every volume in the File pool is named FileVolumeXXX, where XXX is a number
(001, 002, ... 040).

Maybe you could create a similar pool for each disk, ensuring no more than
the disk capacity is used by Bacula.

Just my 2 cents.

HTH
Marcello
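Scaled out to the 16-bay case, per-disk pools might look like the sketch
below; the pool and volume names are invented for illustration, and the
18 x 50G cap assumes roughly 1TB disks:

# bacula-dir.conf -- one Pool per JBOD disk (hypothetical names)
Pool {
  Name = Disk1Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 30 days
  # volumes are auto-labelled Disk1Vol0001, Disk1Vol0002, ...
  Label Format = "Disk1Vol"
  Maximum Volume Bytes = 50G
  # 18 x 50G, comfortably under a 1TB disk
  Maximum Volumes = 18
}

Pool {
  Name = Disk2Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 30 days
  Label Format = "Disk2Vol"
  Maximum Volume Bytes = 50G
  Maximum Volumes = 18
}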
Re: [Bacula-users] Bacula and 16 bay JBOD
On Wed, Mar 16, 2011 at 1:29 PM, Mike Hobbs wrote:
> On 03/16/2011 01:12 PM, Robison, Dave wrote:
>> Just curious, why not put that jbod into a RAID array? I believe you'd
>> get far better performance with the additional spools and you'd get
>> redundancy as well.
>>
>> Personally I'd set that up as a RAIDZ using ZFS on FreeBSD.
>
> I believe the reason why we decided not to use raid was in case the raid
> array got corrupted. We would then lose all of our backups.

I believe that is a very big danger with a RAID. I never recommend a single
RAID array for all your backups. Two RAID arrays on separate RAID
controllers (preferably separate machines), each containing at least one
backup of everything, are fine, but a single RAID holding your only backup
copy is dangerous.

> Whereas if one disk dies, we only lose what was on that disk. There may
> have been another reason but I think that was the main reason.

I do not have time to explain the details at the moment, but I recommend
you take a look at the bacula vchanger for what you are trying to do:
http://sourceforge.net/projects/vchanger/

John
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/16/11 13:08, Mike Hobbs wrote:
> Hello, I'm currently testing bacula v5.0.3 and so far so good. One
> of my issues though, I have a 16 bay Promise Technologies VessJBOD. How
> do I get bacula to use all the disks for writing volumes to?
>
> I guess the way I envision it working would be, 50gb volumes would be
> used and when disk1 fills up, bacula switches over to disk2 and starts
> writing out volumes until that disk is filled, then on to disk3, etc..
> eventually coming back around and recycling the volumes on disk 1.
>
> I'm not sure the above scenario is the best way to go about this; I've
> read that some people create a "pool" for each drive. What is the most
> common practice when setting up a JBOD unit with bacula? Any
> suggestions or advice would be appreciated.

That scheme sounds like a bad and overly complex idea, honestly. Depending
on your data load, I'd use software RAID to make them into a single RAID5
or RAID10 volume. RAID10 would be faster and, if set up correctly[1], more
redundant; RAID5 is more space-efficient, but slower.

[1] There's a right and a wrong way to set up RAID10. The wrong way is to
set up two five-disk stripes, then mirror them; lose one disk from each
stripe, and you're dead in the water. The right way is to set up five
mirrored pairs, then stripe the pairs; this will survive multiple disk
failures as long as you don't lose both disks of any single pair.

--
Phil Stracchino
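On Linux, a sketch of the "right way" with md software RAID; /dev/sd[b-q]
stands in for the 16 JBOD disks, and md's default near-2 layout gives
exactly the mirrored-pairs-then-stripe arrangement:

# create a 16-disk RAID10 (8 mirrored pairs, striped) -- destroys contents
mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=16 /dev/sd[b-q]

# equivalent "by hand": eight two-disk mirrors, then a RAID0 across them
# mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
# ... repeat for md2..md8 ...
# mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/md[1-8]

mkfs -t ext4 /dev/md0      # then point the Bacula Archive Device here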
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/16/2011 06:29 PM, Mike Hobbs wrote:
> I believe the reason why we decided not to use raid was in case the raid
> array got corrupted. We would then lose all of our backups. Whereas
> if one disk dies, we only lose what was on that disk. There may have
> been another reason but I think that was the main reason.

Nice, but if the controller does silent corruption, you're down too :-)

--
Bruno Friedmann, Ioda-Net Sàrl
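Checksumming file systems such as ZFS (which Dave suggested earlier in the
thread) are the usual answer to silent corruption: every block carries a
checksum, so a scrub detects bad data and, given redundancy, repairs it. A
sketch, with "tank" as a placeholder pool name:

# verify every checksum in the pool in the background
zpool scrub tank

# report any checksum errors found, per device and per file
zpool status -v tank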
Re: [Bacula-users] Bacula and 16 bay JBOD
On 03/16/2011 01:12 PM, Robison, Dave wrote:
> Just curious, why not put that jbod into a RAID array? I believe you'd
> get far better performance with the additional spools and you'd get
> redundancy as well.
>
> Personally I'd set that up as a RAIDZ using ZFS on FreeBSD.

I believe the reason why we decided not to use RAID was in case the RAID
array got corrupted. We would then lose all of our backups. Whereas if one
disk dies, we only lose what was on that disk. There may have been another
reason, but I think that was the main reason.

mike
[Bacula-users] Bacula and 16 bay JBOD
Hello, I'm currently testing bacula v5.0.3 and so far so good. One of my
issues though: I have a 16 bay Promise Technologies VessJBOD. How do I get
bacula to use all the disks for writing volumes to?

I guess the way I envision it working would be, 50gb volumes would be used
and when disk1 fills up, bacula switches over to disk2 and starts writing
out volumes until that disk is filled, then on to disk3, etc., eventually
coming back around and recycling the volumes on disk1.

I'm not sure the above scenario is the best way to go about this; I've read
that some people create a "pool" for each drive. What is the most common
practice when setting up a JBOD unit with bacula? Any suggestions or advice
would be appreciated.

I have all my drives listed in the bacula-sd.conf file:

Device {
  Name = disk1
  Media Type = File
  Archive Device = /export/disk1
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}

I also have each drive listed in the bacula-dir.conf file, although I do
not know if this is correct:

Storage {
  Name = File
  Maximum Concurrent Jobs = 20
  Address = mtl-backup2        # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = ""
  Device = disk1
  Media Type = File
}

Another question: how does bacula handle a dead disk? How do you get a file
listing of what was on that disk so you can manually run another backup of
the missing clients and file systems? Do you need to prune the dead disk's
information from the DB, or does bacula handle that when the recycle time
comes around?

Thank you for any help, advice or suggestions anyone can provide.

mike
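One common pattern for the multi-device setup described above is one SD
Device per disk, each with its own Media Type so the Director can address a
specific disk, plus a matching Storage resource per device. A sketch; the
disk2/File2 names are illustrative, not from the original post:

# bacula-sd.conf -- one Device per JBOD disk, each with a unique Media Type
Device {
  Name = disk2
  Media Type = File2
  Archive Device = /export/disk2
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}

# bacula-dir.conf -- matching Storage resource the Director can address
Storage {
  Name = File2
  Address = mtl-backup2        # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = ""
  Device = disk2
  Media Type = File2
}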