On Sun, 2004-01-04 at 21:23, Rich Adamson wrote:
> Part of the point of many of the questions is that there really are a
> lot of dependencies on devices other then asterisk, and simply going down
> a path that says clustering (or whichever approach) can handle something
> is probably ignoring several of those dependencies which does not actually
> improve the end-to-end availability of asterisk. (Technically, asterisk
> is up, you just can't reach it because your phone (or whatever) doesn't
> know how to get to it.)
>
> Using another load-balancing box (F5 or whatever) only moves the problem
> to that box. Duplicating it, moves the problem to another box, until
> the costs exponentially grow beyond the initial intended value of the
> solution. The weak points become lots of other boxes and infrastructure,
> suggesting that asterisk really isn't "the" weakest point (regardless of
> what its built on).

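To put rough numbers on the point quoted above, here is a minimal sketch of
the arithmetic. The availability figures are made up purely for illustration,
the function names are hypothetical, and the redundancy formula assumes the
duplicated boxes fail independently of each other, which real hardware sharing
a rack or a power feed often does not.

```python
from math import prod


def series_availability(parts):
    """Every part in the chain must be up, so the availabilities multiply."""
    return prod(parts)


def redundant_availability(a, n):
    """n copies of one part; the group is down only if all n copies are down.

    Assumes the copies fail independently (an idealization).
    """
    return 1 - (1 - a) ** n


# Hypothetical figures, chosen only to illustrate the shape of the problem.
chain = {"line": 0.995, "switch": 0.999, "asterisk": 0.999, "phone": 0.999}

single = series_availability(chain.values())
clustered = series_availability([
    chain["line"],
    chain["switch"],
    redundant_availability(chain["asterisk"], 2),  # two asterisk boxes
    chain["phone"],
])

print(f"single asterisk box:    {single:.4%}")
print(f"clustered asterisk box: {clustered:.4%}")
```

With numbers like these, doubling the asterisk box helps a little, but the
end-to-end figure can never be better than the weakest single part left in
the chain, here the line.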
Rich is hitting the main point in designing anything for high reliability.
So let's enumerate the failures, and then what, if anything, can be done to
eliminate them.

1. Line failures. I'll lump them together, as they can occur anywhere from
   the CO to your premises. I've experienced them in just about every
   section in my short time in this part of the industry. I have had lines
   broken inside the CO. I have had water get to the lines along the street
   during construction, and it could just as easily have been the
   construction people cutting the line if they had been any more careless.
   There are problems inside the building too, though luckily those aren't
   as likely to crop up after install. BTW, this is the same even if your
   incoming phone lines are VoIP.

2. Hardware failure. This can be drives, memory, CPU, NIC, or any other
   part whose failure basically renders the hardware unavailable or
   unstable.

3. Software failure. This could be any number of bugs not yet found or
   that will be introduced later.

4. Phones. This can be split into a VoIP and an analog section, as the
   problems and solutions are different.
   a. VoIP
   b. analog

5. Power. This also falls into two parts, split on VoIP and analog, as it
   doesn't help to have power on the switch if all your phones go dark.
   Think about cases where there is a storm or other adverse conditions
   and you need to call the authorities.

So now you go to solutions.

1. Your solution to this is based on budget, because the only solutions
   cost a monthly fee. Also, for a truly good solution, the install fee
   will go up too. Basically the solution here comes via redundancy. Not
   just in multiples, but in getting the lines from different locations
   and making sure they don't follow the same paths. Most locations are
   not wired from different paths unless your location attracted a fiber
   loop. So if you have to have it, it might cost quite a bit or not be
   available.

2. RAID and hot-swap drives combined with hot-swap redundant power supplies.
   This is about the limit of what is currently available on a budget in
   the x86 world. Also with RAID, make sure you have actual redundancy.
   RAID doesn't always mean you are in a condition to recover from a
   failure at all times. If it is really important, you will also have hot
   spares in the machine. As you can see, this adds cost each time you add
   a drive to make a system more resilient to failure.

   During a recent presentation at our LUG, it was explained that even
   RAID can fail. The presenter had several drives die together due to an
   AC failure. They had hot spares, but as drives failed, extra stress was
   applied to the weakened drives until they failed too. Soon they
   exceeded their fault tolerance and had to rely on what they could
   scrape together from backups to recover. So if possible, look into RAID
   equipment that has some form of interface to see what is going on,
   especially if you aren't in a monitored environment. If your RAID is
   able to generate messages at the driver layer and you can watch these
   messages, you can fix a problem before it escalates.

   While multiple machines are another way of surviving a total system
   failure, you are likely to experience line failures more often than
   hardware failures if you treat your hardware well. Some forms of this
   solution also require software modification.

3. This one is basically only combated by due diligence. Mark and the
   other CVS committers do their best to review everything before it goes
   in. Those who write patches try not to write buggy code. The
   implementers should still spend some time testing all the components to
   verify the functions work as needed.

4. Phones luckily have few failures. And when they do fail, it doesn't
   usually take down any other phones. Analog phones can just be swapped
   out, as there are few differences between them. Only ADSI would
   complicate this, but not if you had spares of the same ADSI phones.
   VoIP is pretty much the same.

5.
   Power is important, as good clean power makes your hardware last
   longer. Add to this that it is needed to survive any adverse weather
   conditions. Analog phones let you centralize your power requirements.
   VoIP phones either need to be powered with power over ethernet, or you
   will need to support power at every phone. At some point a good
   enterprise solution is a building-wide backup, plus small UPSes for the
   units that need to be isolated from a generator coming online. I have
   witnessed fried hardware during a generator test while working in a
   hospital. So this is pretty high on the list of importance.

So, the places to work on to get at least some high availability on a
budget for the small to medium company:

1. Code. Test it well before deployment.
2. Power. This is important to keep your hardware safe.
3. Hardware. Start with good quality, then add in redundancy.

-- 
Steven Critchfield <[EMAIL PROTECTED]>
_______________________________________________
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users