Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
> I don't want to drag this into a long thread, but note the original says
> "the system should survive just about anything short of an act of God",
> and suddenly you are talking about a reliable server and a few switches.
> These are quite different things. I have yet to see a 5 x 9's server
> room. Fire, mechanical damage and other factors will normally keep the
> location itself well below 5 x 9's. Think "system" instead of "server
> equipment", and the picture looks very different. Even for a single PC
> type server, downtime due to telecoms lines, power problems, fire,
> flood, typhoon damage, theft and a mass of other stuff might well exceed
> the server unavailability itself. I've seen many servers not fail in 5
> years. I have yet to see the best location go that long without causing
> at least one substantial period of downtime. 5 x 9's allows about 6
> minutes downtime a year. That means 100% of all failures must have
> automated failover, as manual repair could never be achieved so fast.
> Physical diversity is essential for that.

The five-9's thread has been discussed under several different subjects in the last few months, and it's not difficult to detect from the postings that the participants have very different levels of technical understanding. It's also obvious that many have not worked in a business or institution where disaster recovery or business continuity plans mean something much different than redundant power supplies, RAID, a motherboard on the shelf, a Sun multiprocessor system, a database server, redundant layer-2 switches, or lots of toys in one's basement.

Whether one refers to application/system availability as five-9's, maximum uptime, or some other set of words is mostly irrelevant; the objective is still to provide the highest level of functionality possible given a set of business parameters that might include cost, time to repair, commercial power stability, regional susceptibility to tornados or floods, etc. Low-end ISPs tend to believe a UPS would address their needs, small companies tend towards hot/cold spares, while larger organizations gravitate towards approaches that minimize the need for human involvement to recover from any form of failure. Gus may have a strong conviction that clustering addresses his needs (given his set of business drivers), while Joe's need is to recover from "any" event (including loss of building(s)) within X hours, perhaps driven by outside requirements such as government regulations. It is a given that the recovery plan and supporting investments will be dramatically different for many business cases. Neither one should be ragged on, since none of us on the list are exposed to their business drivers.

Regardless of how one chooses to address application availability (for the purposes of this list anyway), sharing configuration and operational data between multiple Asterisk boxes on a more real-time basis is, or will be, important to those involved with systems in the small-business category and above. Therefore, the list would benefit from discussions and implementations that help support the task of dynamically sharing Asterisk data across multiple systems to improve uptime (whatever that happens to mean to each reader). Excluding the low-end ISP approach, and from a 5,000-foot level, it would appear that an underlying/common design data-point might be: "what are the Asterisk design changes that need to occur to support two (or more) Asterisk systems in separate physical locations?"
(Note that if someone's business drivers suggest the systems remain within the same building/room, that's fine. If they are separated by 10 feet or 100 miles, that's fine. If someone wants to include UPSs, power supplies, RAID, dual-this-or-that, layer two/three boxes, load balancers, Sun systems, database servers, etc., that's fine. If an external T1 switch is required, that's fine. If clustering adds some value for someone's deployment, that's fine. If a hot spare meets the business needs, that's fine. If lots of people have issues with a particular SIP phone vendor's method of failover, I'll bet some vendors would be more than willing to improve their code "if" they understood it gives them a competitive advantage over another vendor, etc.)

Thoughts?

Rich
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
Automated failover is a nice thought in this instance, but in the telco world it may not be necessary. Most industries will allow for weekend work as well as planned downtime (yes, even in a three-shift manufacturing facility). In my experience, fires and acts of God are few and far between, but someone tripping over a power cord, shutting something down, or pulling the wrong patch cord is a regular occurrence. Not sure if I am agreeing with Steve or not; the more I read his post, the less sure I am of what he is saying.

- Original Message -
From: "Steve Underwood" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 10, 2004 2:43 AM
Subject: Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

> Hi,
>
> I don't want to drag this into a long thread, but note the original says
> "the system should survive just about anything short of an act of God",
> and suddenly you are talking about a reliable server and a few switches.
> These are quite different things. I have yet to see a 5 x 9's server
> room. Fire, mechanical damage and other factors will normally keep the
> location itself well below 5 x 9's. Think "system" instead of "server
> equipment", and the picture looks very different. Even for a single PC
> type server, downtime due to telecoms lines, power problems, fire,
> flood, typhoon damage, theft and a mass of other stuff might well exceed
> the server unavailability itself. I've seen many servers not fail in 5
> years. I have yet to see the best location go that long without causing
> at least one substantial period of downtime. 5 x 9's allows about 6
> minutes downtime a year. That means 100% of all failures must have
> automated failover, as manual repair could never be achieved so fast.
> Physical diversity is essential for that.
>
> Regards,
> Steve
>
> Chris Albertson wrote:
>
> >--- Steve Underwood <[EMAIL PROTECTED]> wrote:
> >
> >>WipeOut wrote:
> >>
> >>>Granted five 9's is never easy but in a cluster of 10+ servers the
> >>>system should survive just about anything short of an act of God..
> >>
> >>You do realise that is a real dumb statement, don't you? :-)
> >>
> >>A cluster of 10 machines, each on a different site. Guarantees from the
> >>power company - checked personally to see they aren't cheating - that
> >>you have genuinely independent feeds to these sites. Large UPSs, with
> >>diesel generator backups. Multiple diverse telecoms links between the
> >
> >If he says "cluster" he likely means 10 servers in one rack. But still
> >you are right. It is all the other stuff that could break. You
> >will need paralleled Ethernet switches (yes, they make these; no, they
> >are NOT cheap) and you will need some kind of failover. The switches
> >can do that for you (do a Google on "level 3 switch").
> >
> >It's the level three switches that make five 9's possible, but half or
> >more of your hardware will be just "hot spares", so it really will
> >take a rack full of boxes.
> >
> >Each box should have mirrored drives and dual power supplies, and each
> >AC power cord needs to go to its own UPS.
> >
> >Has anyone tried to build Asterisk on SPARC/Solaris? One SPARC
> >server is almost five nines all by itself as it can do things
> >like "boot around" failed CPU, RAM or disks. I've actually
> >pulled a disk drive out of a running Sun SPARC and applications
> >continued to run.
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
Hi,

I don't want to drag this into a long thread, but note the original says "the system should survive just about anything short of an act of God", and suddenly you are talking about a reliable server and a few switches. These are quite different things. I have yet to see a 5 x 9's server room. Fire, mechanical damage and other factors will normally keep the location itself well below 5 x 9's. Think "system" instead of "server equipment", and the picture looks very different. Even for a single PC type server, downtime due to telecoms lines, power problems, fire, flood, typhoon damage, theft and a mass of other stuff might well exceed the server unavailability itself. I've seen many servers not fail in 5 years. I have yet to see the best location go that long without causing at least one substantial period of downtime. 5 x 9's allows about 6 minutes downtime a year. That means 100% of all failures must have automated failover, as manual repair could never be achieved so fast. Physical diversity is essential for that.

Regards,
Steve

Chris Albertson wrote:

--- Steve Underwood <[EMAIL PROTECTED]> wrote:

WipeOut wrote:

Granted five 9's is never easy but in a cluster of 10+ servers the system should survive just about anything short of an act of God..

You do realise that is a real dumb statement, don't you? :-)

A cluster of 10 machines, each on a different site. Guarantees from the power company - checked personally to see they aren't cheating - that you have genuinely independent feeds to these sites. Large UPSs, with diesel generator backups. Multiple diverse telecoms links between the

If he says "cluster" he likely means 10 servers in one rack. But still you are right. It is all the other stuff that could break. You will need paralleled Ethernet switches (yes, they make these; no, they are NOT cheap) and you will need some kind of failover. The switches can do that for you (do a Google on "level 3 switch").

It's the level three switches that make five 9's possible, but half or more of your hardware will be just "hot spares", so it really will take a rack full of boxes.

Each box should have mirrored drives and dual power supplies, and each AC power cord needs to go to its own UPS.

Has anyone tried to build Asterisk on SPARC/Solaris? One SPARC server is almost five nines all by itself as it can do things like "boot around" failed CPU, RAM or disks. I've actually pulled a disk drive out of a running Sun SPARC and applications continued to run.
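For reference, the arithmetic behind the downtime figures quoted in this thread, as a quick Python sketch (nothing Asterisk-specific; the numbers follow directly from the definition of "N nines"):

#!/usr/bin/env python
# Downtime budget implied by "N nines" of availability.
minutes_per_year = 365 * 24 * 60            # 525,600 minutes
for nines in (3, 4, 5):
    availability = 1 - 10 ** -nines         # e.g. 0.99999 for five 9's
    allowed = minutes_per_year * (1 - availability)
    print("%d nines (%.3f%%): about %.1f minutes of downtime per year"
          % (nines, availability * 100, allowed))
# Five 9's works out to roughly 5.3 minutes/year - the "about 6 minutes"
# (or "5.25 minutes") cited in this thread.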
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
--- Steve Underwood <[EMAIL PROTECTED]> wrote:

> WipeOut wrote:
>
> > Granted five 9's is never easy but in a cluster of 10+ servers the
> > system should survive just about anything short of an act of God..
>
> You do realise that is a real dumb statement, don't you? :-)
>
> A cluster of 10 machines, each on a different site. Guarantees from the
> power company - checked personally to see they aren't cheating - that
> you have genuinely independent feeds to these sites. Large UPSs, with
> diesel generator backups. Multiple diverse telecoms links between the

If he says "cluster" he likely means 10 servers in one rack. But still you are right. It is all the other stuff that could break. You will need paralleled Ethernet switches (yes, they make these; no, they are NOT cheap) and you will need some kind of failover. The switches can do that for you (do a Google on "level 3 switch").

It's the level three switches that make five 9's possible, but half or more of your hardware will be just "hot spares", so it really will take a rack full of boxes.

Each box should have mirrored drives and dual power supplies, and each AC power cord needs to go to its own UPS.

Has anyone tried to build Asterisk on SPARC/Solaris? One SPARC server is almost five nines all by itself as it can do things like "boot around" failed CPU, RAM or disks. I've actually pulled a disk drive out of a running Sun SPARC and applications continued to run.

=
Chris Albertson
Home: 310-376-1029 [EMAIL PROTECTED]
Cell: 310-990-7550
Office: 310-336-5189 [EMAIL PROTECTED]
KG6OMK
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
On Fri, 2004-01-09 at 21:36, Steve Underwood wrote:
> WipeOut wrote:
>
> > Granted five 9's is never easy but in a cluster of 10+ servers the
> > system should survive just about anything short of an act of God..
>
> You do realise that is a real dumb statement, don't you? :-)
>
> A cluster of 10 machines, each on a different site. Guarantees from the
> power company - checked personally to see they aren't cheating - that
> you have genuinely independent feeds to these sites. Large UPSs, with
> diesel generator backups. Multiple diverse telecoms links between the
> sites, personally checked multiple times to see there is genuine
> diversity (it's a waste of time asking a telco for guarantees of this
> kind, as they lie by habit). This *might* start to approach 5 9's. Just
> having 10 servers means *very* little.

Maybe it's the fact that the main clusters I have knowledge of are in university settings, meant to increase compute power, but "cluster" tends to have the connotation of being in one location. In the case of a single location, the extra machines actually mean higher odds of losing parts, simply because of mean time between failures. A friend of mine commented that maintaining one of the top-500 supercomputer clusters meant keeping a box of memory and drives on hand; they lost a certain number of memory modules a day. That freaked me out, as the only times I had experienced memory failure were due to mishandling, not the normal course of computer operation.

The setup you mention above isn't what I would normally associate with clustering. It also is unlikely to make a difference for a single office location keeping their system available.
--
Steven Critchfield <[EMAIL PROTECTED]>
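A back-of-the-envelope illustration of the point about more boxes meaning more part failures (the 5% annual per-server failure probability below is an arbitrary assumption for the sketch, not a measured figure):

#!/usr/bin/env python
# Chance that at least one server in a single-room cluster loses a part
# in a given year, assuming independent failures.
p_single = 0.05                      # assumed per-server annual failure probability
for n in (1, 2, 10, 100):
    p_any = 1 - (1 - p_single) ** n  # P(at least one of n fails)
    print("%3d servers: P(at least one failure/year) = %4.1f%%" % (n, 100 * p_any))

With 10 servers the odds of replacing something during the year already climb to about 40%, which is why the supercomputer cluster in the anecdote keeps a box of spare memory and drives on hand.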
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
WipeOut wrote:

Granted five 9's is never easy but in a cluster of 10+ servers the system should survive just about anything short of an act of God..

You do realise that is a real dumb statement, don't you? :-)

A cluster of 10 machines, each on a different site. Guarantees from the power company - checked personally to see they aren't cheating - that you have genuinely independent feeds to these sites. Large UPSs, with diesel generator backups. Multiple diverse telecoms links between the sites, personally checked multiple times to see there is genuine diversity (it's a waste of time asking a telco for guarantees of this kind, as they lie by habit). This *might* start to approach 5 9's. Just having 10 servers means *very* little.

Regards,
Steve
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
Hi Richard,

> Load balancers have some added value, but those that have had to deal
> with a problem where a single system within the cluster is up but not
> processing data would probably argue their actual value.

I've done quite a lot of work with clustered/HA Linux configurations. I usually try to keep additional boxes/hardware to an absolute minimum, otherwise the newly introduced points of (hardware) failure tend to make the whole exercise pointless.

A solution I found to work quite well: a software load balancer (using LVS) run as an HA service (ldirectord) on two of the servers. This allows use of quite specific probes for the real servers being balanced, so a server not correctly processing requests can be removed from the active list quite reliably. Since the director script is Perl, adding probes for protocols not supported in the default install is fairly straightforward.

> If any proposed design actually involved a different MAC address,
> obviously all local sip phones would die since the arp cache timeout
> within the phones would preclude a failover. (Not cool.)

ARP cache timeouts usually don't come into this: when moving a cluster IP address to a different NIC (probably on a different machine) you can broadcast gratuitous ARP packets on the affected Ethernet segment; this updates the ARP caches of all connected devices and allows failovers far faster than ARP cache timeout. Notable exception: some firewalls can be quite paranoid with regard to ARP updates and will NOT accept gratuitous ARP packets. I've run into this with a cluster installation at one of my customers.

> Technology now supports 100 meg layer-2 pipes throughout a city at a
> reasonable cost. If a cluster were split across multiple
> buildings within a city, it certainly would be of interest to those
> that are responsible for business continuity planning. Are there limitations?

I'm wary of split-cluster configurations because the need for multiple, independent communication paths between cluster nodes often gets overlooked or ignored in these configurations, greatly increasing the risk of "split-brain" situations, i.e. several nodes in the cluster each thinking they're the only online server and trying to take over services. This easily/usually leads to a real mess (data corruption) that can be costly to clean up. When keeping your nodes in physical proximity it's much easier to have, say, two network links plus one serial link between cluster nodes, thus providing a very resilient fabric for inter-cluster communications.

> Someone mentioned the only data needed to be shared between clustered
> systems was phone Registration info (and then quickly jumped
> to engineering a solution for that). Is that the only data needed or
> might someone need a ton of other stuff? (Is cdr, iax, dialplans, agi,
> vm, and/or other dynamic data an issue that needs to be considered in
> a reasonable high-availability design?)

Depends on what you want/need to fail over in case your Asterisk box goes down. In stages, that'd be:

1. (cluster) IP address for SIP/H.323 etc. services
2. voice mail, recordings, activity logs
3. registrations for connected VoIP clients
4. active calls (VoIP + PSTN)

For the moment, item 4 definitely isn't feasible; even if we get some hardware to switch over E1/T1/PRI (whatever) interfaces, card or interface initialisation will kill active calls.

Item 2 would be plain on-disk file data; for an active/standby cluster, replicating these should be pretty straightforward using either shared storage or an appropriate filesystem/block-device replication system. I've personally had good experience with drbd (block device replication over the network; it only supports 2 nodes in active/standby configuration, but works quite well for that).

Item 3 should also be feasible; this information is already persistent over Asterisk restarts and seems to be just a Berkeley DB file for a default install. The same method as for item 2 should work.

> I'd have to guess there are probably hundreds on this list that can
> engineer raid drives, ups's for ethernet closet switches, protected
> cat 5 cabling, and switch boxes that can move physical
> interfaces between servers. But, I'd also guess there are far fewer
> that can identify many of the sip, rtp, iax, nat, cdr, etc, etc,
> issues. What are some of those issues? (Maybe there aren't any?)

Since I'm still very much an Asterisk beginner I'll have to pass on this one; however, I'm definitely going to do some experiments with Asterisk on my test cluster systems just to see what breaks when failing over Asterisk services.

Also, things get MUCH more interesting when you start to move from plain active/standby to active/active configurations: here, on failover, you'll end up with the registration and file data from the failed server and need to integrate that into an already running server, merging the separate sets of information - preferably without trashing the running server :-)

Bye, Martin
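As a concrete illustration of the kind of protocol-specific probe Martin describes (his ldirectord checks are Perl; this is only the same idea sketched in Python, with made-up addresses and identifiers): send a SIP OPTIONS request and treat any SIP reply as "healthy".

#!/usr/bin/env python
# Minimal SIP OPTIONS health probe over UDP: returns True only if the
# target answers with something that looks like a SIP response.
import socket

def sip_alive(host, port=5060, timeout=2.0):
    tag = "hc1"  # arbitrary probe identifier
    msg = (
        "OPTIONS sip:probe@{h}:{p} SIP/2.0\r\n"
        "Via: SIP/2.0/UDP 0.0.0.0:5060;branch=z9hG4bK-{t}\r\n"
        "Max-Forwards: 70\r\n"
        "From: <sip:monitor@healthcheck.invalid>;tag={t}\r\n"
        "To: <sip:probe@{h}>\r\n"
        "Call-ID: {t}@healthcheck.invalid\r\n"
        "CSeq: 1 OPTIONS\r\n"
        "Content-Length: 0\r\n\r\n"
    ).format(h=host, p=port, t=tag)
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.sendto(msg.encode("ascii"), (host, port))
        data, _ = s.recvfrom(2048)
        return data.startswith(b"SIP/2.0")   # any SIP reply means the stack is up
    except socket.timeout:
        return False
    finally:
        s.close()

if __name__ == "__main__":
    # Example: drop the real server from the balancer's list when this fails.
    print("asterisk1 alive:", sip_alive("192.0.2.10"))

A probe like this catches the "machine is up but the SIP stack is wedged" case that a plain ping or TCP connect check misses.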
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
> > Using another load-balancing box (F5 or whatever) only moves the problem
> > to that box. Duplicating it, moves the problem to another box, until
> > the costs exponentially grow beyond the initial intended value of the
> > solution. The weak points become lots of other boxes and infrastructure,
> > suggesting that asterisk really isn't "the" weakest point (regardless of
> > what it's built on).
>
> Rich is hitting the main point in designing anything for high
> reliability. So let's enumerate failures and then what, if anything, can be
> done to eliminate them.
>
> 1. Line failures.
> 2. Hardware failure.
> 3. Software failure.
>    This could be any number of bugs not yet found or that will be
>    introduced later.
> 4. Phones.

The primary points the questions were attempting to uncover are more related to basic layer-2 and layer-3 issues (of all necessary components in an end-to-end telephony implementation), and not just basic hardware configurations. Having spent a fair number of years working with corporations that have attempted to build high-availability solutions, the typical engineering approach is almost always oriented towards throwing more hardware at the problem and not thinking about the basic layer-2/3/4 issues. (I don't have an answer that I'm sponsoring either; I'm just looking for comments from those that intimately know the "end-to-end" impact of doing things like hot-sparing or clustering.)

I'm sure it's fairly clear to most that adding redundant supplies, UPS, RAID, etc., will improve the uptime of the * box. However, once past throwing hardware at "the" server, where are the pitfalls associated with hot-sparing or clustering * servers?

Several well-known companies have attempted products that swap MAC addresses between machines (layer-2), hide servers behind a virtual IP (layer-3), hide a cluster behind some form of load-balancing hardware (generally layer-2 & 3), etc. Most of those solutions end up creating yet another problem that was not considered in the original thought process, i.e., not well thought out. (Even Cisco, with a building full of engineers, didn't initially consider the impact of flip-flopping between boxes when HSRP was first implemented. And there still are issues with that approach that many companies have witnessed first hand.) Load balancers have some added value, but those that have had to deal with a problem where a single system within the cluster is up but not processing data would probably argue their actual value.

So, if one were to attempt either hot-sparing or clustering, are there issues associated with SIP, RTP, IAX, NAT and/or other Asterisk protocols that would impact the high-availability design? One issue that would _seem_ to be a problem is installations that have to use canreinvite=no (meaning, even in a clustered environment, those RTP sessions are going to be dropped with a server failure. Maybe it's okay to simply note the exceptions in a proposed high-availability design.)

If any proposed design actually involved a different MAC address, obviously all local SIP phones would die, since the ARP cache timeout within the phones would preclude a failover. (Not cool.)

IBM (with their stack of AIX machines) and Tandem (with their non-stop architecture) didn't throw clustered database servers at the problem. Both had them, but not as a means of increasing the availability of the base systems.

Technology now supports 100 meg layer-2 pipes throughout a city at a reasonable cost.
If a cluster were split across multiple buildings within a city, it certainly would be of interest to those that are responsible for business continuity planning. Are there limitations?

Someone mentioned the only data needed to be shared between clustered systems was phone registration info (and then quickly jumped to engineering a solution for that). Is that the only data needed, or might someone need a ton of other stuff? (Is CDR, IAX, dialplan, AGI, VM, and/or other dynamic data an issue that needs to be considered in a reasonable high-availability design?)

Whether the objective is 2, 3, 4, or 5 nines is somewhat irrelevant. If one had to stand in front of the President or Board and represent/sell availability, they are going to assume end-to-end and not just "the" server. Later, they are not going to talk kindly about the phone system when your single F5 box died; or (not all that unusual) you say Asterisk was up the entire time, it's your stupid phones that couldn't find it!! (Or, you lost five hours of CDR data because of why???)

I'd have to guess there are probably hundreds on this list that can engineer RAID drives, UPSs for Ethernet closet switches, protected Cat 5 cabling, and switch boxes that can move physical interfaces between servers. But I'd also guess there are far fewer that can identify many of the SIP, RTP, IAX, NAT, CDR, etc., issues. What are some of those issues? (Maybe there aren't any?)

Rich
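On the ARP-cache concern above: as noted elsewhere in this thread, the usual answer is a gratuitous ARP broadcast when the cluster/virtual IP moves, so phones and routers update their caches immediately rather than waiting for a timeout. A sketch of that announcement follows (it assumes the scapy package, root privileges, and example addresses; HA tools that move addresses typically emit the equivalent for you):

#!/usr/bin/env python
# Broadcast gratuitous ARP for a cluster IP that has just moved to this box.
from scapy.all import ARP, Ether, sendp

def announce_takeover(vip, new_mac, iface="eth0"):
    garp = Ether(src=new_mac, dst="ff:ff:ff:ff:ff:ff") / ARP(
        op=2,                        # ARP reply ("is-at")
        hwsrc=new_mac,               # MAC of the machine taking over
        psrc=vip,                    # the shared/cluster IP address
        hwdst="ff:ff:ff:ff:ff:ff",
        pdst=vip,                    # target IP == source IP marks it gratuitous
    )
    # Send a few copies; some devices ignore the first one.
    sendp(garp, iface=iface, count=3, inter=0.5, verbose=False)

if __name__ == "__main__":
    announce_takeover("192.0.2.50", "00:11:22:33:44:55")

As also noted in the thread, some paranoid firewalls refuse gratuitous ARP, so this does not remove the need to test the failover end to end.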
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
> > Does your telco provide you with SLAs that make five 9s reasonable at all?

LOL... Our telco services could be down for several hours at a time. We found that most US broadband carriers (DSL and cable) offer a "best effort" zero-SLA service. If you are using broadband as a primary transport, expect the failure points to be "up stream" more than "in house".

> Do you really need five 9s? There is no such thing I'm aware of in
> enterprise grade telephony.

Cisco has a white paper, "IP Telephony: The Five Nines Story":
http://www.cisco.com/warp/public/cc/so/neso/vvda/iptl/5nine_wp.htm

My take on the "nines" is that Telcordia SR-323 / Bellcore MIL-HDBK-217 attempted to predict the reliability of individual electronic components, and marketing departments have used the predictions as sales tools to best an opponent's product.

> You have to go to "carrier grade"
> equipment, which asterisk, and PCs in general, are definitely not aimed at.

Most carrier and even enterprise phone equipment uses a "blade" design. PCs can be configured in a hot-swap blade design.

Doug
--
FREE Unlimited Worldwide Voip calling set-up an account and start saving today!
http://www.voippages.com ext. 7000
http://www.pulver.com/fwd/ ext. 83740
free IP phone software @ http://www.xten.com/
http://iaxclient.sourceforge.net/iaxcomm/
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
Nicolas Bougues wrote:

On Sun, Jan 04, 2004 at 07:38:16PM +0000, WipeOut wrote:

Also a failover system would typically only be 2 servers; if there were a cluster system there could be 10 servers, in which case five 9's should be easy..

Err, no. Five 9s is *never* easy.

Does your telco provide you with SLAs that make five 9s reasonable at all?

Do you really need five 9s? There is no such thing I'm aware of in enterprise grade telephony. You have to go to "carrier grade" equipment, which Asterisk, and PCs in general, are definitely not aimed at.

Granted, five 9's is never easy, but in a cluster of 10+ servers the system should survive just about anything short of an act of God..

Maybe, as mentioned earlier, a more realistic goal for Asterisk is three or four 9's.. Three 9's could probably be achieved already on a single server with RAID and hot-swap power, so four 9's is probably a good target to go for..

Later..
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
On Sun, Jan 04, 2004 at 07:38:16PM +0000, WipeOut wrote:
>
> Also a failover system would typically only be 2 servers, if there were
> a cluster system there could be 10 servers in which case five 9's should
> be easy..

Err, no. Five 9s is *never* easy.

Does your telco provide you with SLAs that make five 9s reasonable at all?

Do you really need five 9s? There is no such thing I'm aware of in enterprise grade telephony. You have to go to "carrier grade" equipment, which Asterisk, and PCs in general, are definitely not aimed at.

--
Nicolas Bougues
Axialys Interactive
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
On Sun, 2004-01-04 at 21:23, Rich Adamson wrote:
> Part of the point of many of the questions is that there really are a
> lot of dependencies on devices other than asterisk, and simply going down
> a path that says clustering (or whichever approach) can handle something
> is probably ignoring several of those dependencies, which does not actually
> improve the end-to-end availability of asterisk. (Technically, asterisk
> is up, you just can't reach it because your phone (or whatever) doesn't
> know how to get to it.)
>
> Using another load-balancing box (F5 or whatever) only moves the problem
> to that box. Duplicating it, moves the problem to another box, until
> the costs exponentially grow beyond the initial intended value of the
> solution. The weak points become lots of other boxes and infrastructure,
> suggesting that asterisk really isn't "the" weakest point (regardless of
> what it's built on).

Rich is hitting the main point in designing anything for high reliability. So let's enumerate failures and then what, if anything, can be done to eliminate them.

1. Line failures. I'll lump them together, as they can occur anywhere from the CO to your premises. I've experienced them in just about every section in my short time in this part of the industry. I have had lines broken inside the CO. I have had water get to the lines along the street during construction, and it could have just as easily been the construction people cutting the line if they had been any more careless. Inside-the-building problems, luckily, aren't as likely to crop up after install. BTW, this is the same even if your incoming phones are VoIP lines.

2. Hardware failure. This can be drives, memory, CPU, NIC, or any other part that basically renders the hardware unavailable or unstable.

3. Software failure. This could be any number of bugs not yet found or that will be introduced later.

4. Phones. This can be split into a VoIP and an analog section, as the problems and solutions are different.
   a. VoIP
   b. analog

5. Power. This also falls into two parts, split on VoIP and analog, as it doesn't help to have power on the switch if all your phones go dark. Think about cases where there is a storm or other adverse conditions and you need to call the authorities.

So now you go to solutions.

1. Your solution to this is based on budget, because the only solutions cost a monthly fee. Also, for a truly good solution, the install fee will go up too. Basically, the solution here comes via redundancy - not just in multiples, but in getting the lines from different locations and making sure they don't follow the same paths. Most locations are not wired from different paths unless your location attracted a fiber loop. So if you have to have it, it might cost quite a bit or not be available.

2. RAID and hot-swap drives combined with hot-swap redundant power supplies. This is about the limit of what is currently available on a budget in the x86 world. Also, with RAID, make sure you have actual redundancy; RAID doesn't always mean you are in a condition to recover from a failure at all times. If it is really important, you will also have hot spares in the machine. As you can see, this adds cost each time you add a drive to make a system more resilient to failure. During a recent presentation at our LUG, it was explained that even RAID can fail. The presenter had several drives die together due to an AC failure. They had hot spares, but as drives failed, extra stress was applied to weakened drives till they failed.
Soon they exceeded their fault tolerance and had to rely on what they could scrape together from backups to recover. So, if possible, look into RAID equipment that has some form of interface to see what is going on, especially if you aren't in a monitored environment. If your RAID is able to generate messages at the driver layer and you can watch these messages, you can fix a problem before it escalates. While multiple machines are another way of solving a total system failure, you are probably more likely to experience a line failure than a hardware failure if you treat your hardware well. Some forms of this solution also require software modification.

3. This one is basically only combated by due diligence. Mark and the other CVS committers do their best to review everything before it goes in. Those who write patches try not to write buggy code. The implementers should still spend some time testing all the components to verify the functions work as needed.

4. Phones luckily have few failures. And when they do fail, it doesn't usually take down any other phones. Analog phones can be just swapped out, as there are few differences between them. Only ADSI would complicate this, but not if you had spares of the same ADSI phones. VoIP is pretty much the same.

5. Power is important, as good clean power makes your hardware last longer. Add to this that it is needed to survive any adverse weather conditions. Analog phones makes yo
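A sketch of the "watch the RAID at the driver layer" advice for Linux software RAID (the degraded-array check just reads /proc/mdstat; the alert action here is a plain print and would need to be wired to mail, paging, or whatever the site uses; hardware RAID controllers need their vendor tools instead):

#!/usr/bin/env python
# Poll /proc/mdstat and warn when an md array loses a member, so a degraded
# array gets fixed before a second drive failure takes the data with it.
import re
import time

MDSTAT = "/proc/mdstat"

def degraded_arrays():
    """Return md device names whose status shows a failed or missing member."""
    with open(MDSTAT) as f:
        text = f.read()
    bad = []
    # Each array block starts with e.g. "md0 : active raid1 sdb1[1] sda1[0]"
    # and its status line contains "[2/2] [UU]"; an underscore in the brackets
    # or a device flagged "(F)" means a member is missing or failed.
    for array, body in re.findall(r"^(md\d+) :((?:.*\n)(?:[ \t].*\n?)*)", text, re.M):
        if "(F)" in body or re.search(r"\[U*_+U*\]", body):
            bad.append(array)
    return bad

if __name__ == "__main__":
    while True:
        for md in degraded_arrays():
            print("WARNING: %s is degraded - replace the failed member" % md)
        time.sleep(60)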
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
The comments below are certainly not intended as any form of negativism, but rather to pursue thought processes for redundant systems.

> > 1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
> > mostly trivial, however what "signal" is needed to detect a system failure
> > and move the physical connection to a second machine/interface? (If there
> > are three systems in a cluster, what signal is needed? If a three-way
> > switch is required, does someone want to design, build, and sell it to
> > users? Any need to discuss a four-way switch? Should there be a single
> > switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)
>
> Simple idea: Have a process on each machine pulse a lead state (something
> as simple as DTR out a serial port or a single data line on a parallel
> port) out to an external box. This box is strictly discrete hardware and
> built with a timeout that is retriggered by the pulse. When the pulse fails
> to arrive, the box switches the T1 over to the backup system.

And upon partial restoration of the failed system, should it automatically fall back to the primary? Or might there be some element of human control that would suggest not falling back until told to do so?

> > Since protecting calls in progress (under all circumstances and
> > configurations) is likely the most expensive and most difficult to achieve,
> > we can probably all agree that handling this should be left to some
> > future long-range plan. Is that acceptable to everyone?
>
> It's going to be almost impossible to preserve calls in progress. If you
> switch a T1 from one machine to the other, there's going to be a lack of
> sync (ISDN D-channels need to come up, RBS channels need to wink) that's
> going to result in the loss of the call.

What about calls in progress between two SIP phones (and CDR records)?

> > 2. In a hot-spare arrangement (single primary, single running secondary),
> > what static and/or dynamic information needs to be shared across the
> > two systems to maintain the best chance of switching to the secondary
> > system in the shortest period of time, and while minimizing the loss of
> > business data? (Should this same data be shared across all systems in
> > a cluster if the cluster consists of two or more machines?)
> >
> > 3. If a clustered environment, is clustering based on IP address or MAC
> > address?
> >    a. If based on an IP address, is a layer-3 box required between * and
> >       sip phones? (If so, how many?)
>
> Yes. You'll need something like Linux Virtual Server or an F5 load
> balancing box to make this happen. You can play silly games with round
> robin DNS, but it doesn't handle failure well.

Agreed, but then one would need two F5 boxes, as "it" would become the new single point of failure.

> >    b. If based on MAC address, what process moves an active * MAC address
> >       to another * machine (to maintain connectivity to sip phones)?
>
> Something like Ultra Monkey (http://www.ultramonkey.org)
>
> >    c. Should sessions that rely on a failed machine in a cluster simply
> >       be dropped?
> >    d. Are there any realistic ways to recover RTP sessions in a clustered
> >       environment when a single machine within the cluster fails, and RTP
> >       sessions were flowing through it (canreinvite=no)?
> >    e. Should a sip phone's arp cache timeout be configurable?
>
> Shouldn't need to worry about that unless the phone is on the same
> physical network segment.
Which, in most cases where Asterisk is deployed (obviously not all), is probably the case.

> >    f. Which system(s) control the physical switch in #1 above?
>
> A voting system... all systems control it. It is up to the switch to
> decide who isn't working right.

With probably some manual override, since we know that systems can appear to be ready for production, but the sysadmin says it's not ready due to any number of valid technical reasons.

> >    g. Is sharing static/dynamic operational data across some sort of
> >       high-availability hsrp channel acceptable, or, should two or more
> >       database servers be deployed?
>
> DB server clustering is a fairly solid technology these days. Deploy a DB
> cluster if you want.

Which gets to be rather expensive, adds complexity, and adds additional points of failure (decreasing the ability to approach five/four-9's).

> > 4. If a firewall/nat box is involved, what are the requirements to detect
> >    and handle a failed * machine?
> >    a. Are the requirements different for hot-spare vs clustering?
> >    b. What if the firewall is an inexpensive device (eg, Linksys) with
> >       minimal configuration options?
> >    c. Are the nat requirements within * different for clustering?
> >
> > 5. Should sip phones be configurable with a primary and secondary proxy?
> >    a. If the primary proxy fails, what determines when a sip phone fails
> >       over to the secondary proxy?
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
> 1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
> mostly trivial, however what "signal" is needed to detect a system failure
> and move the physical connection to a second machine/interface? (If there
> are three systems in a cluster, what signal is needed? If a three-way
> switch is required, does someone want to design, build, and sell it to
> users? Any need to discuss a four-way switch? Should there be a single
> switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)

Simple idea: Have a process on each machine pulse a lead state (something as simple as DTR out a serial port or a single data line on a parallel port) out to an external box. This box is strictly discrete hardware, built with a timeout that is retriggered by the pulse. When the pulse fails to arrive, the box switches the T1 over to the backup system.

> Since protecting calls in progress (under all circumstances and
> configurations) is likely the most expensive and most difficult to achieve,
> we can probably all agree that handling this should be left to some
> future long-range plan. Is that acceptable to everyone?

It's going to be almost impossible to preserve calls in progress. If you switch a T1 from one machine to the other, there's going to be a lack of sync (ISDN D-channels need to come up, RBS channels need to wink) that's going to result in the loss of the call.

> 2. In a hot-spare arrangement (single primary, single running secondary),
> what static and/or dynamic information needs to be shared across the
> two systems to maintain the best chance of switching to the secondary
> system in the shortest period of time, and while minimizing the loss of
> business data? (Should this same data be shared across all systems in
> a cluster if the cluster consists of two or more machines?)
>
> 3. If a clustered environment, is clustering based on IP address or MAC
> address?
>    a. If based on an IP address, is a layer-3 box required between * and
>       sip phones? (If so, how many?)

Yes. You'll need something like Linux Virtual Server or an F5 load balancing box to make this happen. You can play silly games with round robin DNS, but it doesn't handle failure well.

>    b. If based on MAC address, what process moves an active * MAC address
>       to another * machine (to maintain connectivity to sip phones)?

Something like Ultra Monkey (http://www.ultramonkey.org)

>    c. Should sessions that rely on a failed machine in a cluster simply
>       be dropped?
>    d. Are there any realistic ways to recover RTP sessions in a clustered
>       environment when a single machine within the cluster fails, and RTP
>       sessions were flowing through it (canreinvite=no)?
>    e. Should a sip phone's arp cache timeout be configurable?

Shouldn't need to worry about that unless the phone is on the same physical network segment.

>    f. Which system(s) control the physical switch in #1 above?

A voting system... all systems control it. It is up to the switch to decide who isn't working right.

>    g. Is sharing static/dynamic operational data across some sort of
>       high-availability hsrp channel acceptable, or, should two or more
>       database servers be deployed?

DB server clustering is a fairly solid technology these days. Deploy a DB cluster if you want.

> 4. If a firewall/nat box is involved, what are the requirements to detect
>    and handle a failed * machine?
>    a. Are the requirements different for hot-spare vs clustering?
>    b. What if the firewall is an inexpensive device (eg, Linksys) with
>       minimal configuration options?
>    c. Are the nat requirements within * different for clustering?
>
> 5. Should sip phones be configurable with a primary and secondary proxy?
>    a. If the primary proxy fails, what determines when a sip phone fails
>       over to the secondary proxy?

Usually a simple timeout works for this... but if your clustering/hot-spare switch works right, the client should never need to change.

>    b. After fail over to the secondary, what determines when the sip phone
>       should switch back to the primary proxy? (Is the primary ready to
>       handle production calls, or is it back ready for a system admin to
>       diagnose the original problem in a non-production manner?)

Auto switch-back is never a good thing. Once a system is taken out of service by an automated monitoring system, it should be up to human intervention to say that it is ready to go back into service.
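A minimal sketch of the DTR heartbeat pulse suggested above, assuming the pyserial package (3.x, which exposes the .dtr property) and an example serial port name; the external watchdog box and the T1 transfer switch are, of course, separate discrete hardware outside this script:

#!/usr/bin/env python
# Toggle DTR on a serial port once per interval.  The external watchdog box
# retriggers its timeout on each pulse and switches the T1 to the backup
# system when the pulses stop (machine hung, crashed, or powered off).
import time
import serial  # pyserial

def pulse_forever(port="/dev/ttyS0", interval=1.0):
    ser = serial.Serial(port)        # control lines only; baud rate is irrelevant
    try:
        while True:
            ser.dtr = True           # raise DTR...
            time.sleep(interval / 2)
            ser.dtr = False          # ...and drop it: one pulse per interval
            time.sleep(interval / 2)
    finally:
        ser.close()

if __name__ == "__main__":
    pulse_forever()

Run it only on the active Asterisk box; whether the box should ever pulse again after a repair ties into the manual fall-back question raised above.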
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
> >I'd guess part of the five-9's discussion centers around how automated
> >must one be to be able to actually get close? If one assumes the loss
> >of a SIMM the answer/effort certainly is different than assuming the
> >loss of a single interface card (when multiples exist), etc.
> >
> >I would doubt that anyone reading this list actually has a justifiable
> >business requirement for five-9's given the exponential cost/effort
> >involved to get there. But, setting some sort of reasonable goal
> >that would focus towards failover within xx number of seconds (and
> >maybe some other conditions) seems very practical.
>
> A failover system does not solve the scalability issue.. which means
> that you have a full server sitting there doing nothing most of the time,
> when if the load were being balanced across the servers in a "cluster"
> scenario you would also have the scalability..
>
> Also a failover system would typically only be 2 servers; if there were
> a cluster system there could be 10 servers, in which case five 9's should
> be easy..

Everyone's responses to Olle's proposition are of value, including yours. For those that have been involved with analyzing the requirements to achieve five-9's (for anything), there are tons of approaches, and each approach comes with some sort of cost/benefit trade-off. Once the approaches have been documented and costs associated with them, it's common for the original requirements to be redefined in terms of something that is more realistic in business terms. Whether that is clustering, hot standby, or another approach is largely irrelevant at the beginning of the process.

If you're a sponsor of clustering and you're forced to use canreinvite=no, lots of people would be unhappy when their RTP "system" died. I'm not suggesting clustering is a bad choice, only suggesting there are lots of cost/benefit trade-offs that are made on an individual basis, and there might be more than one answer to the reliability/uptime question.

In an earlier post, you mentioned a single IP address issue. That's really not an issue in some cases, as a virtual IP (within a cluster) may be perfectly fine (canreinvite=yes), etc. A pure guess is that use of a virtual IP forces some other design choices, like the need for a layer-3 box (since virtual IPs won't fix layer-2 problems), and probably revisiting RTP standards. (And, if we only have one layer-3 box, guess we need to get another for uptime, etc.)

Since hardware has become increasingly more reliable, infrastructure items less expensive, uptimes moving towards larger numbers, and software more reliable (in very general terms over years), using a hot-spare approach could be just as effective as a two-box cluster. In both cases, part of the problem boils down to assumptions about external interfaces and how to move those interfaces between two or "more" boxes, and what design requirements one states regarding calls in progress. (Olle, are you watching?)

1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is mostly trivial, however what "signal" is needed to detect a system failure and move the physical connection to a second machine/interface? (If there are three systems in a cluster, what signal is needed? If a three-way switch is required, does someone want to design, build, and sell it to users? Any need to discuss a four-way switch? Should there be a single switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)
Since protecting calls in progress (under all circumstances and configurations) is likely the most expensive and most difficult to achieve, we can probably all agree that handling this should be left to some future long-range plan. Is that acceptable to everyone?

2. In a hot-spare arrangement (single primary, single running secondary), what static and/or dynamic information needs to be shared across the two systems to maintain the best chance of switching to the secondary system in the shortest period of time, and while minimizing the loss of business data? (Should this same data be shared across all systems in a cluster if the cluster consists of two or more machines?)

3. If a clustered environment, is clustering based on IP address or MAC address?
   a. If based on an IP address, is a layer-3 box required between * and sip phones? (If so, how many?)
   b. If based on MAC address, what process moves an active * MAC address to another * machine (to maintain connectivity to sip phones)?
   c. Should sessions that rely on a failed machine in a cluster simply be dropped?
   d. Are there any realistic ways to recover RTP sessions in a clustered environment when a single machine within the cluster fails, and RTP sessions were flowing through it (canreinvite=no)?
   e. Should a sip phone's arp cache timeout be configurable?
   f. Which system(s) control the physical switch in #1 above?
   g. Is sharing static/dynamic operational data across some sort of high-availability hsrp channel acceptable, or, should two or more database servers be deployed?
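One possible form of the "signal" asked about in item 1, sketched as a plain UDP heartbeat between the primary and the standby (the addresses, port, and takeover hook are illustrative; a serial-lead pulse to an external switch, as suggested elsewhere in this thread, is an equally valid signal):

#!/usr/bin/env python
# Primary sends a small UDP datagram every second; the standby declares the
# primary dead after several missed beats and runs its takeover action
# (interface switch, IP/MAC takeover, whatever the site has chosen).
import socket
import sys
import time

PORT = 5999            # example heartbeat port
INTERVAL = 1.0         # seconds between beats
MISSED_LIMIT = 5       # missed beats before declaring failure

def send_heartbeats(standby_addr):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        s.sendto(b"asterisk-primary-alive", (standby_addr, PORT))
        time.sleep(INTERVAL)

def watch_heartbeats(on_failure):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", PORT))
    s.settimeout(INTERVAL)
    missed = 0
    while True:
        try:
            s.recvfrom(64)
            missed = 0
        except socket.timeout:
            missed += 1
            if missed == MISSED_LIMIT:
                on_failure()          # e.g. take over the cluster IP / flip the switch

if __name__ == "__main__":
    if sys.argv[1:] == ["primary"]:
        send_heartbeats("192.0.2.51")                  # standby's address (example)
    else:
        watch_heartbeats(lambda: print("primary dead - start takeover"))

Note that a heartbeat only proves the machine and its network path are alive, not that Asterisk is processing calls; combining it with an application-level probe (such as a SIP OPTIONS check) covers the "up but not working" case raised earlier in the thread.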
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
>>> Andrew Kohlsmith wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
>>>
>>> To turn around, let's discuss what we need to focus on to get
>>> Asterisk there:
>>>
>>> Here's a few bullet points, there's certainly a lot more
>>> * Linux platform stability - how?
>>
>> Even more than Linux itself is the x86 platform... I've thought about
>> this a bit when considering * boxes for big customers. When one
>> actually comes along, I'll have to actually make a decision :-).
>> From where I stand, the best thing to do for smaller customers is give
>> them a box with RAID and redundant power supplies, if they can afford
>> it.
>
> You can overcome most of those problems by buying good quality
> hardware. If you buy your * server from your local Taiwanese clone
> shop, you're asking for trouble. A big, beefy machine from Dell would
> be better.

Yeah, but nothing like a nice, big Sun machine. A cluster of Dell machines is reliable, but a midrange Sun box puts them to shame.

>> But if I were to have a big customer with deep pockets, I'd really
>> like * on a big Sun beast with redundant-everything (i.e. you can hot
>> swap any component and there's usually n+1 of everything). The
>> problem is that I don't think there's any Solaris support for Digium
>> cards, since it's kind of a chicken-and-egg problem.
>
> Nope. No Solaris support, but you might be able to get away with
> Linux/Solaris... but then you lose a lot of the hot-swapability. In my
> experience, though, the only things I've ever been able to hotswap were
> power supplies and hard drives... and that's not software/OS dependent.

With the big boxes like the 4800, you can hot swap CPUs and memory and such as well. You're right that all that stuff is pretty Solaris-dependent, which is why I wanted to see if I couldn't get Asterisk to run on a little Solaris machine (and then sell it to people who own the big ones).

>> One of these days, I may convince myself to buy a modern Sun box
>> (maybe the ~$1000 Blade 100s) and see what can be done. The only
>> problem I could conceive would be endian-ness, but I read about Digium
>> cards in a PowerPC box, so that won't be a problem, right?
>> Nick
>
> Endian-ness is really only a driver issue. It's when programmers believe
> that the world revolves around Linux/i386 that you have problems.

But it can also be a problem if you have on-card firmware, I've heard.

> Personally, I'd stick my Digium cards into an Alpha of some sort. A
> DS-10L for 1U mounting with 1 card or a DS-20 for multiple cards where
> you need lots of processor zoobs.

I like the Alphas too, but they're being discontinued last I heard, and being replaced with the Itanium. Even VMS is being ported (now _there's_ an OS for * :-)

Nick
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
On Sun, 2004-01-04 at 13:28, WipeOut wrote:
> Steven Critchfield wrote:
>
> >On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
> >
> >>I would set the "Enterprise Class" bar at five 9's reliability
> >>(about 5.25 minutes per year of down time) the same
> >>as a Class 4/5 phone switch. This would require redundant
> >>design considerations in both hardware and software.
> >>
> >>In our network, Linux is approaching
> >>"Enterprise Class" and I don't see why *
> >>could not achieve this in the near future.
> >
> >I may be wrong, but I think the 5 9's relates to the full system, not to
> >individual pieces, especially when talking about a class 4/5 switch. On a
> >small-scale deployment, that will be a problem as you won't implement
> >full redundancy. Redundancy adds quite a bit to the cost of your
> >deployment.
> >
> >As far as linux goes, it is at that level if you put forth the effort to
> >make its environment decent. I have multiple machines approaching 2
> >years of uptime, and many over a year of uptime. I have not had a
> >machine in my colo space go down since we removed the one machine with a
> >buggy NIC.
> >
> >So the next step is asterisk. Outside of a couple of deadlocks from kernel
> >problems when I was compiling new modules, I haven't had asterisk fall
> >over while doing normal calls.
> >
> >The downtime could have been dealt with by having some redundancy in the
> >physical lines. I would have lost the calls on the line, but the calls
> >could be reconnected immediately.
> >
> >I can say up front that I have asterisk installs running multiple months
> >without problems.
>
> Steven,
>
> You often mention your servers' uptime. I am assuming you don't count
> reboots, since you must have had to patch your kernel at least a few
> times in the last year and the reboot would have reset your uptime..

Why do you assume I would have to patch a kernel? Not all machines must run the most current kernels, and some kernels can be sufficiently minimal to present low risk. Plus, all the recent problems require a local user to exploit. I subscribe to the theory of only giving access to critical machines to people I can quickly level a shotgun at. With that knowledge, and my users' acknowledgment or witness of my accuracy, they don't wish to screw with the systems. BTW, my accuracy goes up with the number of concurrent targets by about 4 percent.

> If that is the case then I have a server that is also around the 2 year
> uptime mark.. The longest single runtime between reboots for updated
> kernels is only 127 days.. :)

I have 2 machines at this moment that are halfway to looping the uptime counter again at 497 days.

Webserver is at 497 + 197 days
Old, almost decommissioned file server is at 497 + 194 days
A VPN machine is at 414 days
DB server is at 245 days
An almost decommissioned distro server is at 497 + 165 days

Due to some upgrades, I now have fewer machines holding high uptimes. My mail server was updated just over 2 months ago and it was swapped to the distro server. So the distro server that is about to be decommissioned is really just waiting for me to go take it out of the rack.

Those are real uptimes with no reboots. What makes those 4 machines with more than a year of uptime interesting is that one is a Dell, one is a Supermicro, and the other two are home-built systems. So I can attest to x86 being able to be stable. Maybe not always, and I would like some more swappable parts.
--
Steven Critchfield <[EMAIL PROTECTED]>
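For anyone puzzled by the 497-day figure: Linux kernels of that era report uptime from a 32-bit jiffies counter ticking at 100 Hz, so the displayed uptime wraps back to zero after 2^32 ticks - hence "497 + N days":

#!/usr/bin/env python
# Why the uptime counter "loops" at about 497 days on those kernels.
seconds_per_wrap = 2 ** 32 / 100.0        # 32-bit counter, 100 jiffies per second
print("uptime counter wraps after %.1f days" % (seconds_per_wrap / 86400))  # ~497.1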
Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway
>> Andrew Kohlsmith wrote:
>>> I would set the "Enterprise Class" bar at five 9's reliability
>>> (about 5.25 minutes per year of down time) the same
>>> as a Class 4/5 phone switch. This would require redundant
>>> design considerations in both hardware and software.
>>
>> To turn around, let's discuss what we need to focus on to get
>> Asterisk there:
>>
>> Here's a few bullet points, there's certainly a lot more
>> * Linux platform stability - how?
>
> Even more than Linux itself is the x86 platform... I've thought about this
> a bit when considering * boxes for big customers. When one actually comes
> along, I'll have to actually make a decision :-).
> From where I stand, the best thing to do for smaller customers is give
> them a box with RAID and redundant power supplies, if they can afford it.

You can overcome most of those problems by buying good quality hardware. If you buy your * server from your local Taiwanese clone shop, you're asking for trouble. A big, beefy machine from Dell would be better.

> But if I were to have a big customer with deep pockets, I'd really like *
> on a big Sun beast with redundant-everything (i.e. you can hot swap any
> component and there's usually n+1 of everything). The problem is that I
> don't think there's any Solaris support for Digium cards, since it's kind
> of a chicken-and-egg problem.

Nope. No Solaris support, but you might be able to get away with Linux/Solaris... but then you lose a lot of the hot-swapability. In my experience, though, the only things I've ever been able to hotswap were power supplies and hard drives... and that's not software/OS dependent.

> One of these days, I may convince myself to buy a modern Sun box (maybe
> the ~$1000 Blade 100s) and see what can be done. The only problem I could
> conceive would be endian-ness, but I read about Digium cards in a PowerPC
> box, so that won't be a problem, right?
> Nick

Endian-ness is really only a driver issue. It's when programmers believe that the world revolves around Linux/i386 that you have problems.

Personally, I'd stick my Digium cards into an Alpha of some sort. A DS-10L for 1U mounting with 1 card, or a DS-20 for multiple cards where you need lots of processor zoobs.
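A tiny aside on the endianness point (illustrative Python only, not Asterisk or Zaptel code): the same 32-bit value is laid out in opposite byte orders on little-endian x86 and big-endian SPARC/PowerPC, which is exactly what a driver - and any on-card firmware interface - has to account for:

#!/usr/bin/env python
# Byte order of one 32-bit value on the two kinds of architecture discussed above.
import struct

value = 0x12345678
print(struct.pack("<I", value).hex())   # little-endian (x86):        78563412
print(struct.pack(">I", value).hex())   # big-endian (SPARC/PowerPC): 12345678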
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
Rich Adamson wrote:
>> Andrew Kohlsmith wrote:
>>>> I would set the "Enterprise Class" bar at five 9's reliability
>>>> (about 5.25 minutes per year of down time) the same
>>>> as a Class 4/5 phone switch. This would require redundant
>>>> design considerations in both hardware and software.
>>
>> To turn around, let's discuss what we need to focus on to get
>> Asterisk there:
>>
>> Here's a few bullet points, there's certainly a lot more
>> * Linux platform stability - how?
>> ** Special demands when using Zaptel cards
>> * Redundancy architecture
>> * Development/stable release scheme
>>
>> Then we have some channel demands, like
>> * Better support for SRV records in the SIP channel
>>
>> More?
>
> Better SIP phone support for primary/secondary proxy (and failover)
> (note: some phones don't support a second proxy at all; some say they
> do, but fail at it.)
>
> Maybe some sort of HSRP (hot standby router protocol, or whatever)
>
> Some form of dynamic config sharing between pri/sec systems
>
> Won't mention external PSTN line failover, as that's sort of a separate
> topic, or loss of calls in flight, etc.
>
> I'd guess part of the five-9's discussion centers around how automated
> one must be to actually get close. If one assumes the loss of a SIMM, the
> answer/effort is certainly different than assuming the loss of a single
> interface card (when multiples exist), etc.
>
> I doubt that anyone reading this list actually has a justifiable business
> requirement for five-9's, given the exponential cost/effort involved to
> get there. But setting some sort of reasonable goal that focuses on
> failover within xx number of seconds (and maybe some other conditions)
> seems very practical.

A failover system does not solve the scalability issue.. which means that
you have a full server sitting there doing nothing most of the time,
whereas if the load were balanced across the servers in a "cluster"
scenario you would also get the scalability..

Also, a failover system would typically be only 2 servers; in a cluster
system there could be 10 servers, in which case five 9's should be easy..

Later..
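(Rough numbers behind the failover-vs-cluster point above; a
back-of-the-envelope sketch where the 99.9% per-box availability is an
assumed example, not a measured value, and failures are assumed
independent:)

    # System is counted "up" as long as at least one of n boxes is up.
    MINUTES_PER_YEAR = 365.25 * 24 * 60

    per_box = 0.999               # assumed availability of one box (~8.8 h/yr down)
    for n in (1, 2, 10):
        system = 1 - (1 - per_box) ** n
        downtime = (1 - system) * MINUTES_PER_YEAR
        print(n, round(system, 9), round(downtime, 2), "min/yr")

    # 1 box  -> ~526 min/yr down
    # 2 boxes -> ~0.53 min/yr, already past five 9's on paper
    # 10 boxes -> effectively zero, which is why real downtime ends up
    # dominated by shared dependencies (power, network, the failover
    # mechanism itself) rather than by the boxes.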
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
Steven Critchfield wrote:
> On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
>> I would set the "Enterprise Class" bar at five 9's reliability
>> (about 5.25 minutes per year of down time) the same
>> as a Class 4/5 phone switch. This would require redundant
>> design considerations in both hardware and software.
>>
>> In our network, Linux is approaching
>> "Enterprise Class" and I don't see why *
>> could not achieve this in the near future.
>
> I may be wrong, but I think the 5 9's relates to the full system, not to
> individual pieces, especially when talking about a class 4/5 switch. On a
> small-scale deployment that will be a problem, as you won't implement
> full redundancy. Redundancy adds quite a bit to the cost of your
> deployment.
>
> As far as Linux goes, it is at that level if you put forth the effort to
> make its environment decent. I have multiple machines approaching 2 years
> of uptime, and many over a year of uptime. I have not had a machine in my
> colo space go down since we removed the one machine with a buggy NIC.
>
> So the next step is Asterisk. Outside of a couple of deadlocks from
> kernel problems while I was compiling new modules, I haven't had Asterisk
> fall over while handling normal calls.
>
> The downtime could have been dealt with by having some redundancy in the
> physical lines. I would have lost the calls on the line, but the calls
> could have been reconnected immediately.
>
> I can say up front that I have Asterisk installs running multiple months
> without problems.

Steven,

You often mention your servers' uptime. I am assuming you don't count
reboots, since you must have had to patch your kernel at least a few times
in the last year and the reboot would have reset your uptime..

If that is the case then I have a server that is also around the 2 year
uptime mark.. The longest single runtime between reboots for updated
kernels is only 127 days.. :)

Later..
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
> Andrew Kohlsmith wrote:
>>> I would set the "Enterprise Class" bar at five 9's reliability
>>> (about 5.25 minutes per year of down time) the same
>>> as a Class 4/5 phone switch. This would require redundant
>>> design considerations in both hardware and software.
>
> To turn around, let's discuss what we need to focus on to get
> Asterisk there:
>
> Here's a few bullet points, there's certainly a lot more
> * Linux platform stability - how?
> ** Special demands when using Zaptel cards
> * Redundancy architecture
> * Development/stable release scheme
>
> Then we have some channel demands, like
> * Better support for SRV records in the SIP channel
>
> More?

Better SIP phone support for primary/secondary proxy (and failover)
(note: some phones don't support a second proxy at all; some say they do,
but fail at it.)

Maybe some sort of HSRP (hot standby router protocol, or whatever)

Some form of dynamic config sharing between pri/sec systems

Won't mention external PSTN line failover, as that's sort of a separate
topic, or loss of calls in flight, etc.

I'd guess part of the five-9's discussion centers around how automated one
must be to actually get close. If one assumes the loss of a SIMM, the
answer/effort is certainly different than assuming the loss of a single
interface card (when multiples exist), etc.

I doubt that anyone reading this list actually has a justifiable business
requirement for five-9's, given the exponential cost/effort involved to get
there. But setting some sort of reasonable goal that focuses on failover
within xx number of seconds (and maybe some other conditions) seems very
practical.
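(The "HSRP-like" idea above boils down to a hot spare watching the primary
and taking over when heartbeats stop. A minimal sketch of that pattern
only; the hostname and the takeover hook are hypothetical, and a real
deployment would use something like VRRP/heartbeat rather than this loop.
Port 5038 is assumed to be the Asterisk manager interface:)

    import socket, time

    PRIMARY = ("asterisk-primary.example.net", 5038)  # hypothetical host/port
    CHECK_EVERY = 2      # seconds between probes
    MAX_MISSES = 3       # consecutive failures before declaring the primary dead

    def primary_alive(addr, timeout=1.0):
        # Treat the primary as alive if its manager TCP port accepts a connection.
        try:
            with socket.create_connection(addr, timeout=timeout):
                return True
        except OSError:
            return False

    def take_over():
        # Placeholder: a real standby would claim the shared/service IP here
        # and start accepting calls.
        print("primary unreachable -- standby taking over")

    misses = 0
    while True:
        misses = 0 if primary_alive(PRIMARY) else misses + 1
        if misses >= MAX_MISSES:
            take_over()
            break
        time.sleep(CHECK_EVERY)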
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
> Andrew Kohlsmith wrote:
>>> I would set the "Enterprise Class" bar at five 9's reliability
>>> (about 5.25 minutes per year of down time) the same
>>> as a Class 4/5 phone switch. This would require redundant
>>> design considerations in both hardware and software.
>
> To turn around, let's discuss what we need to focus on to get
> Asterisk there:
>
> Here's a few bullet points, there's certainly a lot more
> * Linux platform stability - how?

Even more than Linux itself is the x86 platform... I've thought about this
a bit when considering * boxes for big customers. When one actually comes
along, I'll have to actually make a decision :-).

From where I stand, the best thing to do for smaller customers is give
them a box with RAID and redundant power supplies, if they can afford it.

But if I were to have a big customer with deep pockets, I'd really like *
on a big Sun beast with redundant everything (i.e. you can hot-swap any
component and there's usually n+1 of everything). The problem is that I
don't think there's any Solaris support for Digium cards, since it's kind
of a chicken-and-egg problem.

One of these days, I may convince myself to buy a modern Sun box (maybe the
~$1000 Blade 100s) and see what can be done. The only problem I could
conceive would be endian-ness, but I read about Digium cards in a PowerPC
box, so that won't be a problem, right?

Nick
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
>
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

I may be wrong, but I think the 5 9's relates to the full system, not to
individual pieces, especially when talking about a class 4/5 switch. On a
small-scale deployment that will be a problem, as you won't implement full
redundancy. Redundancy adds quite a bit to the cost of your deployment.

As far as Linux goes, it is at that level if you put forth the effort to
make its environment decent. I have multiple machines approaching 2 years
of uptime, and many over a year of uptime. I have not had a machine in my
colo space go down since we removed the one machine with a buggy NIC.

So the next step is Asterisk. Outside of a couple of deadlocks from kernel
problems while I was compiling new modules, I haven't had Asterisk fall
over while handling normal calls.

The downtime could have been dealt with by having some redundancy in the
physical lines. I would have lost the calls on the line, but the calls
could have been reconnected immediately.

I can say up front that I have Asterisk installs running multiple months
without problems.

--
Steven Critchfield <[EMAIL PROTECTED]>
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
>
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

Linux might approach that, but * as an application won't in its present
design, for lots of reasons that have already been discussed. I'd be
reasonably certain (you're right) it will head in that direction; it just
happens not to be there today. On the surface, I've not heard of anyone who
is actually addressing it either.
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
Andrew Kohlsmith wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.

To turn around, let's discuss what we need to focus on to get Asterisk
there. Here's a few bullet points; there's certainly a lot more:

* Linux platform stability - how?
** Special demands when using Zaptel cards
* Redundancy architecture
* Development/stable release scheme

Then we have some channel demands, like
* Better support for SRV records in the SIP channel

More?

/O
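(Context for the SRV item in the list above: SRV records let a SIP domain
publish several servers with priorities and weights, which is what makes
DNS-level failover between a primary and a backup proxy possible. A rough
sketch of the lookup, assuming the third-party dnspython package (2.x API)
and a made-up domain:)

    # SRV lookup a SIP channel would do for a peer in the domain example.com
    import dns.resolver

    answers = dns.resolver.resolve("_sip._udp.example.com", "SRV")

    # Lower priority is preferred; weight splits load within a priority
    # level, so a dead primary can be skipped in favour of the next entry.
    for rr in sorted(answers, key=lambda r: (r.priority, -r.weight)):
        print(rr.priority, rr.weight, rr.target.to_text(), rr.port)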
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
Doug Shubert wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
>
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

Asterisk would need some kind of clustering/load-balancing ability (a
single-IP system image for the IP phones across multiple servers) to be
truly "Enterprise Class" in terms of both reliability and scalability..

Obviously that would not be as relevant for the analog hard-wired phones
unless the channel banks and T1/E1 lines could be automatically switched to
another server..

Later..
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.

My Norstar Meridian system has nowhere near this. We get about 5 minutes of
downtime every month (usually trunk card issues). Not arguing against
anything you've said, just adding a data point.

Regards,
Andrew
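(For scale, the data point above works out to roughly four 9's rather than
five. A quick calculation, using only the figures quoted in these posts:)

    MINUTES_PER_YEAR = 365.25 * 24 * 60

    five_nines_downtime = (1 - 0.99999) * MINUTES_PER_YEAR
    print(round(five_nines_downtime, 2), "min/yr")    # ~5.26, the "5.25 minutes" cited

    meridian_downtime = 5 * 12                        # ~5 minutes/month from the post above
    meridian_availability = 1 - meridian_downtime / MINUTES_PER_YEAR
    print(round(meridian_availability * 100, 4), "%") # ~99.9886% -- closer to four 9's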
Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway
I would set the "Enterprise Class" bar at five 9's reliability (about 5.25
minutes per year of down time), the same as a Class 4/5 phone switch. This
would require redundant design considerations in both hardware and
software.

In our network, Linux is approaching "Enterprise Class" and I don't see why
* could not achieve this in the near future.

Steven Critchfield wrote:
> On Sun, 2004-01-04 at 04:35, EDWARD WILSON wrote:
>> Does anyone know what the hardware requirements would be to build an
>> Enterprise Asterisk Universal Gateway? I am thinking of something
>> comparable to the Cisco AS5xxx Series of gateways.
>
> Just to prepare you: if you ask the above question, you are not ready to
> ask the above question.
>
> Basically it comes down to the problem of what needs to be done, and more
> so what is considered enterprise-level hardware to run it on.
> --
> Steven Critchfield <[EMAIL PROTECTED]>

--
FREE Unlimited Worldwide VoIP calling - set up an account and start saving
today!
http://www.voippages.com ext. 7000
http://www.pulver.com/fwd/ ext. 83740
free IP phone software @ http://www.xten.com/
http://iaxclient.sourceforge.net/iaxcomm/