Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Doug Shubert
I would set the "Enterprise Class" bar at five 9's reliability
(about 5.25 minutes of downtime per year), the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.

In our network, Linux is approaching
"Enterprise Class" and I don't see why *
could not achieve this in the near future.
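For reference, the arithmetic behind that figure (plain Python, nothing Asterisk-specific assumed):

# Downtime allowed per year at a given availability target.
MINUTES_PER_YEAR = 365.25 * 24 * 60   # ~525,960 minutes

for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines: {downtime:.2f} minutes of downtime per year")

# 3 nines: 525.96 (about 8.8 hours)
# 4 nines:  52.60
# 5 nines:   5.26  -- the "about 5.25 minutes" cited above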


Steven Critchfield wrote:

> On Sun, 2004-01-04 at 04:35, EDWARD WILSON wrote:
> > Does anyone know what the hardware requirements would be to build an
> > Enterprise Asterisk Universal Gateway ?  I am thinking of something
> > comprable to the Cisco AS5xxx Series of gateways.
>
> Just to prepare you, if you ask the above question, you are not ready to
> ask the above question.
>
> Basically it falls down to the problem of what is needed to be done, and
> more so what is considered enterprise level hardware to be run upon.
> --
> Steven Critchfield <[EMAIL PROTECTED]>
>

--
FREE Unlimited Worldwide Voip calling
set-up an account and start saving today!
http://www.voippages.com ext. 7000
http://www.pulver.com/fwd/ ext. 83740
free IP phone software @
http://www.xten.com/
http://iaxclient.sourceforge.net/iaxcomm/




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Andrew Kohlsmith
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.

My Norstar Meridian system has nowhere near this.  We get about 5 minutes 
downtime every month (usually trunk card issues).

Not arguing against anything you've said, just adding a data point.

Regards,
Andrew


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Doug Shubert wrote:

> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
>
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

Asterisk would need some kind of clustering/load balancing ability 
(a single-IP system image for the IP phones across multiple servers) to be 
truly "Enterprise Class" in terms of both reliability and 
scalability.. Obviously that would not be as relevant for the analog 
hard-wired phones unless the channel banks and T1/E1 lines could be 
automatically switched to another server..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Olle E. Johansson
Andrew Kohlsmith wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.

To turn around, let's discuss what we need to focus on to get
Asterisk there:
Here are a few bullet points; there are certainly a lot more:
* Linux platform stability - how?
** Special demands when using Zaptel cards
* Redundancy architecture
* Development/stable release scheme
Then we have some channel demands, like
* Better support for SRV records in the SIP channel
More?
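To make the SRV point concrete, a minimal lookup sketch of what RFC 3263-style resolution involves; it assumes the third-party dnspython package (2.x) and a hypothetical domain, and is not Asterisk's own resolver code:

# Resolve the SIP SRV record for a domain and order the results the way
# a SIP client is expected to: by priority, then by weight.
import dns.resolver   # third-party: dnspython >= 2.0

def sip_srv_targets(domain, transport="udp"):
    answers = dns.resolver.resolve(f"_sip._{transport}.{domain}", "SRV")
    ordered = sorted(answers, key=lambda r: (r.priority, -r.weight))
    return [(str(r.target).rstrip("."), r.port) for r in ordered]

# sip_srv_targets("example.com") -> [("sip1.example.com", 5060), ...]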

/O



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
> 
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

Linux might approach that, but * as an application won't in its present
design, for lots of reasons that have already been discussed. I'd be
reasonably certain (you're right) that it will head in that direction; it just 
happens to not be there today. On the surface, I've not heard of
anyone who is actually addressing it either.




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Steven Critchfield
On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
> I would set the "Enterprise Class" bar at five 9's reliability
> (about 5.25 minutes per year of down time) the same
> as a Class 4/5 phone switch. This would require redundant
> design considerations in both hardware and software.
> 
> In our network, Linux is approaching
> "Enterprise Class" and I don't see why *
> could not achieve this in the near future.

I may be wrong, but I think the 5 9's relates to the full system, not to
individual pieces, especially when talking about a class 4/5 switch. On a
small scale deployment, that will be a problem as you won't implement
full redundancy. Redundancy adds quite a bit to the cost of your
deployment. 

As far as linux goes, it is at that level if you put forth the effort to
make its environment decent. I have multiple machines approaching 2
years of uptime, and many over a year of uptime. I have not had a
machine in my colo space go down since we removed the one machine with a
buggy NIC.

So the next step is asterisk. Outside of a couple of deadlocks from kernel
problems when I was compiling new modules, I haven't had asterisk fall
over while doing normal calls.

The downtime could have been dealt with by having some redundancy in the
physical lines. I would have lost the calls on the line, but the calls
could be reconnected immediately. 

I can say up front that I have asterisk installs running multiple months
without problems. 
-- 
Steven Critchfield <[EMAIL PROTECTED]>



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Nick Bachmann
> Andrew Kohlsmith wrote:
>>>I would set the "Enterprise Class" bar at five 9's reliability
>>>(about 5.25 minutes per year of down time) the same
>>>as a Class 4/5 phone switch. This would require redundant
>>>design considerations in both hardware and software.
>>
>
> To turn around, let's discuss what we need to focus on to get
> Asterisk there:
>
> Here's a few bullet points, there's certainly a lot more
> * Linux platform stability - how?

Even more of an issue than Linux itself is the x86 platform... I've thought about this
a bit when considering * boxes for big customers.  When one actually comes
along, I'll have to actually make a decision :-).
From where I stand, the best thing to do for smaller customers is give
them a box with RAID and redundant power supplies, if they can afford it.
But if I were to have a big customer with deep pockets, I'd really like *
on a big Sun beast with redundant-everything (i.e. you can hot swap any
component and there's usually n+1 of everything).  The problem is that I
don't think there's any Solaris support for Digium cards, since it's kind
of  a chicken-and-egg problem.
One of these days, I may convince myself to buy a modern Sun box (maybe
the ~$1000 Blade 100s) and see what can be done.  The only problem I could
conceive would be endian-ness, but I read about Digium cards in a PowerPC
box, so that won't be a problem, right?
Nick





Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
> Andrew Kohlsmith wrote:
> >>I would set the "Enterprise Class" bar at five 9's reliability
> >>(about 5.25 minutes per year of down time) the same
> >>as a Class 4/5 phone switch. This would require redundant
> >>design considerations in both hardware and software.
> > 
> 
> To turn around, let's discuss what we need to focus on to get
> Asterisk there:
> 
> Here's a few bullet points, there's certainly a lot more
> * Linux platform stability - how?
> ** Special demands when using Zaptel cards
> * Redundancy architecture
> * Development/stable release scheme
> 
> Then we have some channel demands, like
> * Better support for SRV records in the SIP channel
> 
> More?

Better sip phone support for primary/secondary proxy (and failover)
 (note: some phones don't support a second proxy at all; some say they
  do, but fail at it.)

Maybe some sort of HSRP (Hot Standby Router Protocol, or whatever)

Some form of dynamic config sharing between pri/sec systems

Won't mention external pstn line failover as that's sort of a separate
  topic, or loss of calls in flight, etc.

I'd guess part of the five-9's discussion centers around how automated
must one be to be able to actually get close?  If one assumes the loss
of a SIMM, the answer/effort certainly is different than assuming the 
loss of a single interface card (when multiples exist), etc.

I would doubt that anyone reading this list actually has a justifiable
business requirement for five-9's given the exponential cost/effort
involved to get there. But setting some sort of reasonable goal
that would focus towards failover within xx number of seconds (and
maybe some other conditions) seems very practical. 
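One way to make "failover within xx seconds" concrete is an application-level probe rather than a bare ping. A rough sketch of a SIP OPTIONS check that a hot-spare controller could run against the primary (plain sockets; the addresses are placeholders and the message is deliberately minimal):

# Send a minimal SIP OPTIONS request over UDP; any reply within the
# timeout counts as "alive". Host/port values are placeholders.
import random
import socket

def sip_options_alive(host, port=5060, timeout=2.0):
    tag = random.randrange(1 << 31)
    msg = (
        f"OPTIONS sip:{host} SIP/2.0\r\n"
        f"Via: SIP/2.0/UDP probe.invalid:5060;branch=z9hG4bK{tag}\r\n"
        f"From: <sip:probe@probe.invalid>;tag={tag}\r\n"
        f"To: <sip:{host}>\r\n"
        f"Call-ID: {tag}@probe.invalid\r\n"
        f"CSeq: 1 OPTIONS\r\n"
        f"Max-Forwards: 70\r\n"
        f"Content-Length: 0\r\n\r\n"
    )
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(msg.encode(), (host, port))
            sock.recv(4096)          # any SIP response means the stack is answering
            return True
        except (socket.timeout, OSError):
            return False

A watchdog could call this every few seconds and trigger failover only after N consecutive misses, which puts the "xx seconds" figure directly under your control.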





Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Steven Critchfield wrote:

> On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
>> I would set the "Enterprise Class" bar at five 9's reliability
>> (about 5.25 minutes per year of down time) the same
>> as a Class 4/5 phone switch. This would require redundant
>> design considerations in both hardware and software.
>>
>> In our network, Linux is approaching
>> "Enterprise Class" and I don't see why *
>> could not achieve this in the near future.
>
> I may be wrong, but I think the 5 9's relates to full system not to
> individual pieces especially when talking about a class4/5 switch. On a
> small scale deployment, that will be a problem as you won't implement
> full redundancy. Redundancy adds quite a bit to the cost of your
> deployment. 
>
> As far as linux goes, it is at that level if you put forth the effort to
> make it's environment decent. I have multiple machines approaching 2
> years of uptime, and many over a year of uptime. I have not had a
> machine in my colo space go down since we removed the one machine with a
> buggy NIC.
>
> So next step, is asterisk. Outside of a couple of deadlocks from kernel
> problems when I was compiling new modules, I haven't had asterisk knock
> over while doing normal calls.
>
> The downtime could have been dealt with by having some redundancy in the
> physical lines. I would have lost the calls on the line, but the calls
> could be reconnected immediately. 
>
> I can say up front that I have asterisk installs running multiple months
> without problems.

Steven,

You often mention your servers' uptime. I am assuming you don't count 
reboots, since you must have had to patch your kernel at least a few 
times in the last year and the reboot would have reset your uptime..

If that is the case then I have a server that is also around the 2 year 
uptime mark.. The longest single runtime between reboots for updated 
kernels is only 127 days.. :)

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Rich Adamson wrote:

>> Andrew Kohlsmith wrote:
>>>> I would set the "Enterprise Class" bar at five 9's reliability
>>>> (about 5.25 minutes per year of down time) the same
>>>> as a Class 4/5 phone switch. This would require redundant
>>>> design considerations in both hardware and software.
>>
>> To turn around, let's discuss what we need to focus on to get
>> Asterisk there:
>> Here's a few bullet points, there's certainly a lot more
>> * Linux platform stability - how?
>> ** Special demands when using Zaptel cards
>> * Redundancy architecture
>> * Development/stable release scheme
>> Then we have some channel demands, like
>> * Better support for SRV records in the SIP channel
>> More?
>
> Better sip phone support for primary/secondary proxy (and failover)
>  (note: some phones don't support a second proxy at all; some say they
>   do, but fail at it.)
>
> Maybe some sort of HSRP (hot spare standby protocol, or whatever)
>
> Some form of dynamic config sharing between pri/sec systems
>
> Won't mention external pstn line failover as that's sort of a separate
>  topic, or loss of calls in flight, etc.
>
> I'd guess part of the five-9's discussion centers around how automated
> must one be to be able to actually get close?  If one assumes the loss
> of a SIMM the answer/effort certainly is different then assuming the 
> loss of a single interface card (when multiples exist), etc.
>
> I would doubt that anyone reading this list actually have a justifiable
> business requirement for five-9's given the expontential cost/effort
> involved to get there. But, setting some sort of reasonable goal
> that would focus towards failover within xx number of seconds (and
> maybe some other conditions) seems very practical.

A failover system does not solve the scalability issue.. which means 
that you have a full server sitting there doing nothing most of the time, 
when if the load were being balanced across the servers in a "cluster" 
scenario you would also have the scalability..

Also, a failover system would typically only be 2 servers; if there were 
a cluster system there could be 10 servers, in which case five 9's should 
be easy..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread James Sharp
>> Andrew Kohlsmith wrote:
I would set the "Enterprise Class" bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.
>>>
>>
>> To turn around, let's discuss what we need to focus on to get
>> Asterisk there:
>>
>> Here's a few bullet points, there's certainly a lot more
>> * Linux platform stability - how?
>
> Even more than Linux itself is the x86 platform... I've thought about this
> a bit when considering * boxes for big customers.  When one actually comes
> along, I'll have to actually make a decision :-).
> From where I stand, the best thing to do for smaller customers is give
> them a box with RAID and redundant power supplies, if they can afford it.

You can overcome most of those problems by buying good quality hardware. 
If you buy your * server from your local Taiwanese clone shop, you're
asking for trouble.  A big, beefy machine from Dell would be better.

> But if I were to have a big customer with deep pockets, I'd really like *
> on a big Sun beast with redundant-everything (i.e. you can hot swap any
> component and there's usually n+1 of everything).  The problem is that I
> don't think there's any Solaris support for Digium cards, since it's kind
> of  a chicken-and-egg problem.

Nope.  No Solaris support, but you might be able to get away with
Linux instead of Solaris... but then you lose a lot of the hot-swappability.  In my
experience, though, the only things I've ever been able to hot-swap were
power supplies and hard drives... and that's not software/OS dependent.

> One of these days, I may convince myself to buy a modern Sun box (maybe
> the ~$1000 Blade 100s) and see what can be done.  The only problem I could
> conceive would be endian-ness, but I read about Digium cards in a PowerPC
> box, so that won't be a problem, right?
> Nick

Endian-ness is really only a driver issue.  It's when programmers
believe that the world revolves around Linux/i386 that you have problems.
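As a tiny illustration of the class of bug being described (nothing here comes from the Zaptel driver itself): code that packs data in native byte order silently changes meaning when it moves off little-endian x86, while explicit byte order does not.

import struct

value = 0x12345678
print(struct.pack("<I", value).hex())  # 78563412 on every platform (explicit little-endian)
print(struct.pack(">I", value).hex())  # 12345678 on every platform (explicit big-endian)
print(struct.pack("=I", value).hex())  # native order: differs between x86 and SPARC/PowerPC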

Personally, I'd stick my Digium cards into an Alpha of some sort.  A
DS-10L for 1U mounting with 1 card or a DS-20 for multiple cards where you
need lots of processor zoobs.


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Steven Critchfield
On Sun, 2004-01-04 at 13:28, WipeOut wrote:
> Steven Critchfield wrote:
> 
> >On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
> >  
> >
> >>I would set the "Enterprise Class" bar at five 9's reliability
> >>(about 5.25 minutes per year of down time) the same
> >>as a Class 4/5 phone switch. This would require redundant
> >>design considerations in both hardware and software.
> >>
> >>In our network, Linux is approaching
> >>"Enterprise Class" and I don't see why *
> >>could not achieve this in the near future.
> >>
> >>
> >
> >I may be wrong, but I think the 5 9's relates to full system not to
> >individual pieces especially when talking about a class4/5 switch. On a
> >small scale deployment, that will be a problem as you won't implement
> >full redundancy. Redundancy adds quite a bit to the cost of your
> >deployment. 
> >
> >As far as linux goes, it is at that level if you put forth the effort to
> >make it's environment decent. I have multiple machines approaching 2
> >years of uptime, and many over a year of uptime. I have not had a
> >machine in my colo space go down since we removed the one machine with a
> >buggy NIC.
> >
> >So next step, is asterisk. Outside of a couple of deadlocks from kernel
> >problems when I was compiling new modules, I haven't had asterisk knock
> >over while doing normal calls.
> >
> >The downtime could have been dealt with by having some redundancy in the
> >physical lines. I would have lost the calls on the line, but the calls
> >could be reconnected immediately. 
> >
> >I can say up front that I have asterisk installs running multiple months
> >without problems. 
> >  
> >
> Steven,
> 
> You often mention your servers uptime, I am assuming you don't count 
> reboots since you must have had to patch your kernel at least a few 
> times in the last year and the reboot would have reset your uptime..

Why do you assume I would have to patch a kernel? Not all machines must
run the most current kernels, and some kernels can be kept
sufficiently minimal to present low risk. Plus, all the recent
problems require a local user to exploit. I subscribe to the theory of
only giving access to critical machines to people I can quickly level a
shotgun at. With that knowledge, and my users' acknowledgment
of or witness to my accuracy, they don't wish to screw with the systems. 

BTW, my accuracy goes up with the number of concurrent targets by about
4 percent. 

> If that is the case then I have a server that is also around the 2 year 
> uptime mark.. The longest single runtime between reboots for updated 
> kernels is only 127 days.. :)

I have 2 machines at this moment that are halfway to looping the uptime
counter again at 497 days.

Webserver is at 497 + 197 days
Old almost decommissioned file server is at 497 + 194 days
A VPN machine is at 414 days
DB server is at 245 days
An almost decommissioned distro server is at 497 + 165 days


Due to some upgrades, I now have fewer machines holding high uptimes. My
mail server was updated just over 2 months ago and it was swapped to the
distro server. So the distro server that is about to be decommissioned
is really just waiting for me to go take it out of the rack. 

Those are real uptimes with no reboots. What makes those 4 machines with
more than a year of uptime interesting is that one is a Dell, one is a
SuperMicro, and the other 2 are homebuilt systems. So I can attest to x86
being able to be stable. Maybe not always, and I would like some more
swappable parts.
-- 
Steven Critchfield <[EMAIL PROTECTED]>



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Nick Bachmann
>>> Andrew Kohlsmith wrote:
>I would set the "Enterprise Class" bar at five 9's reliability
>(about 5.25 minutes per year of down time) the same
>as a Class 4/5 phone switch. This would require redundant
>design considerations in both hardware and software.

>>>
>>> To turn around, let's discuss what we need to focus on to get
>>> Asterisk there:
>>>
>>> Here's a few bullet points, there's certainly a lot more
>>> * Linux platform stability - how?
>>
>> Even more than Linux itself is the x86 platform... I've thought about
>> this a bit when considering * boxes for big customers.  When one
>> actually comes along, I'll have to actually make a decision :-).
>> From where I stand, the best thing to do for smaller customers is give
>> them a box with RAID and redundant power supplies, if they can afford
>> it.
>
> You can overcome most of those problems by buying good quality
> hardware.  If you buy your * server from your local Taiwanese clone
> shop, you're asking for trouble.  A big, beefy machine from Dell would
> be better.

Yeah, but nothing like a nice, big Sun machine.  A cluster of Dell
machines is reliable, but a midrange Sun box puts them to shame.
>> But if I were to have a big customer with deep pockets, I'd really
>> like * on a big Sun beast with redundant-everything (i.e. you can hot
>> swap any component and there's usually n+1 of everything).  The
>> problem is that I don't think there's any Solaris support for Digium
>> cards, since it's kind of  a chicken-and-egg problem.
>
> Nope.  No Solaris support, but you might be able to get away with
> Linux/Solaris...but then you lose a lot of the hot-swapability.  In my
> experience, though, the only things I've ever been able to hotswap were
> power supplies and hard drives...and thats not software/os dependant.

With the big boxes like the 4800, you can hot swap CPUs and memory and
such as well.  You're right that all that stuff is pretty
Solaris-dependent, which is why I wanted to see if I couldn't get Asterisk
to run on a little Solaris machine (and then sell it to people who own the
big ones).
>> One of these days, I may convince myself to buy a modern Sun box
>> (maybe the ~$1000 Blade 100s) and see what can be done.  The only
>> problem I could conceive would be endian-ness, but I read about Digium
>> cards in a PowerPC box, so that won't be a problem, right?
>> Nick
>
> Endian-ness is really only a driver issue.  Its when programmers who
> believe that the world revolves around Linux/i386 that you have
> problems.

But it can also be a problem if you have on-card firmware, I've heard.

> Personally, I'd stick my Digium cards into an Alpha of some sort.  A
> DS-10L for 1U mounting with 1 card or a DS-20 for multiple cards where
> you need lots of processor zoobs.

I like the Alphas too, but they're being discontinued last I heard, and
being replaced with the Itanium.  Even VMS is being ported (now _there's_
an OS for * :-)
Nick




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson

> >I'd guess part of the five-9's discussion centers around how automated
> >must one be to be able to actually get close?  If one assumes the loss
> >of a SIMM the answer/effort certainly is different then assuming the 
> >loss of a single interface card (when multiples exist), etc.
> >
> >I would doubt that anyone reading this list actually have a justifiable
> >business requirement for five-9's given the expontential cost/effort
> >involved to get there. But, setting some sort of reasonable goal
> >that would focus towards failover within xx number of seconds (and
> >maybe some other conditions) seems very practical. 
> >
> >  
> >
> A failover system does not solve the scalability issue.. which means 
> that you have a full server sitting there doing nothing most of the time 
> when if the load were being balanced across the servers in a "cluster" 
> senario you would also have the scalability..
> 
> Also a failover system would typically only be 2 servers, if there were 
> a cluster system there could be 10 servers in which case five 9's should 
> be easy..

Everyone's responses to Olle's proposition are of value, including yours.

For those that have been involved with analyzing the requirements to
achieve five-9's (for anything), there are tons of approaches, and each 
approach comes with some sort of cost/benefit trade off. Once the approaches
have been documented and costs associated with them, it's common for
the original requirements to be redefined in terms of something that is
more realistic in business terms. Whether that is clustering, hot standby,
or another approach is largely irrelevant at the beginning of the process.

If you're a sponsor of clustering and you're forced to use canreinvite=no, 
lots of people would be unhappy when their RTP "system" died. I'm not
suggesting clustering is a bad choice, only suggesting there are lots
of cost/benefit trade-offs that are made on an individual basis and there
might be more than one answer to the reliability/uptime question.

In an earlier post, you mentioned a single IP address issue. That's really
not an issue in some cases, as a virtual IP (within a cluster) may be
perfectly fine (canreinvite=yes), etc. My pure guess is that use of a virtual
IP forces some other design choices, like the need for a layer-3 box
(since virtual IP's won't fix layer-2 problems), and probably revisiting
RTP standards. (And, if we only have one layer-3 box, guess we need to get
another for uptime, etc, etc.)

Since hardware has become increasingly reliable, infrastructure items
less expensive, uptimes longer, and software more
reliable (in very general terms, over years), using a hot spare approach
could be just as effective as a two-box cluster. In both cases, part of
the problem boils down to assumptions about external interfaces and how
to move those interfaces between two or "more" boxes; and what design
requirements one states regarding calls in progress.

(Olle, are you watching?)

1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is 
mostly trivial, however what "signal" is needed to detect a system failure 
and move the physical connection to a second machine/interface? (If there 
are three systems in a cluster, what signal is needed? If a three-way 
switch is required, does someone want to design, build, and sell it to 
users? Any need to discuss a four-way switch? Should there be a single
switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)

Since protecting calls in progress (under all circumstances and 
configurations) is likely the most expensive and most difficult to achieve,
we can probably all agree that handling this should be left to some
future long-range plan. Is that acceptable to everyone?

2. In a hot-spare arrangement (single primary, single running secondary),
what static and/or dynamic information needs to be shared across the
two systems to maintain the best chance of switching to the secondary
system in the shortest period of time, and while minimizing the loss of
business data? (Should this same data be shared across all systems in
a cluster if the cluster consists of two or more machines?)

3. If a clustered environment, is clustering based on IP address or MAC
address?
   a. If based on an IP address, is a layer-3 box required between * and
  sip phones? (If so, how many?)
   b. If based on MAC address, what process moves an active * MAC address
  to another * machine (to maintain connectivity to sip phones)?
   c. Should sessions that rely on a failed machine in a cluster simply
  be dropped?
   d. Are there any realistic ways to recover RTP sessions in a clustered
  environment when a single machine within the cluster fails, and RTP
  sessions were flowing through it (canreinvite=no)?
   e. Should a sip phone's arp cache timeout be configurable?
   f. Which system(s) control the physical switch in #1 above?
   g. Is sharing static/dynamic operational data across some sort of
  high-availability hsrp channel acceptable, or should two or more
  database servers be deployed?

Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread James Sharp
> 1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
> mostly trevial, however what "signal" is needed to detect a system failure
> and move the physical connection to a second machine/interface? (If there
> are three systems in a cluster, what signal is needed? If a three-way
> switch is reqquired, does someone want to design, build, and sell it to
> users? Any need to discuss a four-way switch? Should there be a single
> switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)

Simple idea:  Have a process on each machine pulse a lead state (something
as simple as DTR out a serial port or a single data line on a parallel
port) out to an external box.  This box is strictly discrete hardware and
built with a timeout that is retriggered by the pulse.  When the pulse fails
to arrive, the box switches the T1 over to the backup system.
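A sketch of the software half of that idea, i.e. the per-machine heartbeat that toggles a serial control line for the external box to watch. It assumes the third-party pyserial package and a placeholder device path; the discrete switching hardware itself is out of scope here:

# Toggle DTR once a second. If this process (or the whole machine) dies,
# the pulses stop and the external box can time out and switch the T1.
import time
import serial   # third-party: pyserial

def heartbeat(device="/dev/ttyS0", period=1.0):
    port = serial.Serial(device)
    state = False
    try:
        while True:
            state = not state
            port.dtr = state          # each edge retriggers the box's timeout
            time.sleep(period / 2)
    finally:
        port.close()

if __name__ == "__main__":
    heartbeat()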

>
> Since protecting calls in progress (under all circumstances and
> configurations) is likely the most expensive and most difficult to achive,
> we can probably all agree that handling this should be left to some
> future long-range plan. Is that acceptable to everyone?

It's going to be almost impossible to preserve calls in progress.  If you
switch a T1 from one machine to the other, there is going
to be a lack of sync (ISDN D-channels need to come up, RBS channels need
to wink) that's going to result in the loss of the call.

> 2. In a hot-spare arrangement (single primary, single running secondary),
> what static and/or dynamic information needs to be shared across the
> two systems to maintain the best chance of switching to the secondary
> system in the shortest period of time, and while minimizing the loss of
> business data? (Should this same data be shared across all systems in
> a cluster if the cluster consists of two or more machines?)
>
> 3. If a clustered environment, is clustering based on IP address or MAC
> address?
>a. If based on an IP address, is a layer-3 box required between * and
>   sip phones? (If so, how many?)

Yes.  You'll need something like Linux Virtual Server or an F5 load
balancing box to make this happen.  You can play silly games with round
robin DNS, but it doesn't handle failure well.

>b. If based on MAC address, what process moves an active * MAC address
>   to a another * machine (to maintain connectivity to sip phones)?

Something like Ultra Monkey (http://www.ultramonkey.org)

>c. Should sessions that rely on a failed machine in a cluster simply
>   be dropped?
>d. Are there any realistic ways to recover RTP sessions in a clustered
>   environment when a single machine within the cluster fails, and RTP
>   sessions were flowing through it (canreinvite=no)?
>e. Should a sip phone's arp cache timeout be configurable?

Shouldn't need to worry about that unless the phone is on the same
physical network segment.

>f. Which system(s) control the physical switch in #1 above?

A voting system...all systems control it.  It is up to the switch to
decide who isn't working right.

>g. Is sharing static/dynamic operational data across some sort of
>   high-availability hsrp channel acceptable, or, should two or more
>   database servers be deployed?

DB Server clustering is a fairly solid technology these days.  Deploy a DB
cluster if you want.

> 4. If a firewall/nat box is involved, what are the requirements to detect
>and handle a failed * machine?
>a. Are the requirements different for hot-spare vs clustering?
>b. What if the firewall is an inexpensive device (eg, Linksys) with
>   minimal configuration options?
>c. Are the nat requirements within * different for clustering?
>
> 5. Should sip phones be configurable with a primary and secondary proxy?
>a. If the primary proxy fails, what determines when a sip phone fails
>   over to the secondary proxy?

Usually a simple timeout works for this..but if your clustering/hot-spare
switch works right...the client should never need to change.


>b. After fail over to the secondary, what determines when the sip phone
>   should switch back to the primary proxy? (Is the primary ready to
>   handle production calls, or is it back ready for a system admin to
>   diagnose the original problem in a non-production manner?)

Auto switch-back is never a good thing.  Once a system is taken out of
service by an automated monitoring system, it should be up to human
intervention to say that it is ready to go back into service.
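Pulling those two answers together, the decision logic on the switch side can stay very small: remember the last pulse seen from each system, keep the current active system as long as it is alive, promote the next live one when it isn't, and never fall back automatically. A hedged sketch (names and timings are made up):

import time

PULSE_TIMEOUT = 3.0   # seconds without a pulse before a system is considered dead

class FailoverSelector:
    def __init__(self, systems):
        self.last_pulse = {name: time.monotonic() for name in systems}
        self.active = systems[0]

    def pulse(self, name):
        # record a heartbeat pulse from this system
        self.last_pulse[name] = time.monotonic()

    def choose_active(self):
        now = time.monotonic()
        alive = [n for n, t in self.last_pulse.items() if now - t < PULSE_TIMEOUT]
        if self.active not in alive:
            # Promote the first live standby; a recovered primary is NOT
            # promoted again without human intervention.
            self.active = alive[0] if alive else None
        return self.active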




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
The comments below are certainly not intended as any form of negativism,
but rather to pursue thought processes for redundant systems.

> > 1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
> > mostly trivial, however what "signal" is needed to detect a system failure
> > and move the physical connection to a second machine/interface? (If there
> > are three systems in a cluster, what signal is needed? If a three-way
> > switch is required, does someone want to design, build, and sell it to
> > users? Any need to discuss a four-way switch? Should there be a single
> > switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)
> 
> Simple idea:  Have a process on each machine pulse a lead-state (something
> a s simple as DTR out a serial port or a single data line on a parallel
> port) out to an external box.  This box is strictly discrete hardware and
> built with timeout that is retriggered by the pulse.  When the pulse fails
> to arrive, the box switches the T1 over to the backup system.

And upon partial restoration of the failed system, should it automatically
fall back to the primary? Or, might there be some element of human 
control that would suggest not falling back until told to do so?

> > Since protecting calls in progress (under all circumstances and
> > configurations) is likely the most expensive and most difficult to achieve,
> > we can probably all agree that handling this should be left to some
> > future long-range plan. Is that acceptable to everyone?
> 
> Its going to be almost impossible to preserve calls in progress.  If you
> switch a T1 from one machine to the other, there's going to either going
> to be a lack of sync (ISDN D-channels need to come up, RBS channels need
> to wink) that's going to result in the loss of the call.

What about calls in progress between two sip phones (and cdr records)?
 
> > 2. In a hot-spare arrangement (single primary, single running secondary),
> > what static and/or dynamic information needs to be shared across the
> > two systems to maintain the best chance of switching to the secondary
> > system in the shortest period of time, and while minimizing the loss of
> > business data? (Should this same data be shared across all systems in
> > a cluster if the cluster consists of two or more machines?)
> >
> > 3. If a clustered environment, is clustering based on IP address or MAC
> > address?
> >a. If based on an IP address, is a layer-3 box required between * and
> >   sip phones? (If so, how many?)
> 
> Yes.  You'll need something like Linux Virtual Server or an F5 load
> balancing box to make this happen.  You can play silly games with round
> robin DNS, but it doesn't handle failure well.

Agreed, but then one would need two F5 boxes as "it" would become the new
single point of failure.
 
> >b. If based on MAC address, what process moves an active * MAC address
> >   to a another * machine (to maintain connectivity to sip phones)?
> 
> Something like Ultra Monkey (http://www.ultramonkey.org)
> 
> >c. Should sessions that rely on a failed machine in a cluster simply
> >   be dropped?
> >d. Are there any realistic ways to recover RTP sessions in a clustered
> >   environment when a single machine within the cluster fails, and RTP
> >   sessions were flowing through it (canreinvite=no)?
> >e. Should a sip phone's arp cache timeout be configurable?
> 
> Shouldn't need to worry about that unless the phone is on the same
> physical network segment.

Which in most cases where asterisk is deployed (obviously not all) is 
probably the case.
 
> >f. Which system(s) control the physical switch in #1 above?
> 
> A voting system...all systems control it.  It is up to the switch to
> decide who isn't working right.

With probably some manual override, since we know that systems can 
appear to be ready for production, but the sys admin says it's not ready
due to any number of valid technical reasons.
 
> >g. Is sharing static/dynamic operational data across some sort of
> >   high-availability hsrp channel acceptable, or, should two or more
> >   database servers be deployed?
> 
> DB Server clustering is a fairly solid technology these days.  Deploy a DB
> cluster if you want.

Which gets to be rather expensive, adds complexity and additional
points of failure (decreasing the ability to approach five/four-9's).
 
> > 4. If a firewall/nat box is involved, what are the requirements to detect
> >and handle a failed * machine?
> >a. Are the requirements different for hot-spare vs clustering?
> >b. What if the firewall is an inexpensive device (eg, Linksys) with
> >   minimal configuration options?
> >c. Are the nat requirements within * different for clustering?
> >
> > 5. Should sip phones be configurable with a primary and secondary proxy?
> >a. If the primary proxy fails, what determines when a sip phone fails
> >   over to the secondary proxy?

Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Steven Critchfield
On Sun, 2004-01-04 at 21:23, Rich Adamson wrote:
> Part of the point of many of the questions is that there really are a
> lot of dependencies on devices other then asterisk, and simply going down
> a path that says clustering (or whichever approach) can handle something
> is probably ignoring several of those dependencies which does not actually
> improve the end-to-end availability of asterisk. (Technically, asterisk
> is up, you just can't reach it because your phone (or whatever) doesn't
> know how to get to it.)
> 
> Using another load-balancing box (F5 or whatever) only moves the problem
> to that box. Duplicating it, moves the problem to another box, until
> the costs exponentially grow beyond the initial intended value of the
> solution. The weak points become lots of other boxes and infrastructure, 
> suggesting that asterisk really isn't "the" weakest point (regardless of 
> what its built on).

Rich is hitting the main point in designing anything for high
reliability. So let's enumerate failures and then see what, if anything, can be
done to eliminate them.

1. Line failures.
I'll lump them together as they can occur anywhere from the CO to your
premises. I've experienced them in just about every section in my short
time in this part of the industry. I have had lines broken inside the
CO. I have had water get to the lines along the street during
construction, and it could have just as easily been the construction
people cutting the line if they had been any more careless. There are also
inside-the-building problems, which luckily aren't as likely to crop up after
install. BTW, this is the same even if your incoming phones are VoIP
lines. 

2. Hardware failure. 
This can be drives, memory, cpu, NIC, or any other part that basically
renders the hardware unavailable or unstable.

3. Software failure.
This could be any number of bugs not yet found or that will be
introduced later.

4. Phones.
This can be split to a VoIP and an analog section as the problems and
solutions are different.
a. VoIP
b. analog

5. Power.
This also falls into two parts split on VoIP and analog as it doesn't
help to have power on the switch if all your phones go dark. Think about
in cases where there is a storm or other adverse conditions and you need
to call authorities.

So now you go to solutions. 
1. Your solution to this is based on budget, because the only solutions
cost a monthly fee. Also, for a truly good solution, the install fee will
go up too. Basically the solution here comes via redundancy. Not just in
multiples, but in getting the lines from different locations and making
sure they don't follow the same paths. Most locations are not wired from
different paths unless your location attracted a fiber loop. So if you
have to have it, it might cost quite a bit or not be available.

2. Raid and hot swap drives combined with hot swap redundant power
supplies. This is about the limit of what is currently available on a
budget in the x86 world. Also with Raid, make sure you have actual
redundancy. Raid doesn't always mean you are in a condition all the time
to recover from a failure. If it is really important, you will also have
hot spares in the machine. As you can see, this adds cost each time you
add a drive to make a system more resilient to failure.

During a recent presentation at our LUG, it was explained that even Raid
can fail. The presenter had several drives die all together due to an AC
failure. They had hot spares, but as drives failed, extra stress was
applied to weakened drives till they failed. Soon they exceeded their
fault tolerance and had to rely on what they could scrape together from
backups to recover.

So if possible, look into Raid equipment that has some form of interface
to see what is going on especially if you aren't in a monitored
environment. If your Raid is able to generate messages at the driver
layer and you can watch these messages, you can fix a problem before it
escalates. 
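For Linux software RAID specifically, one low-tech way to watch for trouble is to poll /proc/mdstat and alert on failed or missing members. A rough sketch; the degraded-array heuristic and the alert hook are deliberately simplistic:

# Report md arrays that show a failed "(F)" member or a "_" in the
# [UU] status field (a missing/failed slot).
import time

def degraded_md_arrays(path="/proc/mdstat"):
    problems, current = [], None
    with open(path) as f:
        for line in f:
            if line.startswith("md") and " : " in line:
                current = line.split(" : ", 1)[0].strip()
                if "(F)" in line:                 # e.g. "sdb1[1](F)"
                    problems.append(current)
            elif current and "[" in line and "]" in line:
                status = line[line.rfind("["):]   # e.g. "[U_]"
                if "_" in status:
                    problems.append(current)
    return sorted(set(problems))

if __name__ == "__main__":
    while True:
        for array in degraded_md_arrays():
            print(f"WARNING: {array} looks degraded")   # swap in a mail/pager hook here
        time.sleep(60)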

While multiple machines are another way of surviving a total system
failure, you are probably more likely to experience a line failure
than a hardware failure if you treat your hardware well. Some
forms of this solution also require software modification.  

3. This one is basically only combated by due diligence. Mark and the
other CVS committers do their best to review everything before it goes
in. Those who write patches try not to write buggy code. The
implementers should still spend some time testing all the components to
verify the functions work as needed.   

4. Phones luckily have few failures. And when they do fail, it doesn't
usually take down any other phones. Analog phones can be just swapped
out as there are few differences between them. Only ADSI would
complicate this, but not if you had spares of the same ADSI phones. VoIP
is pretty much the same.

5. Power is important as good clean power makes your hardware last
longer. Add to this that it is needed to survive any adverse weather
conditions. Analog phones makes yo

Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Nicolas Bougues
On Sun, Jan 04, 2004 at 07:38:16PM +, WipeOut wrote:
>
> Also a failover system would typically only be 2 servers, if there were 
> a cluster system there could be 10 servers in which case five 9's should 
> be easy..
> 

Err, no. five 9s is *never* easy.

Does your telco provide you with SLAs that make five 9s reasonable at
all?

Do you really need five 9s? There is no such thing I'm aware of in
enterprise grade telephony. You have to go to "carrier grade"
equipment, which asterisk, and PCs in general, are definitely not aimed
at.

-- 
Nicolas Bougues
Axialys Interactive


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread WipeOut
Nicolas Bougues wrote:

> On Sun, Jan 04, 2004 at 07:38:16PM +, WipeOut wrote:
>> Also a failover system would typically only be 2 servers, if there were 
>> a cluster system there could be 10 servers in which case five 9's should 
>> be easy..
>
> Err, no. five 9s is *never* easy.
>
> Does your telco provide you with SLAs that make five 9s reasonable at
> all ?
>
> Do you really need five 9s ? There is no such thing I'm aware of in
> enterprise grade telephony. You have to go to "carrier grade"
> equipment, which asterisk, and PCs in general, are definetly not aimed
> at.

Granted, five 9's is never easy, but in a cluster of 10+ servers the 
system should survive just about anything short of an act of God..

Maybe, as mentioned earlier, a more realistic goal for Asterisk is three 
or four 9's.. Three 9's could probably be achieved already on a single 
server with RAID and hot-swap power, so four 9's is probably a good 
target to go for..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Doug Shubert
>
> Does your telco provide you with SLAs that make five 9s reasonable at
> all ?
>

LOL... Our telco services could be down for several hours at a time.

We found that most US broadband carriers (DSL and cable) offer a
"best effort" zero-SLA service. If you are using broadband as a primary
transport, expect the failure points to be "up stream" more than "in house".

> Do you really need five 9s ? There is no such thing I'm aware of in
> enterprise grade telephony.

Cisco has a white paper "IP Telephony: The Five Nines Story"
http://www.cisco.com/warp/public/cc/so/neso/vvda/iptl/5nine_wp.htm

My take on the "nines" is that Telcordia SR-323 / Bellcore MIL-HDBK-217
attempted to predict the reliability of individual electronic components, and
marketing departments have used the predictions as sales tools to best an
opponent's product.


> You have to go to "carrier grade"
> equipment, which asterisk, and PCs in general, are definetly not aimed
> at.
>

Most carrier and even enterprise phone equipment uses a "blade" design.
PCs can be configured in a hot-swap blade design.

Doug







Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Rich Adamson
> > Using another load-balancing box (F5 or whatever) only moves the problem
> > to that box. Duplicating it, moves the problem to another box, until
> > the costs exponentially grow beyond the initial intended value of the
> > solution. The weak points become lots of other boxes and infrastructure, 
> > suggesting that asterisk really isn't "the" weakest point (regardless of 
> > what its built on).
> 
> Rich is hitting the main point in designing anything for high
> reliability. So lets enumerate failures and then what if anything can be
> done to eliminate them.
> 
> 1. Line failures.

> 2. Hardware failure. 

> 3. Software failure.
> This could be any number of bugs not yet found or that will be
> introduced later.

> 4. Phones.

The primary points the questions were attempting to uncover are more
related to basic layer-2 and layer-3 issues (of all necessary components
in an end-to-end telephony implementation), and not just basic hardware
configurations.

Having spent a fair number of years working with corporations that have
attempted to build high-availability solutions, the typical engineering
approach is almost always oriented towards throwing more hardware at the
problem and not thinking about the basic layer-2/3/4 issues. (I don't have
an answer that I'm sponsoring either, just looking for comments from
those that intimately know the "end-to-end" impact of doing things like
hot-sparing or clustering.) I'm sure its fairly clear to most that
adding redundant supplies, ups, raid, etc, will improve the uptime of the
* box. However, once past throwing hardware at "the" server, where are
the pitfalls associated with hot-sparing or clustering * servers?

Several well-known companies have attempted products that swap MAC
addresses between machines (layer-2), hide servers behind a virtual
IP (layer-3), hide a cluster behind some form of load balancing hardware
(generally layer-2 & 3), etc. Most of those solutions end up creating yet 
another problem that was not considered in the original thought process. 
I.e., not well thought out. (Even Cisco with a building full of engineers
didn't initially consider the impact of flip-flopping between boxes
when hsrp was first implemented. And there still are issues with that
approach that many companies have witnessed first hand.)

Load balancers have some added value, but those that have had to deal
with a problem where a single system within the cluster is up but not
processing data would probably argue their actual value.

So, if one were to attempt either hot-sparing or clustering, are there
issues associated with sip, rtp, iax, nat and/or other asterisk protocols 
that would impact the high-availability design?

One issue that would _seem_ to be a problem is those installations that 
have to use canreinvite=no (meaning, even in a clustered environment 
those rtp sessions are going to be dropped with a server failure. Maybe
it's okay to simply note the exceptions in a proposed high-availability
design.)

If any proposed design actually involved a different MAC address,
obviously all local sip phones would die since the arp cache timeout 
within the phones would preclude a failover. (Not cool.)

IBM (with their stack of AIX machines) and Tandem (with their non-stop
architecture) didn't throw clustered database servers at the problem.
Both had them, but not as a means of increasing the availability of the 
base systems.

Technology now supports 100 meg layer-2 pipes throughout a city at a
reasonable cost. If a cluster were split across multiple buildings within 
a city, it certainly would be of interest to those that are responsible 
for business continuity planning. Are there limitations?

Someone mentioned the only data needed to be shared between clustered
systems was phone Registration info (and then quickly jumped to engineering
a solution for that). Is that the only data needed or might someone
need a ton of other stuff? (Is cdr, iax, dialplans, agi, vm, and/or
other dynamic data an issue that needs to be considered in a reasonable
high-availability design?)

Whether the objective is 2, 3, 4, or 5 nines is somewhat irrelevant. If
one had to stand in front of the President or Board and represent/sell
availability, they are going to assume end-to-end and not just "the"
server. Later, they are not going to talk kindly about the phone
system when your single F5 box died; or (not all that unusual), you
say asterisk was up the entire time, it's your stupid phones that couldn't 
find it!! (Or, you lost five hours of cdr data because of why???)

I'd have to guess there are probably hundreds on this list that can 
engineer raid drives, UPSs for ethernet closet switches, protected
cat 5 cabling, and switch boxes that can move physical interfaces between
servers. But I'd also guess there are far fewer that can identify many 
of the sip, rtp, iax, nat, cdr, etc, etc, issues. What are some of those
issues? (Maybe there aren't any?)

Rich



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Martin Bene
Hi Richard,

>Load balancers have some added value, but those that have had to deal
>with a problem where a single system within the cluster is up but not
>processing data would probably argue their actual value.

I've done quite a lot of work with clustered/HA linux configurations. I
usually try to keep additional boxes/hardware to an absolute minimum,
otherwise the newly introduced points of (hardware) failure tend to make the
whole exercise pointless. A solution I found to work quite well:

Software load balancer (using LVS) run as an HA service (ldirectord) on two of
the servers. This allows use of quite specific probes for the real servers
being balanced, so a server not correctly processing requests can be removed
from the active list quite reliably. Since the director script is perl,
adding probes for protocols not supported in the default install is fairly
straightforward.

>If any proposed design actually involved a different MAC address,
>obviously all local sip phones would die since the arp cache timeout 
>within the phones would preclude a failover. (Not cool.)

Arp cache timeouts usually don't come into this: when moving a cluster IP
address to a different NIC (probably on a different machine) you can broadcast
gratuitous arp packets on the affected ethernet segment; this updates the arp
caches of all connected devices and allows failovers far faster than the arp
cache timeout. Notable exception: some firewalls can be quite paranoid with
respect to arp updates and will NOT accept gratuitous arp packets. I've run into this
with a cluster installation at one of my customers.
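For completeness, the gratuitous ARP Martin describes looks roughly like this when done by hand; it assumes the third-party scapy package, root privileges, and placeholder addresses (HA tooling such as heartbeat/ldirectord normally emits this for you):

# Broadcast a gratuitous ARP so neighbours update their caches after a
# cluster IP moves to a new NIC/machine. All values are placeholders.
from scapy.all import ARP, Ether, sendp   # third-party: scapy (needs root)

def send_gratuitous_arp(ip, mac, iface="eth0", count=3):
    frame = Ether(dst="ff:ff:ff:ff:ff:ff", src=mac) / ARP(
        op=2,                                 # ARP reply ("is-at")
        hwsrc=mac, psrc=ip,
        hwdst="ff:ff:ff:ff:ff:ff", pdst=ip,
    )
    sendp(frame, iface=iface, count=count, verbose=False)

# send_gratuitous_arp("192.0.2.50", "00:11:22:33:44:55")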

>Technology now supports 100 meg layer-2 pipes throughout a city at a
>reasonable cost. If a cluster were split across mutiple 
>buildings within a city, it certainly would be of interest to those 
>that are responsible for business continuity planning. Are there
limitations?

I'm wary of split cluster configurations because often the need for multiple,
independent communication paths between cluster nodes gets overlooked or
ignored in these configurations, greatly increasing the risk of "split-brain"
situations, i.e. several nodes in the cluster thinking they're the only
online server and trying to take over services. This easily/usually leads to
a real mess (data corruption) that can be costly to clean up. When keeping
your nodes in physical proximity it's much easier to have, say, 2 network
links + one serial link between cluster nodes, thus providing a very resilient
fabric for inter-cluster communications.

>Someone mentioned the only data needed to be shared between clustered
>systems was phone Registration info (and then quickly jumped 
>to engineering a solution for that). Is that the only data needed or 
>might someone need a ton of other stuff? (Is cdr, iax, dialplans, agi, 
>vm, and/or other dynamic data an issue that needs to be considered in 
>a reasonable high-availability design?)

Depends on what you want/need to fail over in case your asterisk box goes
down. In stages, that'd be:
1 (cluster) IP address for sip/h323 etc. services
2 voice mail, recordings, activity logs
3 registrations for connected VoIP clients
4 active calls (VoIP + PSTN)

For the moment, item 4 definitely isn't feasible; even if we get some
hardware to switch over E1/T1/PRI (whatever) interfaces, card or interface
initialisation will kill active calls. 

Item 2 would be plain on-disk file data; for an active/standby cluster,
replicating these should be pretty straightforward using either shared
storage or an appropriate filesystem/blockdevice replication system. I've
personally had good experience with drbd (block device replication over the
network; only supports 2 nodes in active/standby configuration but works
quite well for that.)

Item 3 should also be feasible; this information is already persistent over
asterisk restarts and seems to be just a Berkeley DB file for a default
install. Same method as for item 2 should work.
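A crude sketch of what replicating items 2 and 3 could look like without shared storage or drbd: periodically push the spool directory and the astdb file to the standby over rsync. The paths are the usual defaults but vary per install, anything written between runs is lost on failover, and copying the Berkeley DB file while asterisk has it open is itself a race; treat this as an illustration, not a recommendation.

# Mirror voicemail/recordings and the persistent astdb to a standby box.
import subprocess
import time

STANDBY = "standby.example.com"      # placeholder hostname
PATHS = [
    "/var/spool/asterisk/",          # voicemail, monitor recordings, etc.
    "/var/lib/asterisk/astdb",       # registrations and other persistent state
]

def sync_once():
    for path in PATHS:
        subprocess.run(["rsync", "-a", "--delete", path, f"{STANDBY}:{path}"],
                       check=True)

if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(60)               # coarse-grained; tune to taste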

>I'd have to guess there are probably hundreds on this list that can 
>engineer raid drives, ups's for ethernet closet switches, protected
>cat 5 cabling, and switch boxes that can move physical 
>interfaces between servers. But, I'd also guess there are far fewer 
>that can identify many of the sip, rtp, iax, nat, cdr, etc, etc, 
>issues. What are some of those issues? (Maybe there aren't any?)

Since I'm still very much an asterisk beginner I'll have to pass on this
one; however, I'm definitely going to do some experiments on my test cluster
systems with asterisk to just see what breaks when failing over asterisk
services.

Also, things get MUCH more interesting when you start to move from plain
active/standby to active/active configurations: here, on failover, you'll
end up with the registration and file data from the failed server and need to
integrate it into an already running server, merging the separate sets of
information - preferably without trashing the running server :-)
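
To make the merge problem concrete: if each node kept its registrations as
peer -> (contact, timestamp) records, the least destructive merge is "newest
registration wins". A toy sketch of that idea (not anything Asterisk does
today; names and addresses are made up):

def merge_registrations(running, failed):
    # Both arguments map a peer name to a (contact_uri, registered_at) tuple.
    # The most recent registration for each peer wins, so the running
    # server's own state is never clobbered by stale data from the dead box.
    merged = dict(running)
    for peer, (contact, ts) in failed.items():
        if peer not in merged or ts > merged[peer][1]:
            merged[peer] = (contact, ts)
    return merged

# Example: peer "2001" re-registered with the running node after the failed
# node last saw it, so the running node's entry is kept; "2002" is imported.
running = {"2001": ("sip:2001@10.0.0.21:5060", 1073700000)}
failed  = {"2001": ("sip:2001@10.0.0.21:5060", 1073600000),
           "2002": ("sip:2002@10.0.0.35:5060", 1073650000)}
print(merge_registrations(running, failed))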

Bye, Martin
___

Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-09 Thread Steve Underwood
WipeOut wrote:

> Granted five 9's is never easy but in a cluster of 10+ servers the
> system should survive just about anything short of an act of God..

You do realise that is a real dumb statement, don't you? :-)

A cluster of 10 machines, each on a different site. Guarantees from the
power company - checked personally to see that they aren't cheating - that
you have genuinely independent feeds to these sites. Large UPSs, with
diesel generator backups. Multiple diverse telecoms links between the
sites, personally checked multiple times to see there is genuine
diversity (it's a waste of time asking a telco for guarantees of this
kind, as they lie by habit). This *might* start to approach 5 9's. Just
having 10 servers means *very* little.

Regards,
Steve
___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
  http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-09 Thread Steven Critchfield
On Fri, 2004-01-09 at 21:36, Steve Underwood wrote:
> WipeOut wrote:
> 
> > Granted five 9's is never easy but in a cluster of 10+ servers the 
> > system should survive just about anything short of an act of God..
> 
> You do realise that is a real dumb statement, don't you? :-)
> 
> A cluster of 10 machines, each on a different site. Guarantees from the 
> power company - checked personally to see that aren't cheating - that 
> you have genuinely independant feeds to these sites. Large UPSs, with 
> diesel generator backups. Multiple diverse telecoms links between the 
> sites, personally checked multiple times to see there is genuine 
> diversity (Its a waste of time asking a telco for guarantees of this 
> kind, as they lie by habit). This *might* start to approach 5 9's. Just 
> having 10 servers means *very* little.

Maybe it's the fact that the main clusters I have knowledge of are in
university settings, meant to increase compute power, but "cluster" tends to
have the connotation of being in one location. In the case of a single
location, the extra machines actually mean higher odds of losing parts due to
mean time between failures. A friend of mine commented that maintenance on
one of the top-500 supercomputer clusters required keeping a box of spare
memory and drives on hand. It was mentioned that they lost a certain number
of memory modules a day. That freaked me out, as the only times I had
experienced memory failure were due to mishandling, not the normal course of
computer operation.

The setup you mention above isn't what I would normally associate with
clustering. It also is unlikely to make a difference for a single office
location trying to keep its system available.
-- 
Steven Critchfield <[EMAIL PROTECTED]>

___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-10 Thread Chris Albertson

--- Steve Underwood <[EMAIL PROTECTED]> wrote:
> WipeOut wrote:
> 
> > Granted five 9's is never easy but in a cluster of 10+ servers the 
> > system should survive just about anything short of an act of God..
> 
> You do realise that is a real dumb statement, don't you? :-)
> 
> A cluster of 10 machines, each on a different site. Guarantees from
> the 
> power company - checked personally to see that aren't cheating - that
> 
> you have genuinely independant feeds to these sites. Large UPSs, with
> 
> diesel generator backups. Multiple diverse telecoms links between the

If he says "cluster" he likely means 10 servers in one rack.  But still
you are right.  It is all the other stuff that could break.  You
will need paralleled Ethernet switches (yes, they make these; no, they
are NOT cheap) and you will need some kind of failover.  The switches
can do that for you (do a Google search on "layer 3 switch").

It's the layer three switches that make five 9's possible, but half or
more of your hardware will be just "hot spares", so it really will
take a rack full of boxes.

Each box should have mirrored drives and dual power supplies, and each
AC power cord needs to go to its own UPS.

Has anyone tried to build Asterisk on SPARC/Solaris?  One SPARC
server is almost five nines all by itself, as it can do things
like "boot around" failed CPUs, RAM or disks.  I've actually
pulled a disk drive out of a running Sun SPARC and applications
continued to run.



=
Chris Albertson
  Home:   310-376-1029  [EMAIL PROTECTED]
  Cell:   310-990-7550
  Office: 310-336-5189  [EMAIL PROTECTED]
  KG6OMK

___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-10 Thread Steve Underwood
Hi,

I don't want to drag this into a long thread, but note the original says
"the system should survive just about anything short of an act of God",
and suddenly you are talking about a reliable server and a few switches.
These are quite different things. I have yet to see a 5 x 9's server
room. Fire, mechanical damage and other factors will normally keep the
location itself well below 5 x 9's. Think "system" instead of "server
equipment", and the picture looks very different. Even for a single PC
type server, downtime due to telecoms lines, power problems, fire,
flood, typhoon damage, theft and a mass of other stuff might well exceed
the server unavailability itself. I've seen many servers not fail in 5
years. I have yet to see the best location go that long without causing
at least one substantial period of downtime. 5 x 9's allows about 6
minutes downtime a year. That means 100% of all failures must have
automated failover, as manual repair could never be achieved so fast.
Physical diversity is essential for that.
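
For anyone who wants to check the downtime budget, the arithmetic is
straightforward:

# Downtime budget per year for N nines of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines in (3, 4, 5):
    downtime = MINUTES_PER_YEAR * 10 ** -nines
    print("%d nines: about %.1f minutes of downtime per year" % (nines, downtime))

# 5 nines works out to roughly 5.3 minutes a year, which is why every
# failure has to be handled by automated failover rather than by a person.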

Regards,
Steve
Chris Albertson wrote:

> --- Steve Underwood <[EMAIL PROTECTED]> wrote:
>
>> WipeOut wrote:
>>
>>> Granted five 9's is never easy but in a cluster of 10+ servers the
>>> system should survive just about anything short of an act of God..
>>
>> You do realise that is a real dumb statement, don't you? :-)
>>
>> A cluster of 10 machines, each on a different site. Guarantees from the
>> power company - checked personally to see that aren't cheating - that
>> you have genuinely independant feeds to these sites. Large UPSs, with
>> diesel generator backups. Multiple diverse telecoms links between the
>
> If he says "cluster" he likely means 10 servers in one rack.  But still
> you are right.  It is all the other stuff that could break.  You
> will need paralleld Ethernet switches (Yes they make these, no, they
> are NOT cheap.) you will need some kind of fail over.  The switches
> can do that for you. (do a google on "level 3 switch")
>
> It's the level three switches that make .9 possible but half or
> more of your hardware will be just "hot spares" so it really will
> take a rack full of boxes
>
> Each box should have mirrored drives and dual power supplies and each
> AC power cord needs to go to it's own UPS
>
> Has anyone tried to build Asterisk on SPARC/Solaris?  One SPARC
> server is almost five nines all by itself as it can do thinks
> like "boot around" failed CPU, RAM or disks.  I've actually
> pulled a disk drive out of a running Sun SPARC and applications
> continoued to run.



___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
  http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-10 Thread Steve Totaro
Automated failover is a nice thought in this instance, but in the telco world
it may not be necessary.  Most industries will allow for weekend work as
well as planned downtime (yes, even in a three-shift manufacturing facility).
In my experience, fires and acts of God are few and far between, but someone
tripping over a power cord, shutting something down or pulling the wrong
patch cord is a regular occurrence.  Not sure if I am agreeing with Steve or
not; the more I read his post, the less sure I am of what he is saying.

- Original Message - 
From: "Steve Underwood" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 10, 2004 2:43 AM
Subject: Re: [Asterisk-Users] Hardware to build an Enterprise
AsteriskUniversal Gateway


> Hi,
>
> I don't want to drag this into a long thread, but note the original says
> "the system should survive just about anything short of an act of God",
> and suddenly you are talking about a reliable server and a few switches.
> These are quite different things. I have yet to see a 5 x 9's server
> room. Fire, mechanical damage and other factors will normally keep the
> location itself well below 5 x 9's. Think "system" instead of "server
> equipment", and the picture looks very different. Even for a single PC
> type server, downtime due to telecoms lines, power problems, fire,
> flood, typhoon damage, theft and a mass of other stuff mught well exceed
> the server unavailablility itself. I've seen many servers not fail in 5
> years. I have yet to see the best location go that long without causing
> at least one substantial period of downtime. 5 x 9's allows about 6
> minutes downtime a year. That means 100% of all failures must have
> automated failover, as manuals repair could never be achieved so fast.
> Physical diversity if essential for that.
>
> Regards,
> Steve
>
>
> Chris Albertson wrote:
>
> >--- Steve Underwood <[EMAIL PROTECTED]> wrote:
> >
> >
> >>WipeOut wrote:
> >>
> >>
> >>
> >>>Granted five 9's is never easy but in a cluster of 10+ servers the
> >>>system should survive just about anything short of an act of God..
> >>>
> >>>
> >>You do realise that is a real dumb statement, don't you? :-)
> >>
> >>A cluster of 10 machines, each on a different site. Guarantees from
> >>the
> >>power company - checked personally to see that aren't cheating - that
> >>
> >>you have genuinely independant feeds to these sites. Large UPSs, with
> >>
> >>diesel generator backups. Multiple diverse telecoms links between the
> >>
> >>
> >
> >If he says "cluster" he likely means 10 servers in one rack.  But still
> >you are right.  It is all the other stuff that could break.  You
> >will need paralleld Ethernet switches (Yes they make these, no, they
> >are NOT cheap.) you will need some kind of fail over.  The switches
> >can do that for you. (do a google on "level 3 switch")
> >
> >It's the level three switches that make .9 possible but half or
> >more of your hardware will be just "hot spares" so it really will
> >take a rack full of boxes
> >
> >Each box should have mirrored drives and dual power supplies and each
> >AC power cord needs to go to it's own UPS
> >
> >Has anyone tried to build Asterisk on SPARC/Solaris?  One SPARC
> >server is almost five nines all by itself as it can do thinks
> >like "boot around" failed CPU, RAM or disks.  I've actually
> >pulled a disk drive out of a running Sun SPARC and applications
> >continoued to run.
> >
> >
>
>
> ___
> Asterisk-Users mailing list
> [EMAIL PROTECTED]
> http://lists.digium.com/mailman/listinfo/asterisk-users
> To UNSUBSCRIBE or update options visit:
>http://lists.digium.com/mailman/listinfo/asterisk-users
>

___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Hardware to build an Enterprise AsteriskUniversal Gateway

2004-01-10 Thread Rich Adamson
> I don't want to drag this into a long thread, but note the original says 
> "the system should survive just about anything short of an act of God", 
> and suddenly you are talking about a reliable server and a few switches. 
> These are quite different things. I have yet to see a 5 x 9's server 
> room. Fire, mechanical damage and other factors will normally keep the 
> location itself well below 5 x 9's. Think "system" instead of "server 
> equipment", and the picture looks very different. Even for a single PC 
> type server, downtime due to telecoms lines, power problems, fire, 
> flood, typhoon damage, theft and a mass of other stuff mught well exceed 
> the server unavailablility itself. I've seen many servers not fail in 5 
> years. I have yet to see the best location go that long without causing 
> at least one substantial period of downtime. 5 x 9's allows about 6 
> minutes downtime a year. That means 100% of all failures must have 
> automated failover, as manuals repair could never be achieved so fast. 
> Physical diversity if essential for that.

The five-9's thread has been discussed under several different subjects
in the last few months, and it's not difficult to detect from the postings
that the subject attracts very different levels of technical understanding.
It's also obvious that many have not worked in a business or institution
where disaster recovery or business continuity plans mean something much
different than redundant power supplies, RAID, a motherboard on the shelf,
a Sun multiprocessor system, a database server, redundant layer-2 switches,
or lots of toys in one's basement.

Whether one refers to application/system availability as five-9's, maximum
uptime, or some other set of words is mostly irrelevant; the objective is
still to provide the highest level of functionality possible given a set
of business parameters that might include cost, time to repair, commercial
power stability, regional susceptibility to tornados or floods, etc.
Low-end ISPs tend to believe a UPS addresses their needs, small companies
tend towards hot/cold spares, while larger organizations gravitate towards
other approaches that minimize the need for human involvement to recover
from any form of failure.

Gus may have a strong conviction that clustering addresses his needs (given
his set of business drivers), while Joe's needs are to recover from "any"
event (including loss of one or more buildings) within X hours, which may be
driven by outside requirements such as government regulations, etc. It is a
given that the recovery plan and the supporting investments will be
dramatically different for many business cases. Neither one should be ragged
on, since none of us on the list are exposed to their business drivers.
Regardless of how one chooses to address application availability (for the
purposes of this list anyway), sharing configuration and operational data
between multiple Asterisk boxes on a more real-time basis is and will be
important to those involved with systems in the small business category and
above.

Therefore, the list would benefit from discussions and implementations that
help support the task of dynamically sharing Asterisk data across multiple
systems to improve uptime (whatever that happens to mean to each reader).
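
As one small, purely illustrative example of what "sharing operational data"
could mean today without touching Asterisk itself: a helper run from cron
could dump the SIP peer table from the remote console and ship it to a second
box for warm-standby provisioning. The hostname and file paths below are
placeholders; the scheme is a sketch, not an existing feature:

import subprocess

STANDBY = "asterisk-b.example.com"   # placeholder: box receiving the snapshot
SNAPSHOT = "/tmp/sip-peers.snapshot"

def dump_sip_peers():
    # Ask the local Asterisk for its SIP peer table via the remote console.
    out = subprocess.check_output(["asterisk", "-rx", "sip show peers"])
    return out.decode("utf-8", "replace")

def ship(text):
    with open(SNAPSHOT, "w") as f:
        f.write(text)
    # scp keeps the example free of extra dependencies.
    subprocess.check_call(["scp", "-q", SNAPSHOT,
                           "%s:/var/tmp/sip-peers.from-primary" % STANDBY])

if __name__ == "__main__":
    ship(dump_sip_peers())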

Excluding the low-end ISP approach, and from a 5,000-foot level, it would
appear that an underlying/common design data-point might be "what are the
Asterisk design changes that need to occur to support two (or more)
Asterisk systems in separate physical locations?"  (Note that if someone's
business drivers suggest the systems remain within the same building/room,
that's fine. If they are separated by 10 feet or 100 miles, that's fine.
If someone wants to include UPSs, power supplies, RAID, dual-this-or-that,
layer two/three boxes, load balancers, Sun systems, database servers, etc,
that's fine. If an external T1 switch is required, that's fine. If clustering
adds some value for someone's deployment, that's fine. If a hot spare
meets the business needs, that's fine. And if lots of people have issues with
a particular SIP phone vendor's method of failover, I'll bet some vendors
would be more than willing to improve code "if" they understood it gives
them a competitive advantage over another vendor, etc.)

Thoughts?

Rich


___
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users