Wired for sound: how SIP won the VoIP protocol wars

Ars takes an in-depth look at VoIP, and specifically at the past, 
present, and future of the Session Initiation Protocol (SIP) that forms 
the foundation of many VoIP deployments.

By Gilad Shaham
Ars Technica

Last updated December 8, 2009 9:40 AM

http://arstechnica.com/business/news/2009/12/wired-for-sound-how-sip-won-the-voip-protocol-wars.ars


As an industry grows, it is quite common to find multiple solutions that 
all attempt to address similar requirements. This evolution dictates 
that these proposed standards go through a stage of selection—over time, 
we see some become more dominant than others. Today, the Session 
Initiation Protocol (SIP) is clearly one of the dominant VoIP protocols, 
but that obviously didn't happen overnight. In this article, the first 
of a series of in-depth articles exploring SIP and VoIP, we'll look at 
the main factors that led to this outcome.


A brief history of VoIP

Let's go back to 1995 in the days prior to Google, IM, and even 
broadband. Cell phones were large and bulky, Microsoft had developed a 
new Windows interface with a "Start" button, and Netscape had the most 
popular Web browser. The growth of the Internet and data networks 
prompted many to realize that it's possible to use the new networks to 
serve our voice communication needs while substantially lowering the 
associated cost. The first commercial solution of Internet VoIP came 
from a company called VocalTec; their software allowed two people to 
talk with each other over the Internet. One would make a local call to 
an ISP via a 28.8K or 33.6K modem and be able to talk with friends even 
if they lived far away. I remember trying out this software, and the 
sound was definitely below acceptable quality. (It frequently sounded 
like you were attempting to speak while submerged in a swimming pool.) 
However, the software successfully connected two people and introduced 
real-time voice conversation for a bandwidth-constrained network.

It was immediately apparent to the first VoIP implementors that there 
are several differences between the telephone network and the data 
network. One of them is the message exchange design. The phone system 
is circuit-switched, where a circuit is the complete path between two 
endpoints. Thus, it is possible to guarantee a single path for all 
messages in a single communication. The data network works with packets: 
various hops along the way help to route the packets to their final 
destination, and this path may change from one packet to the next. 
Because of this structure, the data network cannot guarantee that the 
packets of a single session will traverse the same path. 
VoIP therefore required some new innovations before it could really get 
off the ground.

To start a call, you need a VoIP signaling protocol. The term 
"signaling" comes from the circuit-switched telephone world. In that 
system, signals are sent from one end to the other in order to set up 
communication and allow us to talk over vast distances. The role of a 
signaling protocol is to define the way these messages are structured 
and the rules that let us start, configure, and end a conversation. It 
is worth pointing out that signaling messages do not include the voice 
one hears (the media of the call). The signaling protocol may carry 
information about the media streams and their attributes, but the speech 
itself in a voice call is not a signaling message. If you're looking for 
a very high-level explanation, just think of signaling as the messages a 
device sends when you dial or hang up the phone.

So the race was on to create a new signaling protocol. Some of these 
protocol specifications were open for everyone to implement, and others 
were vendor-proprietary solutions. And that race still isn't quite over, 
as we're constantly seeing new proposals that attempt to convince 
everyone that there's a better way to do things. A VoIP signaling 
protocol must show how it integrates with the data network; this 
includes aspects such as defining a method of locating the communication 
devices, specifying server behavior, introducing new services, and 
security design.


SIP protocol design

SIP is an Internet Engineering Task Force (IETF) protocol and, as such, 
was designed to be an open Internet protocol. Its first release, defined 
by RFC 2543, came in 1999, but its early drafts date back to 1996. RFC 
3261 revised some of its definitions in 2002.

Let's look at a simple SIP request:

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:[email protected]>
From: Gilad <sip:[email protected]>;tag=n23ycs
Call-ID: [email protected]
CSeq: 59164 INVITE
Contact: sip:[email protected]
Max-Forwards: 70

SIP is text-based. Notice the addresses are very similar to email 
addresses. Although SIP can support telephone numbers, the basic idea is 
that the addresses do not have to be phone numbers, just as you would 
not expect your email address to look like your home or work address. 
Compare this with the following (partial) HTTP request:

GET /reviews/ HTTP/1.1
Host: arstechnica.com
User-Agent: Gecko/Firefox/3.5.5

Thus, SIP is quite similar to HTTP. The first line is the request line, 
which contains information regarding the type of request (GET in HTTP 
and INVITE in SIP for these examples) and the intended address, while 
subsequent lines are headers with additional information. Naturally, 
responses in SIP also look very similar to HTTP responses. The idea is 
to use the structure of one of the most popular Internet protocols and 
make it easier for software developers and network managers to work with 
SIP.
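
To make the shared structure concrete, here is a minimal sketch of a 
parser that handles the start line and headers of either protocol. It is 
an illustration only: the function name parse_request and the addresses 
in the sample messages are hypothetical, and a real SIP stack must also 
handle repeated headers (such as several Via lines), compact header 
names, and line folding, none of which is attempted here.

```python
def parse_request(raw: str):
    """Split a text request into (method, target, version, headers).

    Simplification: a repeated header name overwrites the earlier value.
    """
    # The header section ends at the first blank line; a body may follow.
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = [line for line in head.split("\n") if line.strip()]

    # First line: request type, intended address, protocol version.
    method, target, version = lines[0].split(" ", 2)

    # Remaining lines: "Name: value" headers.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Hypothetical messages; the SIP address is a placeholder.
sip_request = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP host.example.org;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
http_request = (
    "GET /reviews/ HTTP/1.1\r\n"
    "Host: arstechnica.com\r\n"
    "\r\n"
)

for raw in (sip_request, http_request):
    method, target, version, headers = parse_request(raw)
    print(method, target, version, sorted(headers))
```

The same function accepts both messages without any protocol-specific 
branch, which is the point of borrowing HTTP's layout.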

These attempts to make SIP as easy as HTTP worked out to some extent. 
Even without prior HTTP knowledge, learning this message structure is 
an easy task. But the requirements that SIP addresses are more complex 
than those of HTTP, so the protocol is more complex. For example, it is 
a basic requirement in SIP to be able to have two-way symmetric 
communication, whereas a typical HTTP scenario would be a client making 
requests to a server and the server sending a response.

For those who are wondering, the SIP example above is the first packet 
one might send when calling from a SIP phone to Ars Technica's Deputy 
Editor, Jon Stokes. I will refrain from going into the technical details 
of the message contents at this time, as this is a subject for a 
separate article.


Reuse, and keeping it simple

Another important factor in SIP's design was the decision to reuse other 
existing Internet standards as much as possible. Address location uses 
DNS, user authentication uses HTTP digest authentication, setting the 
call media streams uses the Session Description Protocol (SDP), 
encryption uses TLS and, when applicable, users send each other XML 
information. This integration further helped establish SIP as part of 
the Internet protocol world, and vendors could reuse existing 
implementations in their SIP applications. On the other hand, in some 
cases the IETF had to make additional definitions in other protocols in 
order to serve SIP needs.
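
As an illustration of that reuse, the sketch below computes a digest 
authentication response the way HTTP digest authentication does in its 
simplest form (the legacy case without the optional "qop" parameter). 
The user name, realm, password, and nonce are hypothetical placeholders, 
and the helper names are my own. The only SIP-specific part is that the 
method and URI come from a SIP request rather than an HTTP one.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the digest 'response' value for the no-qop case."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                  # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# All values below are hypothetical placeholders.
response = digest_response(
    username="alice",
    realm="example.com",
    password="secret",
    method="REGISTER",
    uri="sip:example.com",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
)
print(response)
```

The server performs the same calculation with its stored credentials and 
compares the results, so the password itself never crosses the network.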

Keeping the complexity of the servers, especially the proxies, along the 
call path as minimal as possible is also an emphasis in SIP's design. 
SIP Proxies route the messages between the calling parties. The proxies 
defined in the standard are not aware of the call state, but rather 
operate on the transaction level and may also be stateless. This helps 
with scalability, because fewer devices can serve more calls. To do 
that, the protocol itself was separated into several distinct layers, a 
common practice programmers use to break down a complex system. This 
design helps to further simplify SIP and make it easier to implement. At 
times, keeping this minimal state forced some limitations (and later, 
some changes in the protocol), but these byproducts were kept to a minimum.
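
A rough sketch of what "stateless" means in practice: the function below 
forwards a request using only the information in the message itself, 
loosely following the forwarding rules in RFC 3261 (decrement 
Max-Forwards, add the proxy's own Via header on top). The function name, 
the header-list representation, and the host names are assumptions made 
for this example, and many required steps (routing decisions, loop 
detection, response handling) are left out.

```python
import hashlib

MAGIC_COOKIE = "z9hG4bK"  # branch prefix defined by RFC 3261


def forward_stateless(headers, proxy_host):
    """Return the header list a stateless proxy might forward.

    `headers` is a list of (name, value) tuples in message order.
    Returns None when the hop limit is exhausted, in which case a real
    proxy would answer 483 Too Many Hops instead of forwarding.
    """
    out = []
    top_via = None
    for name, value in headers:
        if name.lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                return None
            value = str(hops - 1)
        elif name.lower() == "via" and top_via is None:
            top_via = value
        out.append((name, value))

    # No state is stored between messages, so the branch is derived from
    # the request itself; a retransmission then gets the same branch.
    seed = (top_via or "").encode("utf-8")
    branch = MAGIC_COOKIE + hashlib.sha1(seed).hexdigest()[:16]
    own_via = ("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")
    return [own_via] + out


# Hypothetical incoming request headers.
incoming = [
    ("Via", "SIP/2.0/UDP host.example.org;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
    ("CSeq", "1 INVITE"),
]
for name, value in forward_stateless(incoming, "proxy.example.net"):
    print(f"{name}: {value}")
```

Because nothing is remembered between calls, any number of identical 
proxy instances could handle the same traffic, which is where the 
scalability benefit comes from.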

Finally, and perhaps most importantly, SIP was not built solely as a 
replacement for the telephone system. It allows extensions, and it 
relies on them to provide additional services beyond just simple calls. 
For example, you can use SIP to maintain user status information in an 
IM client as well as to set up IM sessions. Another extension enables 
transferring a call to a third party, something that was simply not 
defined by the basic SIP specification. This is possible thanks to the 
fact that SIP provides the necessary basic constructs while limiting 
those constructs only when necessary. SIP defines the concept of a 
"dialog," which is a two-way communication, but does not limit dialogs to 
calls. Two-way communication also includes setting your IM status and 
receiving your IM friends’ updates. Extensions can also easily define 
new request or response types and new headers when needed.

The outcome of these design decisions is the ability to answer the 
changing communication world's needs with an existing protocol and thus 
provide a faster offering of these services. The cost is that vendors 
who want to support multiple services and enable new ones need to 
closely follow the growing number of SIP-related RFC specifications 
(and sometimes the earlier drafts). This affected SIP interoperability: 
with so many extensions in existence, it may be more difficult to deploy 
a SIP network with multiple vendor devices. SIP attempts to mitigate 
this problem by defining keywords in an extension. Thus, you can 
indicate the supported and required extensions by adding the 
corresponding keywords. Market forces also drive vendors to implement 
the most commonly required extensions.
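
The keyword mechanism can be sketched as follows. In RFC 3261 the 
keywords are called option tags, a request lists mandatory ones in a 
Require header, and a receiver that does not understand one of them 
answers 420 (Bad Extension) with an Unsupported header. The function 
name and the dictionary-based header representation are assumptions for 
this example. "100rel" and "timer" are real option tags, while "foo" is 
a made-up one.

```python
def check_required_extensions(headers, supported):
    """Decide whether a request's required extensions can be honored.

    `headers` maps header names to values; `supported` is the set of
    option tags this device implements. Returns (status, extra_headers).
    The 200 status here only means "no objection"; a real device would
    go on to process the request.
    """
    require = headers.get("Require", "")
    required = [tag.strip() for tag in require.split(",") if tag.strip()]
    unsupported = [tag for tag in required if tag not in supported]
    if unsupported:
        # Tell the sender exactly which extensions were not understood.
        return 420, {"Unsupported": ", ".join(unsupported)}
    return 200, {"Supported": ", ".join(sorted(supported))}


my_extensions = {"100rel", "timer"}

print(check_required_extensions({"Require": "100rel"}, my_extensions))
print(check_required_extensions({"Require": "100rel, foo"}, my_extensions))
```

The sender learns precisely which extension caused the failure and can 
retry without it, rather than guessing why the call did not go through.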


A different approach

H.323 is an ITU protocol—or actually a suite of ITU protocols. It is 
common to find comparisons between SIP and H.323. Some of these 
comparisons aim to show the benefits of one protocol rather than the 
other, but our purpose here is to present a different design approach 
and outcome. We will take a quick look at H.323, but this is in no way a 
complete review.

H.323 is also an open protocol, and its first release, in 1996, predates 
SIP's. To the naked eye, H.323 looks like a binary protocol. It 
does have some binary elements, but for the most part, it's an encoding 
of ASN.1. ASN.1 encodes a structure or an object, so that one could 
easily take ASN.1 and retrieve a tree-like structure with all the data 
elements. H.323 uses PER encoding to reduce the packet length, which 
results in very efficient and small packets. Information sent in 
PER-encoded ASN.1 therefore requires less bandwidth than text, but many 
people find it easier simply to read a text rather than traverse a tree 
structure, especially when there are many elements. Furthermore, PER 
encoding/decoding is not that simple and often takes more programming 
effort than a text parser.
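
The size trade-off can be illustrated with a toy comparison. To be 
clear, the sketch below is not PER and does not produce H.323 messages; 
it simply packs three invented fields into fixed-width binary with 
Python's struct module and compares the result with a text rendering of 
the same values. The field names and values are hypothetical.

```python
import struct

# Hypothetical call-setup fields, invented for this illustration.
call_reference = 4711   # fits in 16 bits
media_port = 49170      # fits in 16 bits
codec_id = 0            # fits in 8 bits

# Fixed-width binary: two unsigned 16-bit values and one unsigned byte,
# in network (big-endian) byte order.
binary = struct.pack("!HHB", call_reference, media_port, codec_id)

# A text rendering of the same fields, one "Name: value" line each.
text = (
    f"Call-Reference: {call_reference}\r\n"
    f"Media-Port: {media_port}\r\n"
    f"Codec: {codec_id}\r\n"
).encode("ascii")

print(len(binary), "bytes as packed binary")
print(len(text), "bytes as text")

# Reading the binary form back requires knowing the exact layout up
# front; the text form can be inspected with nothing but an editor.
decoded = struct.unpack("!HHB", binary)
print(decoded)
```

The binary form is far smaller, but it is opaque without the schema, 
which mirrors the readability complaint about PER-encoded ASN.1.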

H.323 defines the details of its protocol more precisely than SIP. 
Vendors implementing H.323 can expect quicker interoperability with 
other vendors offering the same standard. When the ITU releases new 
versions of the protocol, they are always fully compatible with previous 
ones, so you don't need to worry whether or not the previous 
implementation will become deprecated. On the flip side, all 
implementations of the new versions must support scenarios defined in 
older versions, even if it has been well-established that those 
procedures have significant disadvantages. To date, seven versions of 
H.323 have been defined, and some procedures changed along the way, such 
as the flows used to start a new call. So everyone must support previous 
methods even when they are less efficient and more difficult to code. As 
for SIP, only two versions have been published. In fact, both are marked 
as SIP/2.0, so they are not really considered different versions. In any 
case, in places where the definition changed, backwards compatibility 
was maintained. Today, some issues that are considered bugs in the 
protocol are being repaired by short-length RFCs and in some cases SIP 
RFCs deprecate previous requirements.

In some cases, H.323 had features earlier than SIP, such as resource 
reservation or the digit notification event that's generated when 
someone presses the keys on the telephone during a call. This forced SIP 
vendors to create their own solutions for missing extensions, or to use 
an early draft of an RFC. H.323 does support extensions by allowing 
vendor-specific information in some of its fields. So it's possible to 
extend H.323, but it's much harder. H.323 did formally introduce 
extension support in its 4th version, but not everyone was ready to move 
to this version quickly and SIP had this capability from the beginning. 
Most of H.323's extensions focused on call supplementary services, while 
SIP extensions offered much more than extensions to calls. H.323 reuses 
many ITU protocols, many of which came from ISDN, but it does not 
present the same layer separation and modularity as seen in SIP.

Both designs have merits as well as disadvantages. H.323 was released 
first, something that significantly contributed to its adoption 
at the time. Today, however, new deployments usually go with SIP rather 
than H.323. So what made SIP more acceptable over time?


The keys to SIP's success

SIP's ability to add new services and extensions proved to be the 
primary factor of its success. When the focus turned to multimedia 
sessions with abilities other than mere calls, SIP's adaptability proved 
to be crucial. Simply put, it is very natural in SIP to work with 
extensions. SIP is also relatively programmer-friendly. It does not take 
that long to learn, and the time needed to create a basic application is 
relatively short. The IETF usually makes an effort to make sure its 
specifications are readable and that implementers can understand its 
contents. Open-source projects can therefore create SIP-based 
applications much faster, and commercial vendors usually find the 
ability to release a product with fewer resources very appealing. This 
means more products for customers to choose from.

Finally, as time progressed, interoperability became less of an issue 
because vendors were able to test their SIP implementations. By the time 
most network operators had to choose a protocol, SIP interoperability 
levels were already very high. SIP does not just compete with H.323, but 
also with several proprietary protocols. Network operators deploying 
VoIP truly like the concept of having the ability to select from 
different products. Interoperable solutions also imply the ability to 
deploy with multiple vendors on the same network without any specific 
implementation dependency. This last factor gave a strong push in favor 
of an open protocol rather than one owned by a company.

One major example of SIP's success is the IP Multimedia Subsystem (IMS). 
IMS offers Internet services to cellular networks as well as merging the 
fixed and mobile worlds. SIP is a crucial building block in IMS' ability 
to do just that. One of the goals in the IMS design was the ability to 
introduce multiple services, and that requirement is a great fit for 
SIP's design.

As with any protocol, SIP is not perfect. Over time, it has required 
some alterations to cope with new real-world network scenarios. Thus 
far, it has proven flexible and adaptive enough to handle these changes 
without breaking the protocol. That's a very good sign for the future of 
SIP.

Nonetheless, some do feel that using a newer protocol would be better. 
In fact, the ITU has started working on H.325. (One would expect its 
name to be H.324, but H.324 already exists and defines transmission of 
voice, video, and data on analog lines; it was later adapted as H.324M 
to enable video over cellular.) H.325 is also often referred to as Advanced 
Multimedia System (AMS), and it strives to offer a new protocol designed 
to address the latest requirements from day one. However, H.325 is in 
the very early stages of development, and it has yet to prove itself 
among a variety of competing options. It's safe to say that SIP will 
continue to dominate for now and into the foreseeable future. 
In fact, wireless operators are starting to shift to a next-generation, 
all-IP network known as Long Term Evolution, or LTE, and SIP already 
plays an important role in this upcoming network architecture.
