--On Tuesday, April 17, 2007 10:19 PM -0700 Qin Zhang
<[EMAIL PROTECTED]> wrote:
> Probably my question is too late since you have already decide use
> REST, but I want to know the rationale behind it.
>
> Since you are still returning data in xml format, what makes you
> decide not to publish a collection of WSDL and go along with more
> industrial standard web service calls?

Excellent question! No, it's not too late at all. This is exactly the
right time to be discussing this kind of thing.

It turns out that when I started the Version 8 design process, I was
still thinking in terms of a monolithic server and was heading down
the SOAP/WSDL route. I was, for example, investigating Glassfish as an
alternative to Tomcat due to its purportedly better support for web
services.

Then the Version 8 design process took an unexpected turn, and the
monolithic server fragmented into a set of communicating services:
SensorBase services for raw sensor data, Analysis services that would
request data from SensorBases and provide higher-level abstractions,
and UI services that would request data from SensorBases and Analyses
and display it with a user interface.

What worried me about this design initially was that every Analysis
service would have to be able to both produce and consume data (kind
of like being a web server and a web browser at the same time), and
that Glassfish might be overkill for this situation. So, I started
looking for a lightweight Java-based framework for producing and
consuming web services, and came upon the Restlet Framework
(http://www.restlet.org/), which then got me thinking more deeply
about REST.

It's hard to quickly sum up the differences between REST and WSDL, but
here are a few thoughts to get you started. WSDL-based web services
are essentially built on the remote procedure call (RPC) architectural
style, with HTTP used as a "tunnel". As a result, you generally have a
single "endpoint", or URL, such as <host>/soap/servlet/messagerouter,
that is used for all communication. Every single communication with
the service, whether it is to "get" data from the service, "put" data
to the service, or modify existing data, is implemented (from an HTTP
perspective) in exactly the same way: an HTTP POST to a single URL.
From the perspective of HTTP, the "meaning" of the request is
completely opaque.

In REST, in contrast, you design your system so that your URLs
actually "mean" something: they name a "resource". Furthermore, the
HTTP method also "means" something: GET means "get" a representation
of the resource named by the URL, POST means "create" a new resource
that will have a unique URL as its name, DELETE means "delete" the
resource named by the URL, and so forth.

For example, in Hackystat Version 7, to send sensor data to the
server, we use Axis, SOAP, and WSDL to send an HTTP POST to
http://hackystat.ics.hawaii.edu/hackystat/soap/rpcrouter, and the
content of the message indicates that we want to create some sensor
data. All sensor data, of all types, for all users, is sent to the
same URL in the same way. If we wanted to enable programmatic access
to sensor data in Version 7, we would tell clients to continue to use
HTTP POST to http://hackystat.ics.hawaii.edu/hackystat/soap/rpcrouter,
but tell them that the content of the POST could now invoke a method
in the server to obtain data.

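To make the single-endpoint pattern concrete, here is a minimal Java sketch (Java 11+, using the JDK's java.net.http classes rather than Axis). It only builds requests and never sends them. The operation names storeSensorData and getSensorData and the envelope contents are hypothetical placeholders, not the actual Version 7 WSDL.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of RPC-over-HTTP "tunneling" (Java 11+, java.net.http).
// Operation names and envelope contents are hypothetical placeholders,
// not the real Hackystat Version 7 WSDL.
public class RpcTunnelSketch {
    // One endpoint for every operation, as in Version 7.
    static final URI ENDPOINT =
            URI.create("http://hackystat.ics.hawaii.edu/hackystat/soap/rpcrouter");

    // Wraps a (hypothetical) operation name and argument in a minimal SOAP 1.1 envelope.
    static String envelope(String operation, String argument) {
        return "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soap:Body><" + operation + ">" + argument + "</" + operation + ">"
                + "</soap:Body></soap:Envelope>";
    }

    // Storing and fetching both become a POST to the same URL; only the XML
    // body says which method the server should invoke. (A real SOAP 1.1 client
    // would also send a SOAPAction header; it is omitted in this sketch.)
    static HttpRequest call(String operation, String argument) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(envelope(operation, argument)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest store = call("storeSensorData", "<SensorData>...</SensorData>");
        HttpRequest fetch = call("getSensorData", "<user>x3fhU784vcEW</user>");
        // Prints the same "POST <rpcrouter URL>" line twice.
        System.out.println(store.method() + " " + store.uri());
        System.out.println(fetch.method() + " " + fetch.uri());
    }
}
```

Both requests look identical at the HTTP level; anything sitting between client and server (a proxy, a cache, a log analyzer) would have to parse the XML body to learn what each one does.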
A RESTful interface does it differently: to request data, you use GET
with a URL that identifies the data you want. To store data, you use
POST with a URL that identifies the resource you are creating on the
server. For example:

    GET http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

might return the Commit sensor data with timestamp 1176759070170 for
user x3fhU784vcEW.

Similarly,

    POST http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

would carry a payload with the actual Commit data that should be
created on the server. And

    DELETE http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170

would delete that resource. (There are authentication issues, of
course.)

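The three example requests could be built like this in Java 11+ with java.net.http. This is a sketch: the sensordata/{user}/{type}/{timestamp} layout is taken from the example URLs, the XML payload is a placeholder, and authentication is left out, as in the text.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of the resource-oriented requests from the examples (Java 11+).
// URL layout follows the example URLs; the XML payload is a placeholder,
// and authentication is deliberately omitted.
public class RestRequestsSketch {
    static final String BASE = "http://hackystat.ics.hawaii.edu/hackystat/sensordata/";

    // The URL alone names the resource: one user's datum of one type at one timestamp.
    static URI resource(String user, String type, long timestamp) {
        return URI.create(BASE + user + "/" + type + "/" + timestamp);
    }

    static HttpRequest read(URI uri) {
        return HttpRequest.newBuilder(uri).GET().build();
    }

    static HttpRequest create(URI uri, String xml) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
    }

    static HttpRequest delete(URI uri) {
        return HttpRequest.newBuilder(uri).DELETE().build();
    }

    public static void main(String[] args) {
        URI commit = resource("x3fhU784vcEW", "Commit", 1176759070170L);
        // Same URL each time; the HTTP method says what to do with the resource.
        System.out.println(read(commit).method() + " " + commit);
        System.out.println(create(commit, "<SensorData>...</SensorData>").method() + " " + commit);
        System.out.println(delete(commit).method() + " " + commit);
    }
}
```

Only the method changes between the three requests; the URL, which names the resource, stays the same.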
In fact, REST asserts a direct correspondence between the CRUD
(create/read/update/delete) database operations and the POST, GET,
PUT, and DELETE methods, respectively, for resources named by URLs.

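One way to see this correspondence is as a dispatcher on the server side. The Java sketch below keeps resources in an in-memory map keyed by URL path; the class, the map, and the status code chosen for each case are illustrative assumptions, not Hackystat or Restlet code. It also simplifies by using PUT only to update an existing resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a SensorBase, illustrating CRUD <-> HTTP.
// Status codes and the "PUT only updates" rule are simplifying assumptions.
public class CrudDispatchSketch {
    public enum Crud { CREATE, READ, UPDATE, DELETE }

    // POST/GET/PUT/DELETE map onto create/read/update/delete.
    public static Crud toCrud(String httpMethod) {
        switch (httpMethod) {
            case "POST": return Crud.CREATE;
            case "GET": return Crud.READ;
            case "PUT": return Crud.UPDATE;
            case "DELETE": return Crud.DELETE;
            default: throw new IllegalArgumentException("Unsupported method: " + httpMethod);
        }
    }

    // Resource URL path -> stored XML representation.
    private final Map<String, String> resources = new HashMap<>();

    // Returns an HTTP-style status code for the request.
    public int handle(String httpMethod, String path, String body) {
        switch (toCrud(httpMethod)) {
            case CREATE:
                if (resources.containsKey(path)) return 409; // Conflict: already exists
                resources.put(path, body);
                return 201; // Created
            case READ:
                return resources.containsKey(path) ? 200 : 404;
            case UPDATE:
                if (!resources.containsKey(path)) return 404;
                resources.put(path, body);
                return 200;
            case DELETE:
                return resources.remove(path) != null ? 204 : 404; // No Content on success
            default:
                throw new AssertionError("unreachable");
        }
    }

    public String get(String path) {
        return resources.get(path);
    }

    public static void main(String[] args) {
        CrudDispatchSketch store = new CrudDispatchSketch();
        String path = "/sensordata/x3fhU784vcEW/Commit/1176759070170";
        System.out.println(store.handle("POST", path, "<SensorData/>"));   // 201
        System.out.println(store.handle("GET", path, null));               // 200
        System.out.println(store.handle("DELETE", path, null));            // 204
        System.out.println(store.handle("GET", path, null));               // 404
    }
}
```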
Now, why do we care? What's so good about REST anyway? In the case of
Hackystat, I think there are two really significant advantages of a
RESTfully designed system over an RPC/SOAP/WSDL-designed system:

(1) Caching can be done by the Internet's existing infrastructure. If
you obey a few more principles when designing your system, then you
can rely on standard HTTP caching (in browsers, proxies, and so on)
rather than building your own caching system. It's exactly the same
way that your browser avoids going back to Amazon to get the logo
files and so forth when you move between pages. In the case of
Hackystat, when someone invokes a GET on the SensorBase with a
specific URL, the results can be transparently cached to speed up
future GETs of the same URL, since that URL represents the same
resource. (There are cache expiration issues, which I'm pretty sure
we can deal with.)

In Hackystat Version 7, there is a huge amount of code devoted to
caching, and this code is also a major source of bugs and concurrency
issues. With a REST architecture, it is possible that most, perhaps
all, of this code can be completely eliminated without a performance
hit. Indeed, performance might actually be significantly better in
Version 8.

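As a rough illustration of what "obeying a few more principles" can look like, the sketch below has a SensorBase-style GET handler attach an ETag validator and a Cache-Control lifetime to each representation, and answer a conditional GET (If-None-Match) with 304 Not Modified when the client's copy is current. These are standard HTTP/1.1 caching headers; the SHA-256-based tag, the maxAgeSeconds parameter, and the Response class are assumptions made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Sketch of server-side support for HTTP caching (ETag + Cache-Control + 304).
// The Response class, SHA-256 tags, and maxAgeSeconds policy are assumptions.
public class ConditionalGetSketch {
    // Minimal stand-in for an HTTP response.
    public static final class Response {
        public final int status;
        public final Map<String, String> headers;
        public final String body;

        Response(int status, Map<String, String> headers, String body) {
            this.status = status;
            this.headers = headers;
            this.body = body;
        }
    }

    // Strong validator derived from the exact bytes of the representation.
    static String etagFor(String representation) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(representation.getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : digest) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Handles a GET for one resource. ifNoneMatch is the client's If-None-Match
    // header (null if absent); this sketch compares only a single tag.
    public static Response respondToGet(String representation, String ifNoneMatch, int maxAgeSeconds) {
        String etag = etagFor(representation);
        Map<String, String> headers = Map.of(
                "ETag", etag,
                "Cache-Control", "public, max-age=" + maxAgeSeconds);
        if (etag.equals(ifNoneMatch)) {
            return new Response(304, headers, ""); // Not Modified: cached copy is still good
        }
        return new Response(200, headers, representation);
    }
}
```

A browser or proxy that receives the 200 response can serve later GETs of the same URL from its own cache until max-age runs out, then revalidate with If-None-Match and usually get a small 304 back. Whether "public" is appropriate depends on how authentication is resolved; responses to authenticated requests would typically be marked "private", so that only the client's own cache keeps them.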
(2) A REST API is substantially more "accessible" than a WSDL API.
One thing I want from Hackystat Version 8 is a substantially simpler,
more accessible interface that enables outsiders to quickly learn how
to extend Hackystat for their own purposes with new services and/or
extract low-level or high-level data from Hackystat for their own
analyses. With a RESTful API, this is straightforward: here are some
URLs, here's how they translate into resources, invoke GET and you
are on your way. Pretty much every programming language has library
support for invoking an HTTP GET on a URL. One could expect a
first-semester programming student to be able to write a program to
do that. Shoots, you can do it in a browser. The "barrier to entry"
for this kind of API is really, really low.

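For a sense of how small such a client can be, here is a sketch of a Java program that GETs one resource and prints the XML it gets back (Java 11+, java.net.http). The Accept: application/xml header is an assumption based on the data being returned as XML, and the URL is the illustrative one from the examples; it may not resolve.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal REST client sketch: one GET, print the body.
// The Accept header and the example URL are illustrative assumptions.
public class SimpleGetSketch {
    static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/xml")
                .GET()
                .build();
    }

    static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("GET " + url + " returned " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = "http://hackystat.ics.hawaii.edu/hackystat/sensordata/x3fhU784vcEW/Commit/1176759070170";
        System.out.println(fetch(HttpClient.newHttpClient(), url));
    }
}
```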
Now consider a WSDL API. All of a sudden, you need to learn about
SOAP, you need to find out how to do web services in your chosen
programming language, you have to study the remote procedure calls
that are available, and so forth. The "barrier to entry" is suddenly
much higher: there are incompatible versions of SOAP, there's way
more to learn, and I bet more than a few people will quickly decide
to just bail and request direct access to the database, which cuts
them out of 90% of the cool stuff in Hackystat.

So, by my reckoning, if we decided to use Axis/SOAP/WSDL in
Version 8, we'd (1) still need to do all our own caching, with all
the headaches that entails, and (2) be stuck with a relatively complex
interface to the data.

I want to emphasize that a RESTful architecture is more subtle than
simply using GET, POST, PUT, and DELETE. For example, the following is
probably not RESTful:

    GET http://foo/bar/baz&action=delete

since it uses GET, which should only retrieve a representation of a
resource, to delete one.

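Why does action=delete on a GET matter? The HTTP specification defines GET as a "safe" method, so software acting on a user's behalf (caches, crawlers, link prefetchers) may issue it without asking. The following Java sketch models a hypothetical prefetcher that follows only safe requests; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical link prefetcher that trusts HTTP's promise that safe
// methods do not change server state.
public class PrefetchSketch {
    // Safe methods per the HTTP specification (RFC 9110, Section 9.2.1).
    static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS", "TRACE");

    static boolean isSafe(String method) {
        return SAFE_METHODS.contains(method);
    }

    // Each link is {method, url}; returns the requests the prefetcher would issue on its own.
    static List<String> prefetchable(List<String[]> links) {
        return links.stream()
                .filter(link -> isSafe(link[0]))
                .map(link -> link[0] + " " + link[1])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> links = List.of(
                new String[] {"GET", "http://foo/bar/baz&action=delete"}, // not RESTful: a GET that deletes
                new String[] {"DELETE", "http://foo/bar/baz"});          // RESTful: the method states the intent
        System.out.println(prefetchable(links));
    }
}
```

Running it prints [GET http://foo/bar/baz&action=delete]: the prefetcher would "follow" the first link and delete baz as a side effect, while the DELETE request is left alone because its method announces that it changes state.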
For more details,
<http://en.wikipedia.org/wiki/Representational_State_Transfer> has a
good intro with pointers to other readings.

Your email made another interesting assertion:

> what makes you decide not to publish a collection of WSDL and go
> along with more industrial standard web service calls?

Although I agree that WSDL is an "industry standard", this doesn't
mean that REST isn't one as well. Indeed, my sense after a few weeks
of research on the topic is that most major industry players have
already moved to REST or offer REST as an alternative to WSDL: eBay,
Google, Yahoo, Flickr, and Amazon all have REST-based services. I
recall reading that, for at least some of these services, the REST
API gets far more traffic than the corresponding WSDL API.

Finally, no architecture is a silver bullet, and REST is no
exception. For example, if you can't effectively model your domain as
a set of resources, or if the CRUD operations aren't a good fit for
the kinds of manipulations you want to do, then REST isn't the right
choice. Another REST requirement is statelessness: each request must
carry everything the server needs to process it, with no session
state kept on the server between requests. That can be a problem for
some applications. So far in my design process, however, I haven't
run into any showstoppers for the case of Hackystat.

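Statelessness shows up in client code as requests that are each complete on their own. The Java sketch below attaches credentials to every request instead of logging in once and relying on a server-side session. HTTP Basic authentication and the user/password values are assumptions for illustration, since the authentication scheme for Version 8 is still open.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of a stateless client: every request carries its own credentials.
// Basic auth and the sample credentials are hypothetical choices.
public class StatelessRequestSketch {
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // Each call builds a complete, self-describing request; nothing depends on an earlier one.
    static HttpRequest get(String url, String user, String password) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical credentials (the example pair from the Basic auth RFC).
        System.out.println(basicAuth("Aladdin", "open sesame")); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

Because Basic credentials are only Base64-encoded, not encrypted, a real deployment would need to send them over HTTPS rather than the plain http URLs used in the examples.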
Version 8 is still in the early stages, and the advantages of REST
are still hypothetical, so I'm really happy to have this
conversation. There are no hard commitments to anything yet, and if
there turns out to be a showstopping problem with REST, then we can
of course make a change. The more we talk about it, the greater the
odds we'll figure out the right thing.

Cheers,
Philip