How do you mean "included"? Actual code that is run? One of the things
where AOLserver completely blows PHP out of the water is that PHP
has to
re-interpret everything on every page. So if you include a library and
only use one function from it... AOLserver's library Tcl code is just
there at no extra cost, other than at thread creation.
Exactly: "package require" hugely helps performance, but more
generally, web pages slowly sprout features, such as
"people who also bought...", which add overhead to every page. In my
environment, I have a debug hook that does "package forget" on every
page to make debugging easy, and I turn that off in production.
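The load-once-per-process pattern, plus a per-page "forget" hook for debugging, can be sketched like this. This is a Python illustration of the idea, not AOLserver API; the `require` name and `_loaded` table are made up for the sketch:

```python
import importlib

_loaded = {}  # library name -> module, loaded once per process

def require(name, debug=False):
    # production: like Tcl's "package require" -- load once, reuse forever
    # debug=True: like a per-page "package forget" -- force a fresh load
    if name in _loaded:
        if debug:
            importlib.reload(_loaded[name])
        return _loaded[name]
    _loaded[name] = importlib.import_module(name)
    return _loaded[name]
```

In production every call after the first is a dictionary lookup, which is the "no extra cost other than at thread creation" behavior described above.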
platform, at about 1/2 the speed of serving 1k GIFs, but about 10x
faster than PHP, and 3x faster than lighttpd+FastCGI. So, AOLserver is
How could that be? PHP is better than anything; it's popular for a
reason! ;-)
That's good info, though. I still plan to do my own benchmarks,
with some simulated application so the database and its drivers get
benched too.
Not sure if I will ever get around to it, though.
I did that too, and got around 40 pages/second on my Mac mini against
a single-SQL-query page using MySQL, and 600 pages/second with a
4-query BDB page. The "real" (i.e., in production) full text search
page, against a 100,000 item corpus, runs around 120 pages/second on
my Mac mini (which is an incredibly underpowered machine).
a good platform as far as scripting scalability is concerned, as long
as the developer takes care not to load too much dependent code per
As I said before, I am not clear about what you mean by this; can you
elaborate?
Really, all I mean is per-page feature creep, especially features that
require a DB query on each page load. Most web sites slowly add these.
end of the table, which is a common scenario. In my experience, many
applications that use SQL actually only need key-lookup capability
I guess the answer here is "it depends"; rendering news.bbc.co.uk
could be
done from BDB. But people end up wanting to do queries like "how much
money does the average customer spend per month" that only an RDBMS
can
provide easily. And the "R" in RDBMS is quite important too. I
think the
best use for BerkeleyDB is as a cache; save to the RDBMS, then
export to
Berkeley. Like saving all the fields in a blog entry, then
rendering the
page and stuffing it in a BerkeleyDB for viewing.
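That save-then-export pattern is only a few lines. Here is a Python sketch of the idea, with sqlite3 standing in for the RDBMS and the stdlib dbm module standing in for Berkeley DB; the function names, file names, and schema are all made up for illustration:

```python
import dbm
import sqlite3

RDBMS = sqlite3.connect("blog.db")  # the authoritative store
RDBMS.execute("CREATE TABLE IF NOT EXISTS entries"
              " (id TEXT PRIMARY KEY, title TEXT, body TEXT)")

def save_entry(entry_id, title, body):
    # write the fields to the RDBMS first -- it stays the source of truth
    RDBMS.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
                  (entry_id, title, body))
    RDBMS.commit()
    # then render once, at write time, and export the finished page
    html = "<h1>%s</h1>\n<p>%s</p>" % (title, body)
    with dbm.open("pages.db", "c") as cache:
        cache[entry_id] = html

def view_entry(entry_id):
    # page views are pure key lookups; no SQL on the read path
    with dbm.open("pages.db", "r") as cache:
        return cache[entry_id].decode()
```

The write path pays the rendering cost once; every view after that is a single key lookup, which is exactly where BDB shines.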
For the occasional fancy query, in my case (www.bookmooch.com)
things like "what are the most popular book topics", I simply do a
table scan and cache the result (I need to look at AOLserver's time-
expiring cache mechanism; I thought I saw something that did that).
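The time-expiring result cache mentioned here is a small amount of code in any language. A minimal Python sketch of the pattern (illustrative only; AOLserver ships its own ns_cache module, whose exact interface you should check in its docs):

```python
import time

class TTLCache:
    """Minimal time-expiring cache: recompute only after ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.slots = {}  # key -> (timestamp, value)

    def get(self, key, compute):
        hit = self.slots.get(key)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]              # still fresh: skip the table scan
        value = compute()              # expired or missing: recompute
        self.slots[key] = (time.monotonic(), value)
        return value
```

A "most popular book topics" page would pass the expensive table scan as `compute` and pick a ttl of, say, an hour.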
I'm not really sure a MySQL table scan + Tcl function for average is
any slower than SQL, because that's exactly what the SQL engine is
doing when you ask for "average of field X for the whole table", and
compiled Tcl is awfully fast; all a SQL query optimizer really is, is
a p-code compiler (just like Tcl's). On my PHP-backed site
www.magnatune.com, the "average stats" pages are run nightly and
cached to HTML, because against MySQL they take about 30s to run.
Now, granted, there is more code to write to do an average via a table
scan with Tcl+bdb than with SQL, but the code is so drop-dead simple
to write, and so far out of the critical path, that I don't worry
about it.
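For what it's worth, the table-scan average really is just a loop. A Python sketch over a dbm-style key/value store, with records stored as JSON (file name, field name, and sample data are all hypothetical):

```python
import dbm
import json

def average_field(path, field):
    # scan every record, summing one numeric field -- the same work
    # "SELECT AVG(field) FROM t" performs under the hood
    total, count = 0.0, 0
    with dbm.open(path, "c") as db:
        for key in db.keys():
            total += json.loads(db[key])[field]
            count += 1
    return total / count if count else None

def demo_average():
    # hypothetical data: three customers' monthly spend
    with dbm.open("spend.db", "c") as db:
        for i, spend in enumerate([10, 20, 30]):
            db[str(i)] = json.dumps({"spend": spend})
    return average_field("spend.db", "spend")
```

Wrapped in a time-expiring cache, a scan like this only runs once per expiry interval rather than on every page load.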
that it's tricky to get right. Google uses Berkeley DB for their
universal login, for just this reason.
It's extremely useful for that. Though many sites don't quite need to
scale to Google-esque proportions.
No, but on the other hand, most interactive web sites have response
times over 3 seconds. Amazon, for instance, is now frequently >10
seconds for a book page load. Yahoo, with its page times
consistently under 1 second, is a pleasure to use.
Not to put too fine a point on it, but if you run a web-based
business and may want to sell your company, one of the first
questions you will hear is "how do you scale?". Engineering for
scalability way beyond your current needs really will help sell your
business (I've sold two web-based businesses, and have been through
this discussion quite a few times).
Thirdly, you'll notice many sites have very poor full-text search
performance. Lucene, a recently popularized full-text search engine,
appears to finally solve this problem. However, in my case I wanted
But Lucene is probably hard to integrate unless you use Java, isn't
it? I
had a play with http://xapian.org/ two years ago. Seems very good,
it'd be
nice to have an AOLserver module for it. Some day, when I have time...
There's a C client-side module for Lucene, which I assume could be
fairly easily integrated into a Tcl module. However, I believe the
server-side is Java, which many are biased against.
I know that going with Berkeley DB is controversial, but in my
opinion it's extremely difficult to scale up a SQL-backed application
Like I said, I would use BDB as a cache and store the "real data" in a
nice relational schema. But maybe that's just my apprehension about a
technology I haven't used beyond some experiments...
db platform and do extensive up-front design work to that effect,
which few people do.
And those are the magic words. Most people don't; they just care about
functionality: "it works, I am done." I am just finishing up a
service (http://www.sativo.co.uk) where search is very important. I
know that even with high concurrency it will do 20 searches/sec on
the current hardware with 100K subscribers. I also designed in
separate pools for reading and writing, so if I do need to scale to
multiple DBs, I can use a simple single-master, multiple-slaves setup
with every web server reading from its own DB and writing to the
master. Due to the low number of writes on this service, that will
scale very well.
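The read/write pool split described above amounts to a trivial router. Here's a Python sketch (pool names and the statement-type test are hypothetical; a real AOLserver setup would express this through its db pool configuration):

```python
import random

# hypothetical pool layout: one master takes all writes,
# reads are spread across the slaves
POOLS = {
    "master": "db-master",
    "slaves": ["db-slave-1", "db-slave-2"],
}

def pick_pool(sql):
    # route by statement type: reads to a slave, everything else
    # (INSERT/UPDATE/DELETE/DDL) to the single master
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in ("SELECT", "SHOW", "EXPLAIN"):
        return random.choice(POOLS["slaves"])
    return POOLS["master"]
```

Because only the master accepts writes, replication stays simple, and a read-heavy service scales by adding slaves.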
Yes, generally if you plan on a single-writer architecture, that can
work well. I've frequently seen user session state saved in a backend
SQL database, with a key passed on the URL, so that every page
requires a SQL query, and every state change requires a SQL write.
That sort of architecture is dreadfully hard to scale up, since you
have a many-writers/many-readers scenario. Typically, people use
fancy load-balancing machines that send the same user to the same
machine every time, which is more complicated than I like.
Just my opinion... all I can say is that AOLServer+berkeley-db, if
you can live with a key-lookup database only, is incredibly fast, at
No surprise there really, AOLserver is the fastest server and BDB is
pretty much the fastest thing out there for key/value lookups!
DJ Bernstein (of qmail fame) wrote a read-only database library
that's insanely fast, about 3x faster than bdb in my own tests, which
could be useful in certain scenarios.
-john
--
AOLserver - http://www.aolserver.com/
To Remove yourself from this list, simply send an email to <[EMAIL PROTECTED]>
with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject:
field of your email blank.