Hi,
There are 4 responses to our results. We will
answer them to the best of our ability.
MATT> This is a very very old version of postgresql. Try it again with 7.1 for
MATT> more respectable results.
Accepted. We knew this when we conducted the
benchmarks.
We've had a terrible experience with Postgres: firstly on performance and, more importantly, on availability.
Some of you should try pounding Postgres with upwards of 25 queries a second and see the results. The Postgres server will spew out error messages and shut down. Last year we spent several nights writing code to protect Postgres from an overload of queries.
I've written several mails to the Postgres mailing lists, and even to mod_perl in desperation. The problem wasn't solved.
We'll try out 7.1. Maybe it is a major improvement over 6.5.3, but I find it difficult to believe that it will improve performance 36 times!
Here we have to add: we met one of Oracle's support people in India to find out whether Oracle would be a good alternative. He was a nice guy, and told us that Postgres is a thinner DB and should perform better under most circumstances. People go in for Oracle more for features, and perhaps corporate support, not for performance!
BRUCE> It's more likely you are seeing hardware bottlenecks with this
BRUCE> configuration...
...followed by a long list of options to try.
Two replies:
a) We've monitored CPU and memory usage: hardly anything to write home about. If CPU or memory were anywhere near maxing out we would agree... except, of course, when we use Postgres. Postgres is not swapping, only hogging the CPU; when the Postgres benchmarks are running we have more than 70% of our RAM free. Postgres almost always maxes out the CPU... read the next point for further details.
b) We have repeated these benchmarks on a dual-CPU Pentium III 700 with 1 GB RAM, with almost identical relative results. Postgres still performs poorly relative to the others. In these benchmarks, too, Postgres takes 100% of the CPU while running the query!
CLAYTON> i wanted a good benchmark for postgres and mysql
We haven't tried this one. We are doing a project on MySQL, and our preliminary assessment is that it's a shocker. They justify not having commit and rollback! It makes us wonder whether they are even lower-end than MS Access.
PERRIN> You might get better performance by using a combined key, hashing it, and
PERRIN> splitting into directories after the first 2 characters in the key. This
PERRIN> would mean 2 directories to traverse for each lookup, rather than 4. I
PERRIN> believe the File::Cache module works this way, so you could steal code from
PERRIN> there.

PERRIN> However, dbm is a good choice for this. You may find SDBM_File faster than
PERRIN> DB_File if your records are small enough for it (I think the limit is 2K per
PERRIN> record).
These are good options to try. We will try them (we hope we have time) and post back the results.
Regards,
S Muthu Ganesh & V Murali
----- Original Message -----
Sent: Tuesday, April 17, 2001 4:41 PM
Subject: Fast DB access
Hi,
A few months back we asked the mod_perl mailing list about alternative methods of DB access to Postgres (under the same subject). We got some decent alternatives, and we are now posting back some of the work we have done on this issue.
We had a project to program an ad server. This is not really an OLTP application; i.e., we had a few screens where some data is captured, and based on this data we had to pick an advertisement and serve it.
The essence of the application is a highly scalable program to deliver ads... which means we wanted a method to pick the ads matching given criteria and choose one among them.
We wrote a benchmark program, after which we decided to go with MLDBM for our purposes.
Though this is not directly related to mod_perl, we are taking the liberty of posting this message. We hope you find it useful.
The specification and results of the benchmark are as follows.

Objective: to choose one of the alternative access methods for a read-only DB program.
Program logic:
Choose a row from a table which has a composite key of 4 attributes. The 4 attributes we used are publisher, size, type and IP number. Given values for these 4 attributes, we get the list of advertisements for that combination. In the live application we will choose one of these ads based on a weighted random number. For the purpose of the benchmark we want to create a hash, or hash reference, of the ads given these 4 criteria.
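As an illustration (the data and field names here are ours, not from the actual application), the in-memory version of this lookup plus the weighted random pick might look like:

#!/usr/bin/perl -w
use strict;

# Multi-level hash: publisher -> size -> type -> ip -> { ad => weight }
my %ads = (
    pub1 => {
        '468x60' => {
            text => {
                '10.0.0.1' => { ad_a => 5, ad_b => 1, ad_c => 4 },
            },
        },
    },
);

# Look up the last-level hash for one attribute combination.
my $bucket = $ads{pub1}{'468x60'}{text}{'10.0.0.1'};

# Weighted random pick: draw a number in [0, total weight) and
# subtract each ad's weight until it goes negative.
my $total = 0;
$total += $_ for values %$bucket;
my $r = rand($total);
my $chosen;
for my $ad (keys %$bucket) {
    $r -= $bucket->{$ad};
    if ($r < 0) { $chosen = $ad; last; }
}
print "serving $chosen\n";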
Benchmark data:
Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 IP numbers, which makes a data structure containing 24,000 combinations of attributes (100 x 3 x 4 x 20). Each combination in turn contains 10 advertisements.
Benchmark alternatives:
We have populated this data into:
a) A pure in-memory multi-level hash: before starting the actual benchmark, the program populates a multi-level hash, each branch of which finally points to the advertisements. Objective: pick the last-level hash of advertisements.
b) Flat file: create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure is <publisher>/<size>/<type>/<ip number>, where the IP number is the name of a file containing a list of ads. Objective: pick the right file, open it and create a hash from the contents of the file.
c) Postgres: create a table with the composite primary key <publisher>, <size>, <type>, <ip number>. Objective: pick a hash reference of all rows given an attribute combination.
d) Storable: store the multi-level hash in a disk file using Storable.pm. Objective: read the Storable file into memory and pick the last-level hash of ads.
e) MLDBM: populate an MLDBM data structure using MLDBM.pm. Identical to Storable except that we use MLDBM (see the sketch after this list).
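A minimal sketch of the MLDBM alternative (the file name, serializer choice and data are illustrative, not necessarily what our benchmark script does):

#!/usr/bin/perl -w
use strict;
use Fcntl;
use MLDBM qw(DB_File Storable);   # nested hashes via DB_File + Storable

# Populate: tie a hash to a DBM file; MLDBM serializes the nested
# structure transparently on store.
tie my %db, 'MLDBM', 'ads.dbm', O_CREAT|O_RDWR, 0644
    or die "cannot tie ads.dbm: $!";
$db{pub1} = {
    '468x60' => { text => { '10.0.0.1' => { ad_a => 5, ad_b => 1 } } },
};

# Read back: fetching the top-level key deserializes the whole
# nested structure, then we walk down to the last-level hash of ads.
my $bucket = $db{pub1}{'468x60'}{text}{'10.0.0.1'};
print "$_ => $bucket->{$_}\n" for keys %$bucket;
untie %db;

Note that MLDBM serializes an entire top-level value on each fetch, so every lookup pays for deserializing one publisher's whole subtree, and nested assignments such as $db{pub1}{...} = ... do not persist: the whole top-level value must be stored at once.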
H/W: Celeron 433 with 64 MB RAM and an IDE HDD, running RH 6.1, Perl 5.005 and Postgres 6.5.3.
Benchmark run:
A benchmark run consists of accessing each of the benchmark alternatives 40 times, i.e., we generate a random combination of the 4 selection attributes 40 times, access a particular data structure and pick up the ads. We repeat this process twice to ensure that we are getting consistent results.
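The figures below are in the format printed by Perl's Benchmark module; a driver along these lines (the lookup subs are stubs standing in for the five alternatives, and the size/type names are assumed) would produce them:

#!/usr/bin/perl -w
use strict;
use Benchmark qw(timethese);

# Generate one random combination of the 4 selection attributes.
sub random_combo {
    return ('pub' . int(rand 100),                    # 100 publishers
            ('468x60', '120x600', '88x31')[rand 3],   # 3 sizes (names assumed)
            ('gif', 'text', 'html', 'flash')[rand 4], # 4 types (names assumed)
            '10.0.0.' . int(rand 20));                # 20 ip numbers
}

# One sub per alternative; each would fetch the ad list for a
# random combination. The bodies here are stubs.
my %alternatives = (
    hash     => sub { my @k = random_combo() },  # real code: walk %ads
    flatfile => sub { my @k = random_combo() },  # real code: open the file
    # postgres, storable and mldbm entries follow the same pattern
);

# 40 iterations per alternative, as in the run described above.
timethese(40, \%alternatives);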
Benchmark results:
hash:      0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
flatfile:  5 wallclock secs ( 0.08 usr +  0.11 sys =  0.19 CPU)
postgres: 36 wallclock secs ( 0.04 usr +  0.01 sys =  0.05 CPU)
storable: 17 wallclock secs (16.24 usr +  0.61 sys = 16.85 CPU)
mldbm:     0 wallclock secs ( 0.08 usr +  0.08 sys =  0.16 CPU)
Benchmark interpretation:
We did not want to write this section... but we have interpreted these results and chosen MLDBM.
The first option is not viable for us: we want to carry forward the values between 2 runs of the program, and can't recreate the hash every time in the live app.
We had experimented with Postgres earlier, with disastrous results. In fact, Postgres motivated us to seek alternatives.
We had a choice between flat file and MLDBM (from the results). It was a close one... but we found MLDBM far more compact, and users were more comfortable maintaining a system with a single file rather than a multitude of files on the file system.
We also suspect that if we add more rows, the flat file approach won't scale... though we've not tested this fully.
In the end... we chose MLDBM. We have developed the software, called opticlik (www.opticlik.com), and it's doing just great, serving around 1.2 million ads a day. In short bursts it serves up to 100 ads a second.
We've attached our benchmark program and SQL along with this mail. If any of you have time to run through the program and give us feedback, that would be great. We are also happy to provide any clarifications that may be required.
Regards,
S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.
90, 3rd Cross, 2nd Main, Ganga Nagar, Bangalore - 560 032
Phone: 91 80 3631445, 3431470
Visit us at www.diffsoft.com