Hi,
There are 4 responses to our results. We will
answer them to the best of our ability.
MATT> This is a very very old version of postgresql. Try it again with 7.1 for
MATT> more respectable results.
Accepted. We knew this when we conducted the
benchmarks.
We've had a terrible experience with Postgres: firstly on performance and, more importantly, on availability.
Some of you should try pounding Postgres with upwards of 25 queries a second and see the results. The Postgres server will spew out error messages and shut down. Last year we spent several nights writing code to protect Postgres from an overload of queries.
I've written several mails to the Postgres mailing lists, and even to mod_perl in desperation. The problem wasn't solved.
We'll try out 7.1. Maybe it is a major improvement over 6.5.3, but I find it difficult to believe that it will improve performance 36 times!
Here we have to add: we met one of Oracle's support people in India to find out whether Oracle would be a good alternative. He was a nice guy, and told us that Postgres is a thinner DB and should perform better under most circumstances. People go in for Oracle more for features, and perhaps corporate support, not for performance!
BRUCE> It's more likely you are seeing hardware bottlenecks with this
BRUCE> configuration...
...followed by a long list of options to try.
Two replies:
a) We've monitored CPU and memory usage: hardly anything to write home about. If CPU or memory were anywhere near maxing out we would agree... except, of course, when we use Postgres. Postgres is not swapping, only hogging the CPU; when the Postgres benchmarks are running we have more than 70% of our RAM free. Postgres almost always maxes out the CPU... read the next point for further details.
b) We have repeated these benchmarks on a dual-CPU Pentium III 700 with 1 GB RAM, with almost identical relative results. Postgres still performs poorly relative to the others. In these benchmarks, too, Postgres takes 100% of the CPU while running the query!
CLAYTON> i wanted a good benchmark for postgres and mysql
We haven't tried this one. We are doing a project on MySQL, and our preliminary assessment is that it's a shocker. They justify not having commit and rollback! It makes us wonder whether they are even lower-end than MS Access.
PERRIN> You might get better performance by using a combined key, hashing it, and
PERRIN> splitting into directories after the first 2 characters in the key. This
PERRIN> would mean 2 directories to traverse for each lookup, rather than 4. I
PERRIN> believe the File::Cache module works this way, so you could steal code from
PERRIN> there.

PERRIN> However, dbm is a good choice for this. You may find SDBM_File faster than
PERRIN> DB_File if your records are small enough for it (I think the limit is 2K per
PERRIN> record).
These are good options to try. We will try them (we hope we have time) and post back the results.
Regards,
S Muthu Ganesh & V Murali
----- Original Message -----
Sent: Tuesday, April 17, 2001 4:41 PM
Subject: Fast DB access
Hi,
A few months back we asked the mod_perl mailing list about alternative methods of DB access to Postgres (under the same subject). We got some decent alternatives, and we are now posting back some of the work we have done on this issue.
We had a project to program an ad server. This is not really an OLTP application; i.e., we had a few screens where some data is captured, and based on this data we had to pick an advertisement and serve it.
The essence of the application is a highly scalable program to deliver ads... which means we wanted a method to pick the ads matching given criteria and choose one among them.
We wrote a benchmark program, after which we decided to go with MLDBM for our purposes.
Though this is not directly related to mod_perl, we are taking the liberty of posting this message. We hope you find it useful.
The specification and results of the benchmark are as follows.

Objective: to choose one of the alternative access methods for a read-only DB program.
Program logic:
Choose a row from a table which has a composite key of 4 attributes. The 4 attributes we used are publisher, size, type and IP number. Given values for these 4 attributes, we get the list of advertisements for that combination. In the live application we will choose one of these ads based on a weighted random number. For the purpose of the benchmark we want to create a hash, or hash reference, of the ads given these 4 criteria.
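As an illustration (the data and field names here are ours, not from the actual application), the in-memory version of this lookup plus the weighted random pick might look like:

#!/usr/bin/perl -w
use strict;

# Multi-level hash: publisher -> size -> type -> ip -> { ad => weight }
my %ads = (
    pub1 => {
        '468x60' => {
            text => {
                '10.0.0.1' => { ad_a => 5, ad_b => 1, ad_c => 4 },
            },
        },
    },
);

# Look up the last-level hash for one attribute combination.
my $bucket = $ads{pub1}{'468x60'}{text}{'10.0.0.1'};

# Weighted random pick: draw a number in [0, total weight) and
# subtract each ad's weight until it goes negative.
my $total = 0;
$total += $_ for values %$bucket;
my $r = rand($total);
my $chosen;
for my $ad (keys %$bucket) {
    $r -= $bucket->{$ad};
    if ($r < 0) { $chosen = $ad; last; }
}
print "serving $chosen\n";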
Benchmark data:
Our benchmark data consists of 100 publishers, 3 sizes, 4 types and 20 IP numbers, which makes a data structure containing 24,000 combinations of attributes (100 x 3 x 4 x 20). Each combination in turn contains 10 advertisements.
Benchmark alternatives:
We have populated this data into:
a) A pure in-memory multi-level hash: before starting the actual benchmark, the program populates a multi-level hash, each branch of which finally points to the advertisements. Objective: pick the last-level hash of advertisements.
b) Flat file: create a Linux directory structure with the same hierarchy as the attributes, i.e., the directory structure is <publisher>/<size>/<type>/<ip number>, where the IP number is the name of a file containing a list of ads. Objective: pick the right file, open it and create a hash from the contents of the file.
c) Postgres: create a table with the composite primary key <publisher>, <size>, <type>, <ip number>. Objective: pick a hash reference of all rows given an attribute combination.
d) Storable: store the multi-level hash in a disk file using Storable.pm. Objective: read the Storable file into memory and pick the last-level hash of ads.
e) MLDBM: populate an MLDBM data structure using MLDBM.pm. Identical to Storable except that we use MLDBM (see the sketch after this list).
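A minimal sketch of the MLDBM alternative (the file name, serializer choice and data are illustrative, not necessarily what our benchmark script does):

#!/usr/bin/perl -w
use strict;
use Fcntl;
use MLDBM qw(DB_File Storable);   # nested hashes via DB_File + Storable

# Populate: tie a hash to a DBM file; MLDBM serializes the nested
# structure transparently on store.
tie my %db, 'MLDBM', 'ads.dbm', O_CREAT|O_RDWR, 0644
    or die "cannot tie ads.dbm: $!";
$db{pub1} = {
    '468x60' => { text => { '10.0.0.1' => { ad_a => 5, ad_b => 1 } } },
};

# Read back: fetching the top-level key deserializes the whole
# nested structure, then we walk down to the last-level hash of ads.
my $bucket = $db{pub1}{'468x60'}{text}{'10.0.0.1'};
print "$_ => $bucket->{$_}\n" for keys %$bucket;
untie %db;

Note that MLDBM serializes an entire top-level value on each fetch, so every lookup pays for deserializing one publisher's whole subtree, and nested assignments such as $db{pub1}{...} = ... do not persist: the whole top-level value must be stored at once.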
H/W: Celeron 433 with 64 MB RAM and an IDE HDD, running RH 6.1, Perl 5.005 and Postgres 6.5.3.
Benchmark run:
A benchmark run consists of accessing each of the benchmark alternatives 40 times, i.e., we generate a random combination of the 4 selection attributes 40 times, access a particular data structure and pick up the ads. We repeat this process twice to ensure that we are getting consistent results.
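The figures below are in the format printed by Perl's Benchmark module; a driver along these lines (the lookup subs are stubs standing in for the five alternatives, and the size/type names are assumed) would produce them:

#!/usr/bin/perl -w
use strict;
use Benchmark qw(timethese);

# Generate one random combination of the 4 selection attributes.
sub random_combo {
    return ('pub' . int(rand 100),                    # 100 publishers
            ('468x60', '120x600', '88x31')[rand 3],   # 3 sizes (names assumed)
            ('gif', 'text', 'html', 'flash')[rand 4], # 4 types (names assumed)
            '10.0.0.' . int(rand 20));                # 20 ip numbers
}

# One sub per alternative; each would fetch the ad list for a
# random combination. The bodies here are stubs.
my %alternatives = (
    hash     => sub { my @k = random_combo() },  # real code: walk %ads
    flatfile => sub { my @k = random_combo() },  # real code: open the file
    # postgres, storable and mldbm entries follow the same pattern
);

# 40 iterations per alternative, as in the run described above.
timethese(40, \%alternatives);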
Benchmark results:
hash:      0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
flatfile:  5 wallclock secs ( 0.08 usr +  0.11 sys =  0.19 CPU)
postgres: 36 wallclock secs ( 0.04 usr +  0.01 sys =  0.05 CPU)
storable: 17 wallclock secs (16.24 usr +  0.61 sys = 16.85 CPU)
mldbm:     0 wallclock secs ( 0.08 usr +  0.08 sys =  0.16 CPU)
Benchmark interpretation:
We did not want to write this section... but we have interpreted these results and chosen MLDBM.
The first option is not viable for us: we want to carry forward the values between 2 runs of the program, and can't recreate the hash every time in the live app.
We had experimented with Postgres earlier, with disastrous results. In fact, Postgres motivated us to seek alternatives.
We had a choice between flat file and MLDBM (from the results). It was a close one... but we found MLDBM far more compact, and users were more comfortable maintaining a system with a single file rather than a multitude of files on the file system.
We also suspect that if we add more rows, the flat file approach won't scale... though we've not tested this fully.
In the end... we chose MLDBM. We have developed the software, called opticlik (www.opticlik.com), and it's doing just great, serving around 1.2 million ads a day. In short bursts it serves up to 100 ads a second.
We've attached our benchmark program and SQL along with this mail. If any of you have time to run through the program and give us feedback, that would be great. We are also happy to provide any clarifications that may be required.
Regards,
S Muthu Ganesh & V Murali
Differentiated Software Solutions Pvt. Ltd.
90, 3rd Cross, 2nd Main, Ganga Nagar, Bangalore - 560 032
Phone: 91 80 3631445, 3431470
Visit us at www.diffsoft.com