Hi,

Cees has found a bug in our benchmark. We were using rtrim in our select
statement, and this was forcing postgres to perform a full table scan.

We've corrected the code for this bug and are reposting the results without
rtrim (creating the tables with varchar instead of char).
In fact, postgres even ignores the trailing blanks with the 'char' datatype.
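
For reference, the schema change amounts to this (a sketch; the column
widths here are placeholders, not the sizes we actually use):

    # old definition -- postgres space-pads char columns, which is what
    # forced the rtrim() calls into the select:
    $dbh->do("create table benchmark (
        pub char(10), size char(10), type char(10),
        ip char(10), crr char(10))");

    # new definition -- varchar columns are stored without padding, so
    # the select can compare the columns directly:
    $dbh->do("create table benchmark (
        pub varchar(10), size varchar(10), type varchar(10),
        ip varchar(10), crr varchar(10))");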

When we ran these benchmarks at 40 seconds, we could not make out any
difference in the results, so we increased the number of selects to 2000.
Here are the new results, comparing only postgres (6.5.3) and MLDBM:
postgres
18 wallclock secs ( 1.88 usr +  0.18 sys =  2.06 CPU)
mldbm
 3 wallclock secs ( 1.77 usr +  0.21 sys =  1.98 CPU)

The results still compare favourably for MLDBM.
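
For anyone who wants to reproduce this, the timing loop looks roughly like
the sketch below (assuming DBI with the standard Benchmark module; the
connect string is a placeholder):

    use DBI;
    use Benchmark qw(timeit timestr);

    my $dbh = DBI->connect('dbi:Pg:dbname=bench', '', '', { RaiseError => 1 });

    # time 2000 random selects; the insert code that populates the table
    # runs once beforehand and is not included in the timing
    my $t = timeit(1, sub {
        for (1 .. 2000) {
            my $i_pub  = int(rand(20));
            my $i_size = int(rand(20));
            my $i_type = int(rand(20));
            my $i_ip   = int(rand(20));
            my @row_ary = $dbh->selectrow_array("select crr from benchmark where
                pub = 'pub$i_pub' and size = 'size$i_size' and
                type = 'type$i_type' and ip = 'ip$i_ip'");
        }
    });
    print "postgres\n", timestr($t), "\n";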

A summary of what we have learned from these benchmarks and discussions:

a) We have to use PG 7.1. It is a major improvement over 6.5.3.
b) When we need a completely read-only, high-performance data structure,
MLDBM is a good option against postgres, provided we are able to cast our
database in MLDBM-style data structures (see the sketch after this list).
c) Generic benchmarks may not be useful for most applications.... we need to
devise our own benchmark which represents our critical processing
requirements and try out the various options.
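
To illustrate point (b), this is the shape of the MLDBM side of the
benchmark (a sketch, assuming the DB_File backend with Storable; the file
name and the composite key scheme are placeholders):

    use MLDBM qw(DB_File Storable);   # backend and serializer
    use Fcntl;

    tie my %bench, 'MLDBM', 'benchmark.db', O_RDONLY, 0644
        or die "cannot open benchmark.db: $!";

    # each row is stored under one composite key, so a "select" is a
    # single hash lookup -- no SQL parsing and no query planner
    my ($i_pub, $i_size, $i_type, $i_ip) = map { int(rand(20)) } 1 .. 4;
    my $row = $bench{"pub$i_pub|size$i_size|type$i_type|ip$i_ip"};
    my $crr = $row->{crr};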

Thanks to all of you who have contributed to this thread.

Regards,

V Murali & S Muthu Ganesh
Differentiated Software Solutions Pvt. Ltd.,
90, 3rd Cross,2nd Main,
Ganga Nagar,
Bangalore - 560 032
Phone : 91 80 3631445, 3431470
Visit us at www.diffsoft.com

----- Original Message -----
From: Cees Hek <[EMAIL PROTECTED]>
To: Murali V <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, April 20, 2001 1:45 AM
Subject: [OT] Re: Fast DB access


>
> On Thu, 19 Apr 2001, Murali V wrote:
>
> > Hi,
> >
> > If you read the code more deeply, you'll find that the timeit is only
> > wrapped around select and not around insert.
> > We've written the insert code so that in the first round you can
> > populate the database.
> > You comment out the insert code after the first round and run the
> > benchmark several times. This would only do select and time select.
> >
>
> Hi Murali,
>
> OK, to start off, I was not specifically aiming my rant at you; I was
> replying to someone who had modified your code and was now comparing MySQL
> and PostgreSQL, and he was implying that the timings were for inserts and
> selects.  I took this at face value, and didn't check the code closely
> enough, which I really should have done in the first place.
>
> > Connecting this error to an axiom that "Benchmarks are useless" is bad
> > indeed. Shouldn't we be ironing out the errors and running benchmarks
> > which are good?
>
> Perhaps I should have said published benchmarks.  In your case, you are
> using benchmarks for exactly what they are intended for...  Creating a
> system that closely resembles your application and putting it through its
> paces.  What I find dangerous about publishing benchmarks, is that they
> are almost always heavily swayed to a specific application, and most of
> the time they show what the user wants them to show.
>
> In your original message, you claim to have a bias against Postgres, and
> your benchmark shows that bias.  I however am a happy user of postgres,
> and am therefore biased towards it.  I modified your benchmark script
> slightly, and I got the following results (I have included a diff of my
> changes at the bottom):
>
> postgres
>  0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
> postgres
>  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
>
> Whereas if I run it with your version I get the following:
>
> postgres
> 27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
> postgres
> 27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
>
> So what does that tell you about the benchmark?  That the postgres part of
> this benchmark is useless...  It may have given you the answer that you
> wanted, but it is misleading to anyone else out there.
>
> This is why there are always flame wars about benchmarking databases (by
> the way, I think this whole thread has been very civilized and I hope it
> stays that way).  Invariably the benchmark has missed some critical idea
> or optimization which drastically skews the results.
>
> > Your recommendation is to pick a DB best suited to your app. But how??
> > a) Either by hiring a guru who has seen all kinds of apps with different
> > DBs, who can give you the answer with which we can run, or
> > b) Run a benchmark on critical programs which represent your app across
> > databases and find what performs best.
> > I've read too much literature on DB features. All DBs have all features
> > (except MySQL, which does not have commit !!!!).
> > You can't make anything out of DB literature.
>
> What I would recommend is exactly what you have done in this case.  Get
> access to any and all the systems that you feel may do the job for you,
> and try them out.  Browse the web for other users' experiences, but don't
> use other people's benchmarks, because the odds are good that they are
> wrong...  Create your own, or modify an existing one, and scrutinize
> exactly what it is doing.  And if you want to share your results with
> anyone else, tell them what you chose in the end, and why.  Tell them you
> chose database x because it did this and this for you.  Don't say
> database y is a piece of crap, so we went with database x.
>
> But whatever you do, don't choose your database based on other people's
> benchmarks........ (that is all I'm trying to say, and I guess I didn't
> say it clearly enough)
>
> When I first read your message, I tucked it away somewhere, so I could
> reference it again in the future, because I was interested in the MLDBM
> work that you had done, and I thank you for that.  But it also made me
> think that maybe I shouldn't be using Postgres, because your results were
> so poor (only for a second or two though :).  But I'll bet that a lot of
> people who have never used postgres before are now less likely to download
> it and try it out for themselves, because a benchmark swayed them away from
> it.  That sounds like a good closer, so I'll stop it there :-)
>
>
> Cees
>
>
> Here is the diff of my changes and a quick comment on why your way kills
> the performance of postgres:
>
> ***************
> *** 124,131 ****
>                 $i_ip = int(rand(20));
>
>                 @row_ary = $dbh->selectrow_array("select crr from benchmark where
> !                               rtrim(pub) = 'pub$i_pub' and rtrim(size) = 'size$i_size' and
> !                               rtrim(type) = 'type$i_type' and rtrim(ip) = 'ip$i_ip'");
>         }
>   };
>
> --- 124,131 ----
>                 $i_ip = int(rand(20));
>
>                 @row_ary = $dbh->selectrow_array("select crr from benchmark where
> !                               pub = 'pub$i_pub' and size = 'size$i_size' and
> !                               type = 'type$i_type' and ip = 'ip$i_ip'");
>         }
>   };
>
>
> All I did was remove the 'rtrim' calls from the query.  I'm assuming you
> did this because Postgres pads all char fields with spaces (varchar
> doesn't do this).  What your query ends up doing is running rtrim on
> every field in the column and then comparing it against your provided
> value (indices are not used since you are transforming the primary key
> with rtrim).  So since your table has 24,000 rows and you call rtrim on 4
> columns in the select, it has a worst case of 24,000 x 4 calls to rtrim
> (of course postgres will short-circuit the conditions if the first call
> fails), but in the best case it still has to do 24,000 calls to rtrim.
> I'm amazed that postgres can actually do this so fast :).
>
> In Postgres 7.0 you don't have to worry about the space padding issue, so
> I was able to remove the rtrims from the query.  However, since you are
> using 6.5.x, what you should have done is either make your columns
> varchar instead of char, or space-pad your values in the query, so that
> you are checking pub = 'pub1 ' instead of rtrim(pub) = 'pub1'.
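>
> Something like this (untested; the pad width of 8 is only an example and
> has to match the declared column width) would keep the comparison direct:
>
>     # pad the values to the column width instead of rtrim-ing the
>     # column itself, so postgres compares the stored values as-is
>     my $pad = sub { sprintf "%-8s", $_[0] };
>     @row_ary = $dbh->selectrow_array("select crr from benchmark where
>                   pub = '" . $pad->("pub$i_pub") . "' and
>                   size = '" . $pad->("size$i_size") . "' and
>                   type = '" . $pad->("type$i_type") . "' and
>                   ip = '" . $pad->("ip$i_ip") . "'");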
>
> However, since you were looking at solutions for a new application, you
> have no excuse for using a database that is several years old, when
> several new stable revisions are available (for free)...
>
> Give it a try and see what happens...