You are doing both a "join" and a "union". The join is the part
where you go "a.oid = b.id".
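To make the distinction concrete, here is a runnable sketch of the quoted query's shape, with sqlite3 standing in for Postgres; the table and column names are made up for illustration:

```python
# The join is the "a.id = b.id" (and "a.id = c.id") part; the UNION
# then merges the two joined result sets into one.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE a (id INTEGER, price INTEGER, make TEXT);
    CREATE TABLE b (id INTEGER, dealer TEXT);
    CREATE TABLE c (id INTEGER, dealer TEXT);
    INSERT INTO a VALUES (1, 5000, 'Ford'), (2, 9000, 'Ford');
    INSERT INTO b VALUES (1, 'Lot B');
    INSERT INTO c VALUES (2, 'Lot C');
""")
rows = cur.execute("""
    SELECT a.make, b.dealer FROM a, b
     WHERE a.id = b.id AND a.price < 10000 AND a.make = 'Ford'
    UNION
    SELECT a.make, c.dealer FROM a, c
     WHERE a.id = c.id AND a.price < 10000 AND a.make = 'Ford'
     LIMIT 100
""").fetchall()
print(rows)
```

Each branch joins `a` to a different table; the union just stacks the two answers, so indexes on the join and filter columns help each branch independently.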
It sounds to me like the "throw hardware at the problem" solution
is not even close to exhausted yet. You are still using pretty
low-cost equipment. Run the "top" program in an xterm on the database
server when the system is loaded. What do you see? Is the CPU at
100%? How much RAM is in use? How much RAM is used for cache and
buffers? Is any swap space in use? How many Postgres backend
servers are typically running at once?
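If you want to script those checks rather than eyeball top, the same numbers live in /proc/meminfo on Linux. A small sketch (the sample text below is made up, but the field names follow the /proc/meminfo layout; on a real box you would read the file itself):

```python
# Pull the numbers that "top" and "free" report out of
# /proc/meminfo-style text.
def parse_meminfo(text):
    """Return a dict of field name -> kilobytes."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            name, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                info[name.strip()] = int(parts[0])
    return info

# Hypothetical snapshot of a loaded 192MB box; replace with
# open("/proc/meminfo").read() on a real server.
sample = """\
MemTotal:   196608 kB
MemFree:      4096 kB
Cached:     131072 kB
SwapTotal:  106496 kB
SwapFree:   106496 kB
"""
mem = parse_meminfo(sample)
swap_used = mem["SwapTotal"] - mem["SwapFree"]
# Almost no free RAM and most of it already in cache: adding memory
# is likely to help; swap actually in use would be a stronger sign.
print(mem["Cached"], swap_used)
```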
I don't think adding many $1k boxes is what your customers want.
Why do they care how many requests per hour you can handle if
each single request still takes 45 seconds? No, what your customers
want is for _their_ request to have zero delay. For that you want
one big fast computer.
First off, RAM helps a LOT. 192MB of RAM is not much. Look at "top"
and keep adding it until the cache and buffer sizes are quite large.
I would think 512MB or 1GB of RAM is what you need. The idea is
to let the OS (Linux) cache a good portion of your database in
RAM. Postgres has a buffer pool which you should set to some
huge size, but OS-level caching really helps speed things up.
My rule is to keep adding RAM until some of it is not used.
Multiple CPUs IMO _will_ help. Check again with "top". If you
have enough RAM and fast enough disks then CPU will be your
bottleneck. You can see that the CPU is in fact the bottleneck
if "top" shows CPU utilization up above 85% to 100%. Adding
a second (or 3rd, 4th or 10th) CPU will help if utilization is
close to 100%. Just keep adding them until you have idle CPU time.
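If you would rather script that check than watch top, the standard library exposes the same load averages the top header shows (Unix only; the 0.85 cutoff is just the rule of thumb above, not a hard limit):

```python
import os

# 1-, 5-, and 15-minute run-queue load averages, the same numbers
# "uptime" and the top header line report (Unix only).
one, five, fifteen = os.getloadavg()

def cpu_bound(load, ncpus):
    """Rough test: sustained load near or above the CPU count means
    no idle CPU time, so more (or faster) CPUs would help."""
    return load >= 0.85 * ncpus

print(cpu_bound(one, 1))
```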
Same as above. You want free RAM and idle CPU time.
One reason I think a multi-CPU machine is better is that you
only need to buy that 512 or 1024MB of RAM once. The RAM is for
caching, so all four CPUs can share that big RAM cache. Put each
CPU in its own box and they each get only a small cache, or you
pay 4x for four big RAMs. It gets worse as you have to add
overhead to keep all those databases in sync.
You may have to "bite the bullet" and realize you have outgrown
PC hardware. Yes, you could go with a four-way SMP Xeon and RAID,
but for that kind of money (or less) you could also get a used
Sun E450.
I run Postgres on both a Pentium with Linux and on a Sun SPARC
with Solaris 7. I am pretty impressed with Sun hardware. One
thing about Sun: you will never outgrow it. The E450 could likely
run both your web and database servers. The cost is not all
that much in the grand scheme of things; the E450 is about the
same price as just one car. Sun hardware will run Linux, NetBSD and
Solaris. As much as I am a Linux fan, I think for this application
(database server) Solaris is the way to go.
So do your upgrade this weekend, but then in two months, if you are
lucky, you will need to upgrade again. Maybe then think about a
non-PC that will have a three to five year life. A good rule of
thumb is that when you install a new server it should run at 50%
capacity. If it is at 100% on day one you have just wasted your
time and money, as you are just going to have to upgrade again.
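The arithmetic behind that 50% rule is simple (the numbers below are hypothetical): assuming load grows linearly, starting at half capacity buys you the whole growth runway, while starting at 100% buys you none.

```python
def months_until_full(initial_util, monthly_growth):
    """Months until utilization hits 100%, assuming linear growth.
    Both arguments are fractions of total capacity."""
    if initial_util >= 1.0:
        return 0.0
    return (1.0 - initial_util) / monthly_growth

# A box installed at 50% capacity, with traffic growing by 10% of
# capacity per month, lasts five months before the next upgrade;
# a box that is full on day one lasts zero.
print(months_until_full(0.50, 0.10))
print(months_until_full(1.00, 0.10))
```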
PostgreSQL Server wrote:
>
> I'm glad someone asked this, because I'd like to try and optimize my
> database or the way I'm using it, and I've reached my personal limit on
> optimization at this point.
>
> 1) I'm using two machines so far for my e-commerce solution. The website is
> usedcars.com, in case you want to go and check the performance, which
> probably sucks right now. :-) One machine is running the web server and java
> servlets, the other postgres latest from CVS. Server for psql is RH Linux
> 6, on PII 333, 192MB RAM, 104MB Swap. The HD psql is running on is 4GB
> UW SCSI. No RAID, just daily FTP and Tape backups. I'm getting ready to
> upgrade the MB to the latest PIII whatever this weekend. I'm thinking a
> dual-CPU box would be a good idea, however, I'm not sure how much good a
> dual-CPU will really do. I feel that in another 2 months, I'll be in the
> same boat again. Whereas if I come up with a distributed solution, I can
> just keep adding those $1000 Linux boxes. :-)
>
> 2) One table gets 99% of the queries, it has about 300K rows, by watching
> the log, it seems to get about 120 queries per minute on average, peaking
> at 480 queries per minute during peak hours. Not sure what a join is, do
> you mean union? If so, then yes, almost every query to this table is of
> the form
> select a.*, b.1, b.2 where a.oid = b.id and a.int4 < x and a.str =
> 'String' union select a.*, c.1, c.2 where a.oid = c.id and a.int4 < x and
> a.str = 'String' limit 100;
> I defined indexes on all the fields which appear in a WHERE clause, using
> either hash or btree, as recommended by the docs. The only operator I
> couldn't find was for a money type.
> I don't use order by, although I'd like to, because the docs said that
> indexes are not used on order by queries. I use limit also.
>
> I think writing a Distribution/Replication engine using Java servlets
> should be a medium size task. Making it fault-tolerant and dealing with
> crash and error recovery issues could make it larger. If I end up writing
> it from scratch, I'll post the source for others to use.
>
> Thanks,
> Rich
>
> On Fri, 29 Oct 1999, Chris Albertson wrote:
>
> > >
> > > Date: Thu, 28 Oct 1999 01:11:32 -0700
> > > From: Rich Ryan <[EMAIL PROTECTED]>
> > > Subject: HSA (Highly Scalable Architecture) Distribution and replication
> > >
> > > Has anyone written a HSA e-commerce solution using postgres? In english,
> > > this means I want more than one server running postgres with the same
> > > database(s). I get lots and lots of queries, and it's just too much for one
> > > poor machine to handle.
> >
> >
> > Is this really true? "too much for one poor machine to handle"?
> >
> > Could you be more specific?
> > 1) What kind of hardware are you using. What CPU(s), how fast?
> > How much RAM?, What kind of disk(s) (RAID?) and so on.
> >
> > 2) How big is your task? How many rows and tables. How many
> > queries per minute? Do you do many big joins?
> >
> > I think Postgres scales well if you throw hardware at it. I wonder
> > if you've tried that yet.
> >
> > All that said, I'd be very much interested in Distribution and
> > replication for another reason. I am working with a large database.
> > (on order of a hundred million rows) Our application is distributed
> > over the (much too slow) Internet. The ability to periodically
> > synchronize servers would work for us.
> >
> >
> > PS Please cc any reply to me directly as I get the digest.
> > --
> > Chris Albertson
> >
> > [EMAIL PROTECTED] Voice: 626-351-0089 X127
> > Logicon, Pasadena California Fax: 626-351-0699
> >
--
Chris Albertson
[EMAIL PROTECTED] Voice: 626-351-0089 X127
Logicon, Pasadena California Fax: 626-351-0699