Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Bryan White

>  Whenever a query is executed (not found in cache, etc.), the caching
> system would simply store the query, the results, and a list of tables
> queried.  When a new query came in, it would do a quick lookup in the
> query hash to see if it already had the results.  If so, whammo.
> Whenever an insert/delete/update was sensed, it would look at the
> tables being affected, and the caching mechanism would clear out the
> entries depending on those tables.

It seems to me that tracking the list of cached queries and watching for
queries that might invalidate them adds a lot of complexity to the back
end, and the front end still has to establish the connection and wait for
the data to transfer over the socket.

On a more practical level, a backend solution would require someone with
fairly detailed knowledge of the internals of the backend.  A front-end
solution is more likely to be implemented by someone not as knowledgeable.

One of the big advantages of your technique is that there is no code
change at the application level.  This means less database lock-in.  Maybe
that is a
disadvantage too. ;-)
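
For illustration, here is roughly what such a front-end cache could look
like; a minimal sketch in Perl/DBI, with the dependent-table list supplied
by hand by the caller (all names here are hypothetical, not from any
existing library):

    use strict;
    use DBI;

    my %results;   # SQL text -> cached result rows
    my %queries;   # table name -> set of SQL texts that depend on it

    # Check the cache first; on a miss, run the query and remember
    # which tables it depends on.
    sub cached_select {
        my ($dbh, $sql, @tables) = @_;
        return $results{$sql} if exists $results{$sql};
        my $rows = $dbh->selectall_arrayref($sql);
        $results{$sql} = $rows;
        $queries{$_}{$sql} = 1 for @tables;
        return $rows;
    }

    # Call this after any INSERT/UPDATE/DELETE that touches $table.
    sub table_changed {
        my ($table) = @_;
        delete $results{$_} for keys %{ $queries{$table} || {} };
        delete $queries{$table};
    }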





Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread KuroiNeko

>  As programmers, we naturally want to throw things into databases for
> three reasons. First, it's easy to get  data in. Second, it's easy to get
> relevant data  out. And third,  it's "cool". We  don't want to  work with
> flat files, now do we? ; )

 Kiddin', eh? :)  Actually, the third reason seems to dominate the younger
developers' minds.  People often tend to keep everything in a poor DBMS
until it begins to kick back.  And this has an impact on the customers.
Does your system use a database?  No, why should it?  You mean you'll keep
our dearly beloved banner ads as flat files?  Yes, this is where they
belong.  Sorry, we'll look for someone more advanced.  Good luck.
 Of course, hardware vendors jump for joy :)  Maybe I don't get it, but
IMHO there's no reason to put into the DB something that can't be indexed
and used in a WHERE clause.

> It would *not* allow the database to do joins; it would grab entire
> tables, then try to do the joins itself, in Perl.

 Umh...  Yeah...  Well...  To keep compatibility with other Open Source
Databases and ESR/RMS, you know :)

>(Incidentally, we've toyed around with developing a query-caching
> system that would sit between PostgreSQL and our DB libraries.

 Sounds amazing, but requires some research, I guess.  However, in many
cases one would be more than happy with cached connections.  Of course,
cached query results can be naturally added to that, but just connections
are OK to start with.  Security
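
 For what it's worth, cached connections take very little machinery on
the client side.  A minimal sketch in Perl/DBI (connection parameters
invented; under mod_perl, Apache::DBI does much the same transparently):

    use DBI;

    my $dbh;    # held open across requests

    # Hand back the existing connection if it is still alive,
    # otherwise open a fresh one.
    sub get_dbh {
        return $dbh if $dbh && $dbh->ping;
        $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                            { RaiseError => 1 });
        return $dbh;
    }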


--

 contaminated fish and microchips
  huge supertankers on Arabian trips
 oily propaganda from the leaders' lips
  all about the future
 there's people over here, people over there
  everybody's looking for a little more air
 crossing all the borders just to take their share
  planning for the future

 Rainbow, Difficult to Cure



Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Lamar Owen

Steve Wolfe wrote:
> 
> > Even after that, you have a long way to go before you will hit 1000
> > transactions per second from any SQL database.

>I guess they could always buy a few Sun E1's on the backend, and a
> large room of rack-mountable PC's for web/CGI serving.  Nothing like
> plopping down ten or twenty million dollars on hardware. : )

Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
that. Probably would cost less, and be more reliable.  And they can
always load another Linux/390 VM -- an S/390 can run something like
41,000 virtual machines each running Linux/390 and Apache.

However, if you want to see the architecture of a _large_
database-backed website, see the story behind Digital City at
www.aolserver.com.  While they're using Sybase instead of PostgreSQL,
the architecture is the same.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe


> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

   I guess they could always buy a few Sun E1's on the backend, and a
large room of rack-mountable PC's for web/CGI serving.  Nothing like
plopping down ten or twenty million dollars on hardware. : )

steve





Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe


> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

 Since my last post probably wasn't too useful, here's some information
that might be a little more help.  It's a little long, I know, but hopefully
it will be of use to someone.

 As programmers, we naturally want to throw things into databases for
three reasons.  First, it's easy to get data in.  Second, it's easy to get
relevant data out.  And third, it's "cool".  We don't want to work with flat
files, now do we?  ; )

 However, in some cases, using the database to get data out ends up
costing us a lot of time and money.  Sometimes we do the same nasty query
so often that we end up purchasing bigger hardware to make the system work
reasonably.  Why?  Because it was easier for us to write a program that did:

GetDataFromDatabase();
PrepareData();
PrintData();

 Each time, the database server does the work.  But it doesn't
necessarily have to be that way.  In our company, we've found two trends
that have enabled us to save a LOT of processing power on our machines.
(read:  Increase the capacity of our servers by 30% or more, with fairly
minor changes)

 The first case is that of rarely-changing data.  Some of our datasets
probably have around 50,000 to 1,000,000 views (selects) for each update
(insert/delete).  Having the database repeat the query every time is a
waste.  So, we began writing our programs such that they will grab the data
from the database once, and generate the HTML for every page, and the
indexes.  Then, when an update is made to the database (via the
administrative tools), it simply rewrites *the relevant HTML files*, and
changes the indexes pointing to them.  (There are also some other very large
advantages to this sort of thing, but I'm not allowed to say them. ; )  )
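
   The regeneration step itself can be quite small.  A sketch in Perl/DBI
(the table, columns, and file paths are invented for illustration):

    use DBI;

    # Re-run the query once per update and write out static HTML,
    # instead of re-running it on every page view.
    sub rebuild_pages {
        my ($dbh) = @_;
        my $rows = $dbh->selectall_arrayref(
            'SELECT id, title, body FROM articles');
        for my $r (@$rows) {
            my ($id, $title, $body) = @$r;
            open my $fh, '>', "htdocs/articles/$id.html" or die $!;
            print $fh "<html><head><title>$title</title></head>\n",
                      "<body>$body</body></html>\n";
            close $fh;
        }
    }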

   The second case is that of often-repeated queries.  One of the
offerings on our site is an online directory, which gets a pretty fair
amount of traffic.  Unfortunately, it uses a proprietary program that was
purchased by management before they spoke with us.  Grr...  It was the
most utterly inefficient program I've ever seen.  It would *not* allow the
database to do joins, it would grab entire tables, then try to do the joins
itself, in Perl.

   We rewrote the program to let PostgreSQL do the joins, and that sped
it up.   Then we realized that a very small number of queries (those for the
first one or two levels of pages) accounted for a huge portion of the
usage.  So, we replaced the front page with a static HTML page (the front
page doesn't change...), and saw another terrific drop in our system loads.


   Overall, by only modifying a couple of our more heavily-used programs,
our server loads dropped by about 30%-40%.  If we went to the trouble to
modify some others, it would drop even more.  But we're going to rewrite
them completely for other reasons. : )


   In any event, there are ways like this to save a LOT of CPU and disk I/O.
Most web servers can serve out several hundred static pages with the
resources that would otherwise deliver one dynamically-created,
database-driven page.  It also allows you to cluster the web servers with
cheap commodity hardware, instead of using big-iron on the database.  And if
you have a big-iron machine running the back-end, this can severely lighten
the load on it, keeping you from dropping a few hundred grand on the next
step up. ; )


   (Incidentally, we've toyed around with developing a query-caching system
that would sit between PostgreSQL and our DB libraries.  However, it seems
like it could be done *much* more efficiently in PostgreSQL itself, as it
would be much easier to keep track of which tables have changed, etc..
Anybody know if this sort of functionality is planned?  It would be terrific
to simply give the machine another 256 megs of RAM, and tell it to use it as
a DB cache...)
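
   (For what it's worth, the one cache knob that exists today is the
backend's shared buffer pool.  Assuming the default 8 KB buffer size (an
assumption on my part), the arithmetic for that extra 256 megs would be:

    256 MB / 8 KB per buffer = 32,768 buffers  =>  postmaster -B 32768

That caches disk pages rather than query results, but it is a cheap way to
put the RAM to work in the meantime.)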

steve






Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread Steve Wolfe

> Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on
> that. Probably would cost less, and be more reliable.  And they can
> always load another Linux/390 VM -- an S/390 can run something like
> 41,000 virtual machines each running Linux/390 and Apache.

  Yeah...  I'm very optimistic about IBM's new chips that are coming out
next year.  Each "processor module" will have 4 processors, but each
processor will have 2 cores - so in effect, each "processor module" has 8
processors on it.  All processors will have copper interconnects, and
depending on the source, will debut at anywhere from 1.3 to 2 gigahertz.  I
think that will certainly help them get a larger share of the high-end
market!

steve





Re: [GENERAL] how good is PostgreSQL

2000-10-31 Thread markw

Arnold Gamboa wrote:

> Hi,
>
> For users of large PostgreSQL and PostgreSQL builders, this is for you.
>
> I'm having a terrible time deciding now. :(
>
> We're about to build a "huge" website now.  I got tied up in signing the
> contract without really getting enough information about PgSQL, since this
> is what we plan to implement with PHP (normally we use MySQL, but I guess
> it does not fit huge databases like that).
>
> Here's my problem: we're about to build a site like hitbox.com, where a
> large database is required.  If, say, there are 100,000 users with 1,000
> page hits per day each, and everything is logged, you can imagine how huge
> this will be.  I'm just so "nervous" (really, that's the term) that we'll
> implement this and later on experience a slowdown or, worse than that, a
> crash in the server.

That is a LOT of work for any system.  That is over 1,100 page views a
second, or under 900 microseconds each.  A standard Pentium III system
serving static pages would have problems with that.
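
Spelling out the arithmetic (taking an even spread over the day):

    100,000 users x 1,000 hits/day = 100,000,000 hits/day
    100,000,000 hits / 86,400 s   ~= 1,157 hits/s  ->  ~864 us per hit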

If you look at search engines, to get that performance with read-only data
they usually cluster multiple systems and load-balance across them.  You
may need to segment your data and have multiple SQL servers perform
different functions.

Also, that 1,100 page views per second assumes an even distribution of
traffic, which does not happen on a web server.  If you average that much,
chances are there will be periods of twice that.

Look into a "local director," "Alteon," or even LVS.

>
>
> My questions are:
> 1. What is the limit for number of records in a table WITHOUT SUFFERING SLOW
> DOWN.

> 2. limit in number of tables per database
> 3. ... limit in number of database.

There are a couple of factors involved; it is more complex than a simple
response can cover.

Use multiple databases and put each on a separate disk, with its own
controller.  Better yet, have multiple load-balanced web boxes do a lot of
the processing in PHP, offloading much of the CPU-bound work to the "cheap"
web boxes, and have multiple SQL databases in the back handling various
independent tasks.

In a web site I worked on, we had multiple front end web servers, load balanced
with an Alteon. Each web server had its own SQL database which provided SQL
access to "static" data which was updated each week.  We had an additional
single SQL database backend which all the Web servers accessed for synchronized
dynamic data.
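
In application code, that split can be as plain as holding two handles on
each web box.  A sketch in Perl/DBI (host names, database names, and
tables are invented for illustration):

    use DBI;

    # Per-box replica: read-only "static" data, refreshed weekly.
    my $static = DBI->connect('dbi:Pg:dbname=static;host=localhost',
                              'user', 'pass', { RaiseError => 1 });

    # Single shared backend: synchronized dynamic data.
    my $shared = DBI->connect('dbi:Pg:dbname=dynamic;host=db-master',
                              'user', 'pass', { RaiseError => 1 });

    # Reads of the weekly data stay on the local box...
    my $cats = $static->selectall_arrayref(
        'SELECT id, name FROM categories');

    # ...while anything that must stay synchronized goes to the one
    # authoritative backend.
    $shared->do('UPDATE counters SET hits = hits + 1 WHERE id = ?',
                undef, 42);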

If you are serious about the load you expect to put on this system, you
must be careful (a concrete sketch follows this list):
Do not create any indexes you do not need.
Do not use the "foreign key" constraint, as it forces a trigger for each
insert.

Make sure you index the keys by which you will access data.
Avoid searching by strings; try to use keys.
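
As a sketch of those rules for a hit log (table, columns, and index names
are entirely hypothetical), in Perl/DBI:

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=hitlog', 'user', 'pass',
                           { RaiseError => 1 });

    # Integer keys instead of strings, no FOREIGN KEY constraint
    # (so no per-insert trigger), and indexes only on the columns
    # that actually appear in WHERE clauses.
    $dbh->do(q{
        CREATE TABLE hits (
            user_id  integer   NOT NULL,
            url_id   integer   NOT NULL,
            hit_time timestamp NOT NULL
        )
    });
    $dbh->do('CREATE INDEX hits_user_idx ON hits (user_id)');
    $dbh->do('CREATE INDEX hits_time_idx ON hits (hit_time)');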

Even after that, you have a long way to go before you will hit 1000
transactions per second from any SQL database.

If you are betting your business on this implementation, you have a lot of
homework to do.

>
>
> Thanks for your comments.  I would really appreciate every comment that I'll
> receive regarding this.
>
> Arnold