Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Kevin Barnard wrote:
I am generally interested in a good solution for this.  So far our
solution has been to increase the hardware to the point of allowing
800 connections to the DB.
I don't have the mod loaded for Apache, but we haven't had too many
problems there.  The site is split pretty well between dynamic and
non-dynamic; it's largely Flash with several plugins hitting the DB.
However we still can be, and have been, slammed right up to the point of
the 800 connections.

What I don't get is why not use pgpool?  This should eliminate the
rapid-fire forking of postgres instances on the DB server.  I'm
assuming your app can safely handle a failure to connect to the DB
(i.e. exceeding the number of DB connections).  If not, it should be
fairly simple to send a 503 header when it's unable to get the connection.
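A rough sketch of that 503 fallback in a plain CGI/DBI script (the DSN,
credentials and error page are hypothetical):

#!/usr/bin/perl
# Sketch only: return 503 instead of failing oddly when no DB connection is available.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=ethereal;host=dbhost', 'www', 'secret',
                       { RaiseError => 0, PrintError => 0 });

unless ($dbh) {
    # Tell the browser (and any proxy) to come back later rather than piling up retries.
    print "Status: 503 Service Unavailable\r\n";
    print "Retry-After: 120\r\n";
    print "Content-Type: text/html\r\n\r\n";
    print "<html><body><p>The site is temporarily overloaded; please retry shortly.</p></body></html>\n";
    exit 0;
}

# ... normal page generation using $dbh ...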
Note that I am not necessarily looking for a PostgreSQL solution to the
matter, just a way to prevent the database from killing off the server
it sits on by looking at the load averages.

I have attempted to make use of pgpool and have had some very poor
performance.   There were constant error messages being produced, load
averages on that machine seemed to skyrocket, and it just did not seem
suited to my needs.

Apache::DBI overall works better for what I require, even if it is not a
pool per se.   Now if pgpool supported variable-rate pooling like
Apache does with its children, it might help to even things out.  That,
and you'd still get the spike if you have to start the webserver and
database server at or around the same time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
Case in point: A first time visitor hits your home page.  A 
dynamic page is generated (in about 1 second) and served 
(taking 2 more seconds) which contains links to 20 additional 

The gain from an accelerator is actually even more than that, as it takes
essentially zero seconds for Apache to return the generated content (which
in the case of a message board could be quite large) to Squid, which can
then feed it slowly to the user, leaving Apache free again to generate
another page.  When serving dialup users large dynamic pages this can be a
_huge_ gain.
I think Martin's pages (dimly recalling another thread) take a pretty long
time to generate though, so he may not see quite such a significant gain.

Correct: 75% of all hits are on a script that can take anywhere from
a few seconds to half an hour to complete.  The script essentially
auto-flushes to the browser so users get new information as it arrives,
creating the illusion of on-demand generation.
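For anyone curious, the shape of that script is roughly the following
(table, column and credential names are hypothetical):

#!/usr/bin/perl
# Sketch of a long-running, auto-flushing CGI poller.
use strict;
use warnings;
use DBI;

$| = 1;    # unbuffered STDOUT so each chunk reaches the browser immediately
print "Content-Type: text/html\r\n\r\n";

my $dbh = DBI->connect('dbi:Pg:dbname=ethereal', 'www', 'secret',
                       { RaiseError => 1, AutoCommit => 1 });
my $last = 0;    # highest post id already sent

my $sth = $dbh->prepare(
    'SELECT PostIDNumber, PostText FROM Post WHERE PostIDNumber > ? ORDER BY PostIDNumber'
);

for (1 .. 360) {    # run for at most roughly half an hour
    $sth->execute($last);
    while (my ($id, $text) = $sth->fetchrow_array) {
        print "<div>$text</div>\n";    # flushed straight to the client
        $last = $id;
    }
    sleep 5;    # poll interval; could back off under heavy load or in quiet realms
}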

A squid proxy would probably cause severe problems when dealing with a
script that does not complete its output for a variable amount of time.

As for images, CSS, JavaScript and such, the site makes use of them, but
in the grand scheme of things the amount of traffic they tie up is
inconsequential.   Though I will probably move all of that onto another
server just to let the main server deal almost exclusively with
dynamic content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
Apache::DBI overall works better for what I require, even if it is not a
pool per se.   Now if pgpool supported variable-rate pooling like
Apache does with its children, it might help to even things out.  That,
and you'd still get the spike if you have to start the webserver and
database server at or around the same time.

I still don't quite get it though - you shouldn't be getting more than one
child per second being launched by Apache, so that's only one PG postmaster
per second, which is really a trivial load.  That is unless you have
'StartServers' set high, in which case the 'obvious' answer is to lower it.
Are you launching multiple DB connections per Apache process as well?
I have StartServers set to a fairly high limit.   However, this would
make little difference overall if I restarted the webservers to load in
new modules during a high-load time.   When I am averaging 145
concurrent connections before a restart, I can expect that many requests
to hit the server once Apache begins to respond.

As a result, it will literally cause a spike on both machines as new 
connections are initiated at a high rate.   In my case I don't always 
have the luxury of waiting till 0300 just to test a change.

Again, I am not necessarily looking for a PostgreSQL solution.  I am
looking for a method that would allow the database or the OS itself to
protect the system it's hosted on.  If both the database and the Apache
server were on the same machine, this type of scenario would be unstable
to say the least.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
Correct: 75% of all hits are on a script that can take anywhere from
a few seconds to half an hour to complete.  The script essentially
auto-flushes to the browser so users get new information as it arrives,
creating the illusion of on-demand generation.

This is more like a streaming data server, which is a very different beast
from a webserver, and probably better suited to the job.  Usually either
multithreaded or single-process using select() (just like Squid).  You could
probably build one pretty easily.  Using a 30MB Apache process to serve one
client for half an hour seems like a hell of a waste of RAM.
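For what it's worth, a bare-bones sketch of the kind of single-process
select() loop being described, using IO::Select; there is no
authentication or realm logic here, and fetch_new_posts() is a
placeholder for the actual database poll:

#!/usr/bin/perl
# Minimal single-process streaming server sketch built around select().
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $listener = IO::Socket::INET->new(LocalPort => 8080, Listen => 64, Reuse => 1)
    or die "listen: $!";
my $sel = IO::Select->new($listener);

while (1) {
    # Wake up about once a second to push any new data to every connected client.
    for my $fh ($sel->can_read(1)) {
        if ($fh == $listener) {
            my $client = $listener->accept or next;
            $sel->add($client);
        } else {
            my $buf;
            unless (sysread($fh, $buf, 4096)) {    # client went away
                $sel->remove($fh);
                close $fh;
            }
        }
    }

    my $chunk = fetch_new_posts();    # hypothetical: one DB query per tick
    next unless defined $chunk and length $chunk;
    for my $client ($sel->handles) {
        next if $client == $listener;
        syswrite($client, $chunk);    # the same query result serves every client
    }
}

sub fetch_new_posts { return undef }    # placeholder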
These are CGI scripts at the lowest level, nothing more and nothing 
less.  While I could probably embed a small webserver directly into the 
perl scripts and run that as a daemon, it would take away the 
portability that the scripts currently offer.

This should be my last question on the matter: does Squid report the
proper IP address of the clients themselves?   That's a critical
requirement for the scripts.
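For what it's worth, Squid normally adds an X-Forwarded-For header
carrying the original client address, so a script that sees the proxy's
address in REMOTE_ADDR could fall back to that header.  A small sketch,
assuming Squid runs on the same host:

# Sketch: prefer the X-Forwarded-For value when the request comes from the proxy.
sub client_ip {
    my $remote = $ENV{REMOTE_ADDR} || '';
    if ($remote eq '127.0.0.1' && $ENV{HTTP_X_FORWARDED_FOR}) {
        # The header may hold a chain "client, proxy1, proxy2"; the client is first.
        my ($ip) = split /\s*,\s*/, $ENV{HTTP_X_FORWARDED_FOR};
        return $ip;
    }
    return $remote;
}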

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:

Pierre-Frédéric Caillaud wrote:
check this marvellous piece of 5 minutes of work :
http://boutiquenumerique.com/test/iframe_feed.html
cela m'a fait le sourire :-)  ("that made me smile")
(apologies for bad french)
M

JavaScript is not an option for the scripts; one of the mandates of the
project is to support as many different client setups as possible, and we
have encountered everything from WebTV to the latest Firefox release.
 It's a chat/roleplay community and not everyone will invest in new
equipment.

Now, it would seem to me that there is a trade-off between a JS push
system and a constant, ever-present process.   With the traditional
method as I use it, a client incurs the initial penalty of going
through authentication and pulling the look and feel of the realm and
site, and simply polls one table from that point on.

On the other hand, you have one user making a call for new posts
every X seconds.   This means every X seconds the penalty for
authentication and design would kick in, increasing the overall load.

The current scripts can also be dynamically adapted to slow things down
based on heavy load or on quiet realms that bring few posts in.   It's
much harder to expect JavaScript solutions to work perfectly every time
and not be modified by some proxy.

Unfortunately, we are getting way off track.   I'm looking for a way to
protect the PostgreSQL server, either from within PostgreSQL or through
some sort of external script which polls the load average once in a
while to make that determination.
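As a stopgap, something along these lines could sit at the top of the
scripts, or run from cron to toggle a flag file; it simply parses
`uptime` and refuses work above a made-up threshold before a database
connection is ever opened:

#!/usr/bin/perl
# Sketch: refuse work when the 1-minute load average is above a threshold.
use strict;
use warnings;

my $MAX_LOAD = 3.00;    # arbitrary threshold, tune to taste

sub load_average_1m {
    my $up = `uptime`;    # output format is close enough across the BSDs and Linux
    my ($one) = $up =~ /load averages?:\s+([\d.]+)/;
    return defined $one ? $one : 0;
}

if (load_average_1m() > $MAX_LOAD) {
    print "Status: 503 Service Unavailable\r\n";
    print "Retry-After: 120\r\n";
    print "Content-Type: text/plain\r\n\r\n";
    print "The server is under heavy load; please try again shortly.\n";
    exit 0;    # never touches PostgreSQL, so no new backend gets spawned
}

# ... connect to the database and continue as usual ...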

Now is there an administrative command in PostgreSQL that will cause it 
to move into some sort of maintenance mode?   For me that could be 
exceedingly useful as it would still allow for an admin connection to be 
made and run a VACUUM FULL and such.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Martin Foster
Simon Riggs wrote:
On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
Is there a way to restrict how much load a PostgreSQL server can take
before dropping queries in order to safeguard the server?  I was
looking at the login.conf(5) man page, and while it allows me to limit
by processor time, this does not seem to fit my specific needs.

Essentially, I am looking for a sort of functionality similar to what 
Sendmail and Apache have.   Once the load of the system reaches a 
certain defined limit the daemon drops tasks until such a time that it 
can resume normal operation.

Sounds great... could you give more shape to the idea, so people can
comment on it?
What limit? Measured how? Normal operation is what?
Drop what? How to tell?

Let's use the example of Apache: there is the Apache::LoadAvgLimit
mod_perl module, which allows one to limit based on the system load
averages.   Here is an example of the configuration one would find:

  <Location /perl>
    PerlInitHandler Apache::LoadAvgLimit
    PerlSetVar LoadAvgLimit_1 3.00
    PerlSetVar LoadAvgLimit_5 2.00
    PerlSetVar LoadAvgLimit_15 1.50
    PerlSetVar LoadAvgRetryAfter 120
  </Location>
The end state is simple: once the one-minute load average moves above
3.00, the web server will not process the CGI scripts or mod_perl
applications under that directory.  Instead it will return a 503 error,
saving the system from being crushed by ever-increasing load averages.

Only once the load average is below the defined limits will the server
process requests as normal.   This is not necessarily the nicest or
cleanest way of doing things, but it does allow the Apache web server to
prevent a collapse.

There are ways of restricting the size of files, the number of concurrent
processes, and even the memory being used by a daemon.  This can be done
through ulimit or the login.conf file if your system supports it.
However, there is no way to restrict based on load averages, only on
processor time, which is ineffective for a perpetually running daemon
like PostgreSQL.

While not necessarily common on my servers, I have witnessed some fairly
high load averages which may have led to the machine dropping outright.
  Any help on this matter would be appreciated.

You can limit the number of connections overall?
Limiting concurrent connections is not always the solution to the 
problem.   Problems can occur when there is a major spike in activity 
that would be considered abnormal, due to outside conditions.

For example, using Apache::DBI or pgpool, the DBMS may be required to
spawn a great many child processes in a short amount of time.   This
in turn can cause a major spike in processor load, and if high-demand
queries are running at the same time the system can literally increase
in load until the server buckles.

I've seen this behavior before when restarting the web server during
heavy loads.  Apache goes from zero connections to a solid 120,
causing PostgreSQL to spawn that many children in a short amount of time
just to keep up with the demand.

PostgreSQL incurs a penalty when spawning a new child and accepting
a connection; it takes resources at every level to accomplish.
 Meanwhile, clients on the web server are hitting it at an
accelerated rate because of the slowed response, leading to even more
demand being placed on both machines.

In most cases the processor will be taxed and the load average high
enough to cause a noticeable delay even when using a console; however,
the machine will generally recover... slowly, or in rare cases crash
outright.   In such a circumstance, having the database server refuse
queries when the sanity of the system is at stake might come in handy.

Of course, I am not blaming PostgreSQL; there are probably some
instabilities in the AMD64 port of FreeBSD 5.2.1 for dual-processor
systems that lead to an increased chance of failure instead of recovery.
  However, if there were a way to prevent the process from reaching
those limits, it might avoid the problem altogether.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Restricting Postgres

2004-11-03 Thread Martin Foster
John A Meinel wrote:
Martin Foster wrote:
Simon Riggs wrote:
On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
[...]
I've seen this behavior before when restarting the web server during
heavy loads.  Apache goes from zero connections to a solid 120,
causing PostgreSQL to spawn that many children in a short amount of
time just to keep up with the demand.

But wouldn't limiting the number of concurrent connections do this at
the source?  If you tell it that you can have at most 20 connections,
you would never have postgres spawn 120 children.
I'm not sure what Apache does if it can't get a DB connection, but it
seems exactly like what you want.

Now, if you expected to have 50 clients that all like to just sit on 
open connections, you could leave the number of concurrent connections 
high.

But if your only connections are from the webserver, where all of them
are designed to be short, then leave the max low.

The other possibility is having the webserver use connection pooling, so 
it uses a few long lived connections. But even then, you could limit it 
to something like 10-20, not 120.

John
=:-
I have a dual-processor system that can support over 150 concurrent
connections handling normal traffic and load.   Now suppose I set up
Apache to spawn all of its children instantly: what will happen is that
the PostgreSQL server will also receive 150 connection attempts at once.

This will spawn 150 children in a short amount of time, and while this
takes place clients can connect and start requesting information, never
allowing the machine to settle down to normal traffic.   That spike,
once initiated, can cripple the machine, or even the webserver if a
deadlocked transaction is introduced.

On the webserver side, a slowdown in the database means that it
will just get that many more connection attempts queued up from the
clients.  As they keep clicking and hitting reload over and over to get
a page to load, that server starts to buckle, hitting unbelievably high
load averages.

When the above happened once, I lost the ability to type on a console
because of a 60+ (OpenBSD) load average on a single-processor system.
That is the reason why Apache now drops a 503 Service Unavailable when
loads get too high.

It's that spike I worry about, and it can happen for whatever reason.  It
could just as easily be triggered by a massive concurrent request for
processing of an expensive query done in DDoS fashion.   This may not
affect the webserver at all, at least immediately, but the same effect
can come into play on the database server.

Limiting connections helps, but it's not a silver bullet, and it limits
your ability to support more connections because of that initial spike.
   The penalty for forking a new child is hardly unexpected; even
Apache will show the same effect when restarted at a high-traffic time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



[PERFORM] Restricting Postgres

2004-11-02 Thread Martin Foster
Is there a way to restrict how much load a PostgreSQL server can take
before dropping queries in order to safeguard the server?  I was
looking at the login.conf(5) man page, and while it allows me to limit
by processor time, this does not seem to fit my specific needs.

Essentially, I am looking for a sort of functionality similar to what 
Sendmail and Apache have.   Once the load of the system reaches a 
certain defined limit the daemon drops tasks until such a time that it 
can resume normal operation.

While not necessarily common on my servers, I have witnessed some fairly
high load averages which may have led to the machine dropping outright.
  Any help on this matter would be appreciated.
--
	Martin Foster
	Creator/Designer Ethereal Realms
	[EMAIL PROTECTED]



[PERFORM] Cleaning up indexes

2004-09-23 Thread Martin Foster
My database was converted from MySQL a while back and has maintained all
of the indexes which were previously used.   At the time, however, there
were limitations on the way PostgreSQL handled indexes compared to
MySQL.

Meaning that under MySQL, it would make use of a multi-column index even
if the query's columns did not match the index exactly.    When the
conversion was made, more indexes were created overall to correct for
this and proceed with the conversion.

Now the time has come to clean up the unused indexes.   Essentially, I
want to know if there is a way to determine which indexes are
being used and which are not.   This will allow me to drop the
unneeded ones and reduce database load as a result.
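One approach, assuming the statistics collector is enabled
(stats_start_collector and stats_row_level in 7.4), is to watch
pg_stat_user_indexes over a representative period and look for indexes
whose scan count never moves; a rough DBI sketch, with a hypothetical
DSN:

#!/usr/bin/perl
# Sketch: list indexes the statistics collector has never seen used.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=ethereal', 'postgres', '',
                       { RaiseError => 1 });

my $sth = $dbh->prepare(q{
    SELECT schemaname, relname, indexrelname, idx_scan
      FROM pg_stat_user_indexes
     ORDER BY idx_scan, relname
});
$sth->execute;

while (my ($schema, $table, $index, $scans) = $sth->fetchrow_array) {
    next if $scans > 0;
    # Unique and primary-key indexes can show zero scans and still be required.
    print "$schema.$table: index $index has not been scanned\n";
}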

And have things changed as to allow for mismatched multi-column indexes 
in version 7.4.x or even the upcoming 8.0.x?

Martin Foster
[EMAIL PROTECTED]


[PERFORM] Tanking a server with shared memory

2004-09-05 Thread Martin Foster
I have been experimenting with the 'IPC::Shareable' module under the
native implementation of Perl 5 for OpenBSD 3.5.  While it is not loaded
by default, it is a pure Perl implementation.

I have tested this module on two machines: one which used to run
PostgreSQL and has a higher-than-normal number of SysV semaphores
configured, the other with a normal amount.  When testing on the former
database server, things load up fine, clients can connect, and all
information is as it should be.

When I test under the normal setup, the machine tanks.  No core dumps,
no errors produced, just a near-instant lock-up of the server itself,
and that is with a non-privileged user.

While I know this is a Perl issue, I figured I might be able to gain
some insight into how a server could drop without at least generating a
panic.  Any ideas?

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Faster with a sub-query then without

2004-08-15 Thread Martin Foster
Tom Lane wrote:
Martin Foster [EMAIL PROTECTED] writes:
The one not using sub-queries proves itself under EXPLAIN ANALYZE to be
less efficient and to have a far higher cost than those with the penalty
of a sub-query.   Since this seems to run counter to what I have been
told in the past, I thought I would bring this forward and get some
enlightenment.

The ones with the subqueries are not having to form the full join of W
and G; they just pick a few rows out of G and look up the matching W
rows.
The subquery penalty is nonexistent in this case because the
subqueries are not dependent on any variables from the outer query, and
so they need be evaluated only once, rather than once per outer-query
row which is what I suppose you were expecting.  This is reflected in
the EXPLAIN output: notice they are shown as InitPlans not SubPlans.
The outputs of the InitPlans are essentially treated as constants (shown
as $0 in the EXPLAIN output) and the outer plan is approximately what
it would be if you'd written WHERE g.field = 'constant' instead of
WHERE g.field = (select ...)
			regards, tom lane
That would explain it overall.  Still, it does seem unusual that adding
extra code, which most literature warns you about, actually yields a
speed boost.

Thanks!
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



[PERFORM] Faster with a sub-query then without

2004-08-14 Thread Martin Foster
I thought this could generate some interesting discussion.  Essentially,
there are three queries below: two using sub-queries to change the way
the randomized information is selected (first by author and then by
work), and the original, which simply randomizes over all works available.

The one not using sub-queries proves itself under EXPLAIN ANALYZE to be
less efficient and to have a far higher cost than those with the penalty
of a sub-query.   Since this seems to run counter to what I have been
told in the past, I thought I would bring this forward and get some
enlightenment.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
---
SELECT
 g.GalleryID,
 w.WorkID,
 w.WorkName,
 w.WorkImageThumbnail,
 g.GalleryRating,
 g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
 AND g.GalleryPrivacy = 'no'
 AND w.WorkImageThumbnail IS NOT NULL
 AND g.PuppeteerLogin = (SELECT PuppeteerLogin
  FROM ethereal.Gallery
  WHERE GalleryType='image'
  GROUP BY PuppeteerLogin
  ORDER BY RANDOM() LIMIT 1)
ORDER BY RANDOM() LIMIT 1
 Limit  (cost=60.70..60.70 rows=1 width=100) (actual time=1.013..1.013 rows=0 loops=1)
   InitPlan
     ->  Limit  (cost=6.36..6.37 rows=1 width=11) (actual time=0.711..0.713 rows=1 loops=1)
           ->  Sort  (cost=6.36..6.45 rows=33 width=11) (actual time=0.708..0.708 rows=1 loops=1)
                 Sort Key: random()
                 ->  HashAggregate  (cost=5.45..5.53 rows=33 width=11) (actual time=0.420..0.553 rows=46 loops=1)
                       ->  Seq Scan on gallery  (cost=0.00..5.30 rows=60 width=11) (actual time=0.007..0.227 rows=59 loops=1)
                             Filter: ((gallerytype)::text = 'image'::text)
   ->  Sort  (cost=54.33..54.37 rows=16 width=100) (actual time=1.009..1.009 rows=0 loops=1)
         Sort Key: random()
         ->  Nested Loop  (cost=0.00..54.01 rows=16 width=100) (actual time=0.981..0.981 rows=0 loops=1)
               ->  Seq Scan on gallery g  (cost=0.00..5.56 rows=2 width=24) (actual time=0.855..0.888 rows=1 loops=1)
                     Filter: (((galleryprivacy)::text = 'no'::text) AND ((puppeteerlogin)::text = ($0)::text))
               ->  Index Scan using pkwork on work w  (cost=0.00..24.10 rows=8 width=80) (actual time=0.080..0.080 rows=0 loops=1)
                     Index Cond: (w.galleryid = "outer".galleryid)
                     Filter: (workimagethumbnail IS NOT NULL)
 Total runtime: 1.211 ms

---
SELECT
 g.GalleryID,
 w.WorkID,
 w.WorkName,
 w.WorkImageThumbnail,
 g.GalleryRating,
 g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
 AND g.GalleryPrivacy = 'no'
 AND w.WorkImageThumbnail IS NOT NULL
 AND g.GalleryPenName = (SELECT GalleryPenName
  FROM ethereal.Gallery
  WHERE GalleryType='image'
  GROUP BY GalleryPenName
  ORDER BY RANDOM() LIMIT 1)
ORDER BY RANDOM() LIMIT 1
 Limit  (cost=59.92..59.92 rows=1 width=100) (actual time=0.904..0.906 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=6.69..6.69 rows=1 width=14) (actual time=0.731..0.733 rows=1 loops=1)
           ->  Sort  (cost=6.69..6.79 rows=42 width=14) (actual time=0.729..0.729 rows=1 loops=1)
                 Sort Key: random()
                 ->  HashAggregate  (cost=5.45..5.56 rows=42 width=14) (actual time=0.431..0.568 rows=48 loops=1)
                       ->  Seq Scan on gallery  (cost=0.00..5.30 rows=60 width=14) (actual time=0.011..0.233 rows=59 loops=1)
                             Filter: ((gallerytype)::text = 'image'::text)
   ->  Sort  (cost=53.23..53.27 rows=16 width=100) (actual time=0.899..0.899 rows=1 loops=1)
         Sort Key: random()
         ->  Nested Loop  (cost=0.00..52.91 rows=16 width=100) (actual time=0.808..0.862 rows=6 loops=1)
               ->  Index Scan using idxgallery_pen on gallery g  (cost=0.00..4.45 rows=2 width=24) (actual time=0.767..0.769 rows=1 loops=1)
                     Index Cond: ((gallerypenname)::text = ($0)::text)
                     Filter: ((galleryprivacy)::text = 'no'::text)
               ->  Index Scan using pkwork on work w  (cost=0.00..24.10 rows=8 width=80) (actual time=0.020..0.042 rows=6 loops=1)
                     Index Cond: (w.galleryid = "outer".galleryid)
                     Filter: (workimagethumbnail IS NOT NULL)
 Total runtime: 1.117 ms

---
SELECT
 g.GalleryID,
 w.WorkID,
 w.WorkName,
 w.WorkImageThumbnail,
 g.GalleryRating,
 g.GalleryPenName
FROM ethereal.Work w, ethereal.Gallery g
WHERE w.GalleryID = g.GalleryID
 AND g.GalleryType = 'image'
 AND g.GalleryPrivacy = 'no'
 AND w.WorkImageThumbnail IS NOT NULL
ORDER BY RANDOM() LIMIT 1

 Limit  (cost=111.73..111.73 rows=1 width=100) (actual time=13.021..13.023 rows=1 loops=1)
   ->  Sort  (cost=111.73..113.70 rows=786 width=100

Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Jeff wrote:
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
I am currently making use of Apache::DBI which overrides the 
DBI::disconnect call and keeps a pool of active connections for use 
when need be.   Since it offloads the pooling to the webserver, it 
seems more advantageous then pgpool which while being able to run on a 
external system is not adding another layer of complexity.

Apache::DBI is not the same sort of a pool as pgpool.  DB connections
are not shared among all your apache children (a common misconception).
 So if you have 300 apache kids you can have 300 db connections.
With pgpool, connections are shared among all of them, so even though you
have 300 kids you only have, say, 32 db connections.

Seems that you are right, never noticed that from the documentation 
before.   I always assumed it had something to do with the long 
lasting/persistent scripts that would remain in transactions for 
extended periods of time.

Here is an odd question.   While the server runs 7.4.x, the client
connects with 7.3.x.  Would this in itself make a difference in
performance, as the protocols are different?   At least based on
pgpool's documentation.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-08 Thread Martin Foster
Arjen van der Meijden wrote:
On 8-8-2004 16:29, Matt Clark wrote:
There are two well-worn and very mature techniques for dealing with the
issue of web apps using one DB connection per apache process, both of 
which
work extremely well and attack the issue at its source.

1)Use a front-end caching proxy like Squid as an accelerator.  Static
content will be served by the accelerator 99% of the time.  Additionally,
large pages can be served immediately to the accelerator by Apache, which
can then go on to serve another request without waiting for the end 
user's
dial-up connection to pull the data down.  Massive speedup, fewer apache
processes needed.

Another version of this 1) is to run with a content accelerator; our 
favourite is to run Tux in front of Apache. It takes over the 
connection-handling stuff, has a very low memory profile (compared to 
Apache) and very little overhead. What it does, is to serve up all 
simple content (although you can have cgi/php/perl and other languages 
being processed by it, entirely disabling the need for apache in some 
cases) and forwards/proxies everything it doesn't understand to an 
Apache/other webserver running at the same machine (which runs on 
another port).

I think there are a few advantages over Squid; since it is partially 
done in kernel-space it can be slightly faster in serving up content, 
apart from its simplicity which will probably matter even more. You'll 
have no caching issues for pages that should not be cached or static 
files that change periodically (like every few seconds). Afaik Tux can 
handle more than 10 times as many ab-generated requests per second as
a default-compiled Apache on the same machine.
And besides the speed-up, you can do any request you were able to do
before, since Tux will simply forward it to Apache if it didn't
understand it.

Anyway, apart from all that. Reducing the amount of apache-connections 
is nice, but not really the same as reducing the amount of 
pooled-connections using a db-pool... You may even be able to run with 
1000 http-connections, 40 apache-processes and 10 db-connections. In 
case of the non-pooled setup, you'd still have 40 db-connections.

In a simple test I did, I felt pgpool had quite some overhead
though. So it should be well tested, to find out where the
turnover point is at which it becomes a gain instead of a loss...

Best regards,
Arjen van der Meijden
Other than images, there are very few static pages being loaded by
the user.   Since they make up a very small portion of the traffic, it
tends to be an optimization we can forgo for now.

I attempted to make use of pgpool.   At the default of 32 pre-forked
connections, the webserver almost immediately tapped out the pgpool pool,
and content stopped being served because no new processes were being
forked to make up for it.

So I raised it to a higher value (256) and it immediately segfaulted and
dropped a core.   So I am not sure exactly how to proceed, since I rather
need the thing to fork additional servers as load hits, not the other
way around.

Unless I had it configured oddly, it seems to work differently than an
Apache server would in handling content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Martin Foster
Jan Wieck wrote:
On 8/3/2004 2:05 PM, Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.  This can be 
observed when connections move from the 120 concurrent connections to 
PostgreSQL to roughly 175 or more. Essentially, the machine seems 
to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Have you taken a look at pgpool? I know, it sounds silly to *reduce* the 
number of DB connections through a connection pool, but it can help.

Jan
I am currently making use of Apache::DBI which overrides the 
DBI::disconnect call and keeps a pool of active connections for use when 
need be.   Since it offloads the pooling to the webserver, it seems more 
advantageous then pgpool which while being able to run on a external 
system is not adding another layer of complexity.

Anyone had any experience with both Apache::DBI and pgpool?   For my 
needs they seem to do essentially the same thing, simply that one is 
invisible to the code while the other requires adding the complexity of 
a proxy.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-07 Thread Martin Foster
Christopher Kings-Lynne wrote:
This value of wal_buffers is simply ridiculous.

Instead, I think a default of wal_buffers = 8 (64KB) is ridiculous.

There is no point making WAL buffers higher than 8.  I have done much 
testing of this and it makes not the slightest difference to performance 
that I could measure.

Chris
No point?  I had it at 64 if memory serves, and the logs were warning me
that raising this value would be desirable because of excessive I/O
brought on by the logs being filled far too often.

It would seem to me that 8 is a bit low in at least a few circumstances.
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Gaetano Mendola wrote:

Let's start from your postgres configuration:
shared_buffers = 8192 This is really too small for your
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your configuration
effective_cache_size = 16000
change these values to:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that amount of SHM.
These are the numbers that feel good to me for your HW; the second step now is
to analyze your queries
These changes have yielded some visible improvements, with load averages
rarely going over anything noticeable.   However, I do have a
question on the matter: why do these values seem to be far higher than
what a frequently pointed-to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this shows that my understanding of
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Mike Benoit wrote:
On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:

The queries themselves are simple, normally drawing information from one 
table with few conditions or in the most complex cases using joins on 
two table or sub queries.   These behave very well and always have, the 
problem is that these queries take place in rather large amounts due to 
the dumb nature of the scripts themselves.
Show us the EXPLAIN ANALYZE on those queries and how many rows the tables
contain; the table schema could also be useful.

If the queries themselves are optimized as much as they can be, and as
you say, it's just the sheer number of similar queries hitting the
database, you could try using prepared queries for the ones that are most
often executed, to eliminate some of the overhead. 

I've had relatively good success with this in the past, and it doesn't
take very much code modification.
One of the biggest problems is most probably related to the indexes.
Since the performance penalty of logging the information needed to see
which queries are used and which are not is itself a problem, I
cannot really make use of that for now.

However, I am curious how one would go about preparing a query.   Is this
similar to the DBI prepare statement with placeholders, simply
changing the values passed on execute?  Or is this something at the
database level, such as a view et cetera?

SELECT
 Post.PostIDNumber,
 Post.$format,
 Post.PuppeteerLogin,
 Post.PuppetName,
 Post.PostCmd,
 Post.PostClass
FROM Post
WHERE Post.PostIDNumber > ?::INT
  AND (Post.PostTo='all' OR Post.PostTo=?)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetLogin
FROM PuppetIgnore
WHERE PuppetIgnore.PuppetIgnore='global'
  AND PuppetIgnore.PuppeteerLogin=?
  AND PuppetIgnore.PuppetLogin=Post.PuppeteerLogin)
   OR Post.PuppeteerLogin IS NULL)
  AND (NOT EXISTS (SELECT PuppetIgnore.PuppetName
FROM PuppetIgnore
WHERE PuppetIgnore.PuppetIgnore='single'
  AND PuppetIgnore.PuppeteerLogin=?
  AND PuppetIgnore.PuppetName=Post.PuppetName)
   OR Post.PuppetName IS NULL)
ORDER BY Post.PostIDNumber LIMIT 100
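For reference, preparation at the DBI level is exactly that: the
statement is prepared once per connection and re-executed with new bound
values.  A simplified sketch against the query above (the ignore
sub-selects are dropped and the handle/loop names are hypothetical);
whether this turns into a true server-side prepared plan depends on the
DBD::Pg version, but at minimum it avoids rebuilding and re-parsing the
SQL string on every poll:

# Prepare once, execute many times with new values (DBI placeholders).
my $sth = $dbh->prepare_cached(q{
    SELECT Post.PostIDNumber, Post.PostText, Post.PuppeteerLogin,
           Post.PuppetName, Post.PostCmd, Post.PostClass
    FROM Post
    WHERE Post.PostIDNumber > ?::INT
      AND (Post.PostTo = 'all' OR Post.PostTo = ?)
    ORDER BY Post.PostIDNumber LIMIT 100
});

while (my ($last_id, $realm) = next_poll()) {    # hypothetical polling loop
    $sth->execute($last_id, $realm);
    while (my $row = $sth->fetchrow_hashref) {
        # ... format and flush the post to the client ...
    }
}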
The range is determined from the previous run or through the query listed
below.   It was determined that using an INT was far faster than limiting
by timestamp.

SELECT MIN(PostIDNumber)
FROM Post
WHERE RealmName=?
  AND PostClass IN ('general','play')
  AND PostTo='all'
The above simply provides a starting point, nothing more.   Once posts
are pulled, the script will feed in the last pulled number so as to start
from a fresh point.

Under MySQL, time was stored as an INT, which may have helped it handle
timestamps more efficiently.   It also made use of three or more
queries, where two were done to generate an IN statement for the query
actually being run at the time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Scott Marlowe wrote:
On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:
Martin Foster wrote:

Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages
rarely going over anything noticeable.   However, I do have a
question on the matter: why do these values seem to be far higher than
what a frequently pointed-to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no tuning wizard for postgres, so each one of
us has his own school.  The values I gave you are oversized, to be sure
to achieve improvements.  Now you can start to decrease them
(starting from wal_buffers) in order to find the right compromise
with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:
shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20
Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.
Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.
Which rather points out the crux of the problem.  This is a live system, 
meaning changes made need to be as informed as possible, and that 
changing values for the sake of testing can lead to potential problems 
in service.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-06 Thread Martin Foster
Scott Marlowe wrote:
On Fri, 2004-08-06 at 22:02, Martin Foster wrote:
Scott Marlowe wrote:

On Fri, 2004-08-06 at 17:24, Gaetano Mendola wrote:

Martin Foster wrote:

Gaetano Mendola wrote:

Let start from your postgres configuration:
shared_buffers = 8192 This is really too small for your 
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your 
configuration

effective_cache_size = 16000
change this values in:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that ammount of SHM.
This are the numbers that I feel good for your HW, the second step now is
analyze your queries
These changes have yielded some visible improvements, with load averages
rarely going over anything noticeable.   However, I do have a
question on the matter: why do these values seem to be far higher than
what a frequently pointed-to document would indicate as necessary?

http://www.varlena.com/GeneralBits/Tidbits/perf.html
I am simply curious, as this clearly shows that my understanding of 
PostgreSQL is clearly lacking when it comes to tweaking for the hardware.
Unfortunately there is no a wizard tuning for postgres so each one of
us have a own school. The data I gave you are oversized to be sure
to achieve improvements. Now you can start to decrease these values
( starting from the wal_buffers ) in order to find the good compromise
with your HW.

FYI, my school of tuning is to change one thing at a time some
reasonable percentage (shared_buffers from 1000 to 2000) and measure the
change under simulated load.  Make another change, test it, chart the
shape of the change line.  It should look something like this for most
folks:
shared_buffers | q/s (more is better)
100 | 20
200 | 45
400 | 80
1000 | 100
... levels out here...
8000 | 110
10000 | 108
20000 | 40
30000 | 20
Note it going back down as we exceed our memory and start swapping
shared_buffers.  Where that happens on your machine is determined by
many things like your machine's memory, memory bandwidth, type of load,
etc... but it will happen on most machines and when it does, it often
happens at the worst times, under heavy parallel load.
Unless testing shows it's faster, 10000 or 25% of mem (whichever is
less) is usually a pretty good setting for shared_buffers.  Large data
sets may require more than 10000, but going over 25% on machines with
large memory is usually a mistake, especially servers that do anything
other than just PostgreSQL.
You're absolutely right about one thing, there's no automatic wizard for
tuning this stuff.
Which rather points out the crux of the problem.  This is a live system, 
meaning changes made need to be as informed as possible, and that 
changing values for the sake of testing can lead to potential problems 
in service.

But if you make those changes slowly, as I was showing, you should see
the small deleterious effects like I was showing long before they become
catastrophic.  To just jump shared_buffers to 50000 is not a good idea,
especially if the sweet spot is likely lower than that.  

While I agree, there is also the issue that getting consistent results
from this site is very difficult to do, since load is based on the whims
of users visiting one of three sites hosted on the same hardware.

Now that being said, having wal_buffers at 8 certainly would not be a
good idea, since the database logs themselves were warning of excessive
writes in that region.  I am not hoping for a perfect mix of settings
that will solve all my problems.

But a good idea of a baseline that will allow me to carry a fair load
would certainly be a good option.   Right now, the load being handled is
not much more than what a single-processor system did with half the
memory.  Certainly this architecture should be able to take more of a
beating than this?

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Performance Bottleneck

2004-08-04 Thread Martin Foster
Gaetano Mendola wrote:
Martin Foster wrote:
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of 
PostgreSQL (7.4.3) for everything from user information to 
formatting and display of specific sections of the site.   The 
server itself, is a dual processor AMD Opteron 1.4Ghz w/ 2GB Ram and 
2 x 120GB hard drives mirrored for redundancy running under FreeBSD 
5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections 
to PostgreSQL to roughly 175 or more. Essentially, the machine 
seems to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Code changes have been made to the scripts to essentially back off 
in high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not 
taking well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html) using it as 
best I could in order to set my settings.However, even with 
statistics disabled and ever setting tweaked things still consider 
to deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset 
of roughly a Gig combined with that type of hardware should be able 
to handle substantially more load then what it is.  Can anyone 
provide me with clues as where to pursue?Would disabling 'fsync' 
provide more performance if I choose that information may be lost in 
case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!


Tell us about your typical queries and show us your configuration file.
Is the access read-only, or do you have concurrent writers and readers?
During peak hours are your processors tied up at 100%?
What do vmstat and iostat say?
Maybe you are not using indexes somewhere, or maybe you are but the
planner is not using them... In two words, we need more information
in order to help you.

Regards
Gaetano Mendola

I included all the files in attachments, which will hopefully cut down 
on any replied to Emails.As for things like connection pooling, 
the web server makes use of Apache::DBI to pool the connections for 
the Perl scripts being driven on that server.For the sake of being 
thorough, a quick 'apachectl status' was thrown in when the database 
was under a good load.

Let's start from your postgres configuration:
shared_buffers = 8192 This is really too small for your
configuration
sort_mem = 2048

wal_buffers = 128 This is really too small for your configuration
effective_cache_size = 16000
change these values to:
shared_buffers = 50000
sort_mem = 16084
wal_buffers = 1500
effective_cache_size = 32000
to bump up the shm usage you have to configure your OS in order to be
allowed to use that amount of SHM.
These are the numbers that feel good to me for your HW; the second step now is
to analyze your queries
The queries themselves are simple, normally drawing information from 
one table with few conditions or in the most complex cases using joins 
on two table or sub queries.   These behave very well and always have, 
the problem is that these queries take place in rather large amounts 
due to the dumb nature of the scripts themselves.

Show us the EXPLAIN ANALYZE on those queries and how many rows the tables
contain; the table schema could also be useful.

regards
Gaetano Mendola
I will look into moving up those values and seeing how they interact 
with the system once I get back from work.  Since it was requested, I 
have a visual representation of an older schema, one that was used under 
MySQL.  Note that all of the timestamps are now properly set to 
LOCALTIME on PostgreSQL.

http://prdownloads.sourceforge.net/ethereal-realms/ethereal-3_0_0.png?download
The row counts for tables of note are as follows:
 Puppeteer  1606
 Puppet    33176
 Realm        83
 Post      36156
 Audit     61961
The Post table is continually cleared of old information, since the
information is very time-critical and archiving would only
hinder performance.  As a result, this count will vary wildly based on
time of day, since users (Puppeteers) tend to post more during peak hours.

NOTE:   The scripts make use of different schemas with the same
        information in order to virtualize the script and to
        support more than one site on the same hardware.
On a side note, this would be a normal post-authentication session once 
in realm for getting new posts:
 * Script is executed and schema is determined through stored procedure;
 * Formatting information is fetched from Tag and RealmDesign as needed

[PERFORM] Performance Bottleneck

2004-08-03 Thread Martin Foster
I run a Perl/CGI driven website that makes extensive use of PostgreSQL
(7.4.3) for everything from user information to formatting and display
of specific sections of the site.   The server itself is a dual-processor
AMD Opteron 1.4GHz w/ 2GB RAM and 2 x 120GB hard drives mirrored for
redundancy, running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the point
of showing considerable loss in performance.   This can be observed
when concurrent connections to PostgreSQL move from 120 to roughly 175
or more.    Essentially, the machine seems to struggle to keep up with
continual requests and slows down accordingly as resources are tied up.

Code changes have been made to the scripts to essentially back off in
high-load environments, which has worked to an extent.
However, as loads continue to increase, the database itself is not taking
well to the increased traffic.

I have taken a look at 'Tuning PostgreSQL for Performance'
(http://www.varlena.com/GeneralBits/Tidbits/perf.html) and used it as best
I could to set my settings.   However, even with statistics
disabled and every setting tweaked, things still continue to deteriorate.

Is there anything anyone can recommend in order to give the system a
necessary speed boost?   It would seem to me that a modest dataset of
roughly a gig combined with that type of hardware should be able to
handle substantially more load than it is.  Can anyone provide me
with clues as to where to pursue this?   Would disabling 'fsync' provide
more performance, if I accept that information may be lost in case of a
crash?

If anyone needs access to logs, settings, et cetera, please ask; I
simply wish to test the waters first on what is needed. Thanks!

Martin Foster
[EMAIL PROTECTED]


Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Martin Foster
Gaetano Mendola wrote:
Martin Foster wrote:
I run a Perl/CGI driven website that makes extensive use of PostgreSQL 
(7.4.3) for everything from user information to formatting and display 
of specific sections of the site.   The server itself, is a dual 
processor AMD Opteron 1.4Ghz w/ 2GB Ram and 2 x 120GB hard drives 
mirrored for redundancy running under FreeBSD 5.2.1 (AMD64).

Recently loads on the site have increased during peak hours to the 
point of showing considerable loss in performance.This can be 
observed when connections move from the 120 concurrent connections to 
PostgreSQL to roughly 175 or more. Essentially, the machine seems 
to struggle to keep up with continual requests and slows down 
respectively as resources are tied down.

Code changes have been made to the scripts to essentially back off in 
high load working environments which have worked to an extent. 
However, as loads continue to increase the database itself is not 
taking well to the increased traffic taking place.

Having taken a look at 'Tuning PostgreSQL for Performance' 
(http://www.varlena.com/GeneralBits/Tidbits/perf.html) using it as 
best I could in order to set my settings.However, even with 
statistics disabled and ever setting tweaked things still consider to 
deteriorate.

Is there anything anyone can recommend in order to give the system a 
necessary speed boost?   It would seem to me that a modest dataset of 
roughly a Gig combined with that type of hardware should be able to 
handle substantially more load then what it is.  Can anyone provide me 
with clues as where to pursue?Would disabling 'fsync' provide more 
performance if I choose that information may be lost in case of a crash?

If anyone needs access to logs, settings et cetera.   Please ask, I 
simply wish to test the waters first on what is needed. Thanks!

Tell us about your typical queries and show us your configuration file.
Is the access read-only, or do you have concurrent writers and readers?
During peak hours are your processors tied up at 100%?
What do vmstat and iostat say?
Maybe you are not using indexes somewhere, or maybe you are but the
planner is not using them... In two words, we need more information
in order to help you.

Regards
Gaetano Mendola

I included all the files as attachments, which will hopefully cut down
on back-and-forth emails.   As for things like connection pooling, the
web server makes use of Apache::DBI to pool the connections for the Perl
scripts being driven on that server.   For the sake of being thorough,
a quick 'apachectl status' was thrown in when the database was under a
good load.

Since it would rather slow things down to wait for the servers to really
get bogged down with load averages of 20.00 and more, I opted to choose
a period of time where we are a bit busier than normal.   You will be
able to see how the system behaves under a light load and subsequently
as it reaches 125 or so concurrent connections.

The queries themselves are simple, normally drawing information from one
table with few conditions, or in the most complex cases using joins on
two tables or sub-queries.   These behave very well and always have; the
problem is that these queries take place in rather large numbers due to
the dumb nature of the scripts themselves.

Over a year ago, when I was still using MySQL for the project, the
statistics generated would report well over 65 queries per second under
loads ranging from 130 to 160 at peak, averaged over weeks of
operation.   Looking at the Apache status, one can see that it averages
only roughly 2.5 requests per second, giving you a slight indication of
what is taking place.

A quick run of 'systat -ifstat' shows the following graph:
   /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
Load Average   
Interface   Traffic   PeakTotal
  lo0  in  0.000 KB/s  0.000 KB/s   37.690 GB
   out 0.000 KB/s  0.000 KB/s   37.690 GB
  em0  in 34.638 KB/s 41.986 KB/s   28.998 GB
   out70.777 KB/s 70.777 KB/s   39.553 GB
em0 is a full-duplex 100Mbps connection to an internal switch that
serves the servers directly.   Load on the loopback was cut down
considerably once I stopped using pg_autovacuum, since its performance
benefits under low load were buried under the hindrance it caused when
traffic was high.

I am sure that there are some places that could benefit from
optimization, especially in the case of indexes; however, as a whole the
problem seems to be related more to the massive onslaught of queries
than to anything else.

Also note that some of these scripts run for longer durations even if
they are web-based.   Some run as long as 30 minutes, making queries to
the database after wait periods of anywhere from five seconds to
twenty-five seconds. Under high duress

Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-11 Thread Martin Foster
Shridhar Daithankar wrote:

On 10 Jul 2003 at 0:43, Martin Foster wrote:

As for creating a new table, that in itself is a nice idea.   But it
would cause issues for people currently in the realm.   Their posts
would essentially disappear from the site and cause more confusion than
it's worth.


No they won't. Say you have a base table and your current post table is a
child of that. You can query the base table and get rows from the child
table. That way all the data would always be there.

While inserting posts, you would insert into the child table. While
querying you would query the base table. That way things will be optimal.


Inheritance would work, but the database would essentially just grow and 
grow and grow right?


Right. But there are two advantages.

1. It will always contain valid posts. No dead tuples.
2. You can work in chunks of data. Each child table can be dealt with 
separately without affecting the other child tables, whereas with a single 
large table the entire site is affected.

Deleting 100K posts from a 101K-row table is vastly different from deleting 
10K posts from a 2M-row table. The latter needlessly saddles the table with 
dead tuples and I/O, whereas in the former case you can do a CREATE TABLE 
AS SELECT and drop the original.
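
To put that in concrete terms, a rough sketch of the layout; the table 
and column names here are invented purely for illustration:

psql ethereal <<'SQL'   # database name is a placeholder
-- the base table that all reads go through
CREATE TABLE post_base (
    poster   VARCHAR(32),
    message  TEXT,
    posted   TIMESTAMP DEFAULT now()
);

-- a child table that receives the current day's inserts
CREATE TABLE post_20030710 () INHERITS (post_base);
CREATE INDEX post_20030710_posted_idx ON post_20030710 (posted);

-- applications insert into the child ...
INSERT INTO post_20030710 (poster, message) VALUES ('someone', 'hello');

-- ... and read through the base, which also returns the child's rows
SELECT poster, message FROM post_base
 WHERE posted > now() - interval '1 day';

-- retiring a day later becomes a simple drop of that child table,
-- e.g.  DROP TABLE post_20030703;
SQL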

HTH

Bye
 Shridhar
--
[In 'Doctor' mode], I spent a good ten minutes telling Emacs what I 
thought of it.  (The response was, 'Perhaps you could try to be less 
abusive.')  (By Matt Welsh)

When I ran EXPLAIN on the views and queries that use the inherited 
tables, I noticed that everything ran as sequential scans and all 
indexes were avoided, while adding ONLY brought the indexes back into 
play.

This is even after having created a child table with the same indexes as 
the parent.   Is this a known issue, or just some sort of oddity on my 
setup?
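
For reference, the comparison I am making looks roughly like this, 
using the placeholder names from the sketch above rather than the real 
schema:

psql ethereal <<'SQL'   # database name is a placeholder
-- goes through the parent and every child; this is the form that was
-- being planned as sequential scans
EXPLAIN SELECT * FROM post_base
 WHERE posted > now() - interval '1 day';

-- ONLY restricts the query to the parent itself; this is the form
-- where the indexes kicked back in
EXPLAIN SELECT * FROM ONLY post_base
 WHERE posted > now() - interval '1 day';
SQL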

Tables still cannot be removed easily, but I found a way to work around 
it on a day-to-day basis.  Essentially I just clean out the tables 
containing old rows and delete the tables themselves later.  However, 
based on the above, I doubt performance would get any better.

Thanks for the advice however!

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Scott Marlowe  wrote:
It would be nice to have a program that could run on any OS postgresql 
runs on and could report on the current limits of the kernel, and make 
recommendations for changes the admin might want to make.

One could probably make a good stab at effective cache size during 
install.  Anything reasonably close would probably help.

Report what % of said resources could be consumed by postgresql under 
various circumstances...

One of the issues automating the process would run into is kernel limits 
that are too low for PostgreSQL. The BSDs come to mind, where values need 
to be increased manually in the kernel before you can reach a reasonable 
maximum connection count.

Another example is how OpenBSD will outright crash when the database is 
tested at install time.   It seems that most of the tests fail because 
the maximum number of processes allowed is too low for them to succeed, 
while FreeBSD passes those same tests just fine.
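
To give an idea of what the manual bumping involves on FreeBSD, it is 
along these lines; the values are examples only and the exact tunables 
vary by release:

# check the limits PostgreSQL tends to bump into
sysctl kern.maxproc kern.ipc.shmmax kern.ipc.shmall

# raise the SysV shared memory ceiling at run time (roughly 128MB here)
sysctl -w kern.ipc.shmmax=134217728
sysctl -w kern.ipc.shmall=32768

# semaphore limits may need raising as well; on some releases they are
# kernel config options such as SEMMNI and SEMMNS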

If PostgreSQL automated the configuration, that would be a plus.   It 
should also detect the platform and inform the person which changes need 
to be made to the kernel, sysctl or whatever, in order for that 
configuration to run.

Perl may be useful here for a few reasons.   It's portable enough to 
run on multiple Unix variants and the tools are fairly standard, so 
the code would need fewer special cases for more exotic implementations.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Sean Chittenden wrote:
I looked through the src/doc/runtime.sgml for a good place to stick
this and couldn't find a place where it seemed appropriate, but on
FreeBSD, this can be determined with a great deal of precision in a
programmatic manner:
echo effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))

The same OID is available via C too.  It'd be slick if PostgreSQL
could tune itself (on FreeBSD) at initdb time with the above code.  If
Linux exports this info via /proc and can whip out the appropriate
magic, even better.  An uncommented good guess that shows up in
postgresql.conf would be stellar, and quite possible with the use of
sed.
Maybe an initdb switch could be added to have initdb tune the config
it generates?  If a -n is added, have it generate a config and toss it
to stdout?
case `uname` in
FreeBSD)
        echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
        ;;
*)
        echo "Unable to automatically determine the effective cache size" > /dev/stderr
        ;;
esac
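
The sed step could be as trivial as something along these lines; the
exact shape of the effective_cache_size line in the stock
postgresql.conf varies, so the pattern is only illustrative:

ecs=$((`sysctl -n vfs.hibufspace` / 8192))
sed "s|^#*effective_cache_size = .*|effective_cache_size = ${ecs}|" \
    postgresql.conf > postgresql.conf.tuned
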
-sc

The simplest way may be to create an 'auto-tune' directory with scripts 
for supported platforms.   When postgres initializes the databases, it 
checks for 'tune.xxx' and, if found, uses it to generate the 
configuration itself?

This would allow for defaults on platforms that do not have them and 
optimization for those that do.
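
As a rough sketch of the idea, with the paths and names invented for 
illustration:

# install-time hook; auto-tune/ holds one tune.<platform> script per OS
platform=`uname | tr 'A-Z' 'a-z'`
if [ -x "auto-tune/tune.${platform}" ]; then
        # the platform script prints its tuned settings on stdout
        "auto-tune/tune.${platform}" >> "${PGDATA}/postgresql.conf"
else
        echo "No tuning script for ${platform}, keeping stock defaults" >&2
fi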

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-09 Thread Martin Foster
Dennis Björklund wrote:

On Sun, 6 Jul 2003, Martin Foster wrote:


The processor seems to be purposely sitting there twiddling its thumbs, 
which leads me to believe that perhaps the nice levels have to be 
changed on the server itself?


It could also be all the usual things that affect performance. Are your 
queries using indexes where they should? Do you vacuum analyze after you 
have updated/inserted a lot of data?

It could be that some of your queries are not as efficient as they should 
be, for example doing a sequential scan over a table instead of an index 
scan. That translates into more I/O and slower response times, especially 
when you have more connections fighting for the available I/O.

I actually got a bit more respect for PostgreSQL tonight.  It seems that 
one of my scripts was not committing changes after maintenance was 
conducted, meaning that rows which would normally be removed once 
offline archiving completed were in fact still around.

Normally, at any given point in time, this table would grow by 50K rows 
during a day, be archived that night and then lose the rows that were no 
longer needed.   This process is what allowed MySQL to maintain any 
stability at all, as the size of this table can balloon significantly.

PostgreSQL, with tweaking, was handling a table with nearly 300K rows. 
That size alone would have dragged the MySQL system down to a near 
grind, and since most of those rows are not needed, one can imagine that 
queries were needlessly processing rows that should have been outright 
ignored.

This probably explains why row numbering based searches greatly 
accelerated the overall process.

By fixing the script and doing the appropriate full vacuum and re-index, 
the system is behaving much more like it should, even if the process 
may seem a bit odd to some.

The reason for removing rows on a daily basis is the perishable nature 
of the information.  Since this is a chat site, posts over a day old are 
rarely needed for any reason, which is why they are archived into dumps 
in case we ever need to retrieve the information.   This also gives us 
the added bonus of smaller backup sizes and a smaller database.
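
The nightly maintenance then boils down to something along these lines; 
the database and table names are placeholders, and the real script 
differs in its details:

# dump the table for safe keeping before pruning anything
pg_dump --table=post --file=post-`date +%Y%m%d`.dump ethereal

# remove posts older than a day, then reclaim the space and refresh
# the planner statistics
psql ethereal -c "DELETE FROM post WHERE posted < now() - interval '1 day';"
psql ethereal -c "VACUUM ANALYZE post;"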

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-06 Thread Martin Foster
Michael Pohl wrote:
On Sun, 6 Jul 2003, Matthew Nuzum wrote:


At the very least, if there is good documentation for these parameters,
maybe the conf file should provide a link to this info. 


I believe that is what Josh is proposing:

http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php


[Apache httpd] uses a three-phase (if not more) documentation level.  
The .conf file contains detailed instructions in an easy-to-read and
not-too-jargon-ish structure.  The docs provide detailed tutorials and
papers that expand on configuration params in an easy-to-read format.  
Both of these refer to the thorough reference manual that breaks each
possible option down into its nitty-gritty details so that a user can
get more information if they so desire.


I agree that Apache's approach is primo.  Often the .conf comments are
enough to jog my memory about a directive I haven't used for a while.  Or
the comments are enough to let me know I don't need a directive, or that I
need to go to the manual and read more.  I appreciate that.
michael



One thing that may also help is to include more sample .conf files. 
For example, you could include settings commonly seen for a dedicated 
database server with generic specs, and another set for a machine with 
fewer resources that is not dedicated to Postgres.

This would let users see how certain setting changes work in practice. 
The default .conf is great if you want to set up a small test bed, but 
for a real-life deployment chances are it won't be exactly what you're 
looking for.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly