Re: [PERFORM] Suspicious top output

2008-04-22 Thread Greg Smith

On Tue, 22 Apr 2008, Rafael Barrera Oro wrote:


Hello, i have a postgresql server running and from time to time it gets
painfully slow.


The usual information you should always include when posting messages here 
is PostgreSQL and operating system versions.


When this happens i usually connect to the server and run a "top" 
command


The other thing you should fire up in another window is "vmstat 1" to 
figure out just what's going on in general.  The great thing about its 
output is that you can save it when you're done and analyze the results 
later, which is trickier to do with top.



71872 pgsql1   40 48552K 42836K sbwait   1:41  4.79% postgres


Some searching found this interesting suggestion from Darcy about things 
stuck in sbwait:


http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2004-03/0015.html

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Background writer underemphasized ...

2008-04-22 Thread Greg Smith

On Sun, 20 Apr 2008, James Mansion wrote:

Are you suggesting that the disk subsystem has already decided on its 
strategy for a set of seeks and writes and will not insert new 
instructions into an existing elevator plan until it is completed and it 
looks at the new requests?


No, just that each component only gets to sort across what it sees, and 
because of that the sorting horizon may not be optimized the same way 
depending on how writes are sent.


Let me try to construct a credible example of this elusive phenomenon:

-We have a server with lots of RAM
-The disk controller cache has 256MB of cache
-We have 512MB of data to write that's spread randomly across the database 
disk.


Case 1:  Write early

Let's say the background writer writes a sample of 1/2 the data right now 
in anticipation of needing those buffers for something else soon.  It's 
now in the controller's cache and sorted already.  The controller is 
working on it.  Presume it starts at the beginning of the disk and works 
its way toward the end, seeking past gaps in between as needed.


The checkpoint hits just after that happens.  The remaining 256MB gets 
dumped into the OS buffer cache.  This gets elevator sorted by the OS, 
which will now write it out to the card in sorted order, beginning to end. 
But writes to the controller will block because most of the cache is 
filled, so they trickle in as data writes are completed and the cache gets 
space.  Let's presume they're all ignored, because the drive is working 
toward the end and these are closer to the beginning than the ones it's 
working on.


Now the disk is near the end of its logical space, and there's a cache 
full of new dirty checkpoint data.  But the OS has finished spooling all 
its dirty stuff into the cache so the checkpoint is over.  During that 
checkpoint the disk has to seek enough to cover the full logical "length" 
of the volume.  The controller will continue merrily writing now until its 
cache clears again, moving from the end of the disk back to the beginning 
again.


Case 2:  Delayed writes, no background writer use

The checkpoint hits.  512MB of data gets dumped into the OS cache.  It 
sorts and feeds that in sorted order into the controller cache.  The drive 
starts at the beginning and works its way through everything.  By the time 
it's finished seeking its way across half the disk, the OS is now unblocked 
because the remaining data is in the cache.


Can you see how in this second case, it may very well be that the 
checkpoint finishes *faster* because we waited longer to start writing? 
Because the OS has a much larger elevator sorting capacity than the disk 
controller, leaving data in RAM and waiting until there's more of it 
queued up there has approximately halved the number/size of seeks involved 
before the controller can say it's absorbed all the writes.
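To put rough numbers on the two cases, here is a toy sketch in Python. It is only a model under loud assumptions: the disk is a line of block numbers, the head sorts each batch it is handed on its own and sweeps it from whichever end is nearer, and the trickle-in blocking described above is ignored. The function name, the block counts, and the seed are all made up for illustration.

```python
import random

def head_travel(batches, start=0):
    """Total head movement when each batch is sorted on its own.

    Each batch is serviced in one sweep, starting from whichever end of
    the batch is nearer to the current head position.
    """
    pos = start
    travel = 0
    for batch in batches:
        ordered = sorted(batch)
        if abs(pos - ordered[-1]) < abs(pos - ordered[0]):
            ordered.reverse()  # head is nearer the far end, sweep backwards
        for target in ordered:
            travel += abs(target - pos)
            pos = target
    return travel

random.seed(42)
DISK_BLOCKS = 1_000_000                       # hypothetical disk size
writes = random.sample(range(DISK_BLOCKS), 512)
early, late = writes[:256], writes[256:]

split_travel = head_travel([early, late])     # Case 1: two sorting horizons
single_travel = head_travel([writes])         # Case 2: one sorting horizon

print("case 1 travel:", split_travel)
print("case 2 travel:", single_travel)
```

In this model the single batch needs one pass across the disk, while the split batches need a pass out and a pass back, which is the "approximately halved" effect. Real controllers and schedulers are more complicated than this.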


This sounds a bit tenuous at best - almost to the point of being a 
bug. Do you believe this is universal?


Of course not, or the background writer would be turned off by default. 
There are occasional reports where it just gets in the way, typically in 
ones where the controller has its own cache and there's a bad interaction 
there.


This is not unique to this situation, so in that sense this class of 
problems is universal.  There are all kinds of operating system 
configurations that are tuned to delay writing in hopes of making those 
writes more efficient, because the OS usually has a much larger capacity 
for buffering pages to optimize what's going to happen than the downstream 
controller/disk caches do.  Once you've filled a downstream cache, you may 
not be able to influence what the device executing those requests does 
until that cache clears.


Note that the worst-case situation here actually gets worse in some 
respects the larger the downstream cache is, because there's that much 
more data you have to wait to clear before you can necessarily influence 
what the disks are doing if you've made a bad choice in what you asked it 
to write early.  If the disk head is too far away from where you want to 
write or read to now, you can be in for quite a wait before it gets back 
your way if the filled cache is large.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] [PERFORMANCE] Error loading 37G CSV file "invalid string enlargement request size 65536"

2008-04-22 Thread Shane Ambler

Adonias Malosso wrote:

Hi all,




split --lines=1000

And running the copy i receive the error on the 5th file:

psql:/srv/www/htdocs/import/script_q2.sql:122: ERROR:  invalid string
enlargement request size 65536
CONTEXT:  COPY temp_q2, line 3509639: ""9367276";"4";"DANIEL DO
CARMO BARROS";"31-Jan-1986";"M";"1";"10";"3162906";"GILSON TEIXEIRA..."

Any clues?


quote problems from earlier than that?
one missing?
\ at end of field negating the closing quote

I'd keep splitting to help isolate - what control do you have over the 
generation of the data?


Is this a one-off import or ongoing?


My postgresql version is 8.2.4 the server is running suse linux with 1.5GB
Sensitive changes in postgresql.conf are:

shared_buffers = 512MB
temp_buffers = 256MB
checkpoint_segments = 60

I'd also like to know if there's any way to optimize huge data load in
operations like these.


Sounds like you are already using copy. Where from? Is the data file on 
the server or a separate client? (as in reading from the same disk that 
you are writing the data to?)


See if http://pgfoundry.org/projects/pgbulkload/ can help

It depends a lot on what you are doing and what table you are importing 
into. Indexes will most likely be the biggest slowdown; it is faster to 
create them after the table is filled. FK constraints can also slow the 
load down.


Is this a live server that will still be working as you load data?

If the db is not in use, try dropping all indexes (on the relevant table 
anyway), loading, then creating the indexes.


You can copy into a temp table without indexes then select into the 
target table.


What FK constraints does this table have? Can they be safely deferred 
during the import?



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] connections slowing everything down?

2008-04-22 Thread Merlin Moncure
On Mon, Apr 21, 2008 at 5:50 AM, Adrian Moisey
<[EMAIL PROTECTED]> wrote:
> Hi
>
>  # ps -ef | grep idle | wc -l
>  87
>  # ps -ef | grep SELECT | wc -l
>  5
>
>
>  I have 2 web servers which connect to PGPool which connects to our postgres
> db.  I have noticed that idle connections seem to take up CPU and RAM
> (according to top).  Could this in any way cause things to slow down?

Something is not quite right with your assumptions. On an unloaded server,
open a bunch of connections (like 500) from psql doing nothing, and
cpu load will stay at zero. IOW, an 'idle' connection does not consume
any measurable CPU resources once connected.  It does consume some ram
but that would presumably at least partly swap out eventually.  What's
probably going on here is your connections are not really idle.  Top
by default aggregates usage every three seconds and ps is more of a
snapshot.  During one top interval a single connection might accept and
dispose of 0, 1, 50, 100, or 1000 queries depending on various factors.
Your sampling methods are simply not accurate enough.

With statement level logging on (with pid on the log line),  you can
break out and measure query activity by connection.
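As a rough sketch of that kind of breakdown, the Python below counts logged statements per backend pid. It assumes log_statement = 'all' and log_line_prefix = '%p ' so that every log line starts with the pid. The sample log lines are invented for illustration, not taken from a real server.

```python
import re
from collections import Counter

# Invented sample, assuming log_line_prefix = '%p ' and log_statement = 'all'.
SAMPLE_LOG = """\
4711 LOG:  statement: SELECT 1
4711 LOG:  statement: SELECT now()
4712 LOG:  statement: SELECT 1
4711 LOG:  statement: RESET ALL
"""

STATEMENT = re.compile(r"^(\d+) LOG:\s+statement: ")

def statements_per_pid(log_text):
    """Count logged statements for each backend pid."""
    counts = Counter()
    for line in log_text.splitlines():
        match = STATEMENT.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

counts = statements_per_pid(SAMPLE_LOG)
for pid, n in counts.most_common():
    print(pid, n)
```

A pid that shows up as 'idle' in ps but has a steady stream of statements in the log is not really idle, it is just between queries whenever you happen to look.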

merlin



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Ivan Voras

Bryan Buecking wrote:

On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
Are you referring to PHP's persistent connections?  Do not use those.   
Here's a thread that details the issues with why not:  
http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php .  


Thanks for that article, very informative and persuasive enough that
I've turned off persistent connections.


Note that it's not always true - current recommended practice for PHP is 
to run it in FastCGI, in which case even though there are hundreds of 
Apache processes, there are only a few PHP processes with their persistent 
database connections (and unused PHP FastCGI servers get killed off 
routinely) so you get almost "proper" pooling without the overhead.





Re: [PERFORM] Suspicious top output

2008-04-22 Thread Ivan Voras

Rafael Barrera Oro wrote:

Hello, i have a postgresql server running and from time to time it gets
painfully slow. When this happens i usually connect to the server and
run a "top" command, the output i get is filled with lines like the
following

71872 pgsql1   40 48552K 42836K sbwait   1:41  4.79% postgres

Are those connections that were not closed or something like that?


This looks like FreeBSD; "sbwait" state is socket buffer wait, and 
guessing from the CPU usage the process seems to be talking to another 
process.



should i worry?


Don't know. Are you sure all client processes disconnect properly from 
the database?





Re: [PERFORM] [PERFORMANCE] Error loading 37G CSV file "invalid string enlargement request size 65536"

2008-04-22 Thread Tom Lane
"Adonias Malosso" <[EMAIL PROTECTED]> writes:
> I'm running a copy for a 37G CSV and receiving the following error:
> "invalid string enlargement request size 65536"

AFAICS this would only happen if you've got an individual line of COPY
data exceeding 1GB.  (PG versions later than 8.2 give a slightly more
helpful "out of memory" error in such a case.)

Most likely, that was not your intention, and the real problem is
incorrect quoting/escaping in the CSV file, causing COPY to think
that a large number of physical lines should be read as one logical line.

regards, tom lane
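One cheap way to hunt for that kind of quoting damage is to look for physical lines holding an odd number of quote characters. The Python below is a minimal sketch, assuming no field is supposed to contain a newline and that embedded quotes, if any, are doubled (which keeps the count even). The function name and the sample rows are made up.

```python
def lines_with_odd_quotes(lines, quote='"'):
    """Return 1-based numbers of physical lines with an odd count of quote characters."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]

# Made-up rows in the same ';'-delimited, quoted style as the error context.
sample = [
    '"1";"ALICE";"M"',
    '"2";"BOB;"M"',       # closing quote missing after BOB
    '"3";"CAROL";"F"',
]

suspects = lines_with_odd_quotes(sample)
print(suspects)
```

For a real file, pass an open file object instead of the list, since it iterates line by line without loading 37G into memory. The first line reported is where COPY would start gluing physical lines together.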



[PERFORM] [PERFORMANCE] Error loading 37G CSV file "invalid string enlargement request size 65536"

2008-04-22 Thread Adonias Malosso
Hi all,

I'm running a copy for a 37G CSV and receiving the following error:

"invalid string enlargement request size 65536"

The file has about 70 million lines with 101 columns, all of them varchar.

When I run the command with the whole file I receive the error after loading
about 29 million lines. So I've split the file into 10-million-line pieces
with split:

split --lines=1000

And running the copy i receive the error on the 5th file:

psql:/srv/www/htdocs/import/script_q2.sql:122: ERROR:  invalid string
enlargement request size 65536
CONTEXT:  COPY temp_q2, line 3509639: ""9367276";"4";"DANIEL DO
CARMO BARROS";"31-Jan-1986";"M";"1";"10";"3162906";"GILSON TEIXEIRA..."

Any clues?

My postgresql version is 8.2.4 the server is running suse linux with 1.5GB
Sensitive changes in postgresql.conf are:

shared_buffers = 512MB
temp_buffers = 256MB
checkpoint_segments = 60

I'd also like to know if there's any way to optimize huge data load in
operations like these.

Regards

Adonias Malosso


[PERFORM] Suspicious top output

2008-04-22 Thread Rafael Barrera Oro
Hello, i have a postgresql server running and from time to time it gets
painfully slow. When this happens i usually connect to the server and
run a "top" command, the output i get is filled with lines like the
following

71872 pgsql1   40 48552K 42836K sbwait   1:41  4.79% postgres

Are those connections that were not closed or something like that?

should i worry?

Thanks in advance, as always

yours truly

Rafael




Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread PFC



about 2300 connections in idle
(ps auxwww | grep postgres | idle)


[...]


The server that connects to the db is an apache server using persistent
connections. MaxClients is 2048 thus the high number of connections
needed. Application was written in PHP using the Pear DB class.


This is pretty classical.
	When your number of threads gets out of control, everything gets slower,  
so more requests pile up, spawning more threads. This is positive  
feedback, and in seconds all hell breaks loose. That's why I call it  
imploding, as if it collapses under its own weight. There is a threshold  
effect and it goes from working well to a crawl rather quickly once you  
pass the threshold, as you experienced.


	Note that the same applies to Apache, PHP as well as Postgres : there is  
a "sweet spot" in the number of threads, for optimum efficiency, depending  
on how many cores you have. Too few threads, and it will be waiting for IO  
or waiting for the database. Too many threads, and CPU cache utilization  
becomes suboptimal and context switches eat your performance.


	This sweet spot is certainly not at 500 connections per core, either for  
Postgres or for PHP. It is much lower, about 5-20 depending on your load.


	I will copypaste here an email I wrote to another person with the exact  
same problem, and the exact same solution.

Please read this carefully :

*

Basically there are three classes of websites in my book.
1- Low traffic (ie a few hits/s on dynamic pages), when performance  
doesn't matter
2- High traffic (ie 10-100 hits/s on dynamic pages), when you must read  
the rest of this email
3- Monster traffic (lots more than that) when you need to give some of  
your cash to Akamai, get some load balancers, replicate your databases,  
use lots of caching, etc. This is yahoo, flickr, meetic, etc.


Usually people whose web sites are imploding under load think they are in  
class 3 but really most of them are in class 2 but using inadequate  
technical solutions like MySQL, etc. I had a website with 200K members  
that ran on a Celeron 1200 with 512 MB RAM, perfectly fine, and lighttpd  
wasn't even visible in the top.


Good news for you is that the solution to your problem is pretty easy. You  
should be able to solve that in about 4 hours.


Suppose you have some web servers for static content ; obviously you are  
using lighttpd on that since it can service an "unlimited" (up to the OS  
limit, something like 64K sockets) number of concurrent connections. You  
could also use nginx or Zeus. I think Akamai uses Zeus. But Lighttpd is  
perfectly fine (and free). For your static content servers you will want  
to use lots of RAM for caching, if you serve images, put the small files  
like thumbnails, css, javascript, html pages on a separate server so that  
they are all served from RAM, use a cheap CPU since a Pentium-M  with  
lighttpd will happily push 10K http hits/s if you don't wait for IO. Large  
files should be on the second static server to avoid cache thrashing on the  
server which has all the frequently accessed small files.


Then you have some web servers for generating your dynamic content. Let's  
suppose you have N CPU cores total.
With your N cores, the ideal number of threads would be N. However those  
will also wait for IO and database operations, so you want to fill those  
wait times with useful work, so maybe you will use something like 2...10  
threads per core. This can only be determined by experimentation, it  
depends on the type and length of your SQL queries so there is no "one  
size fits all" answer.
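As a starting point for that experimentation, the Python below turns the 2...10 threads per core rule of thumb into a range to test. It is only arithmetic on the numbers above; the function name is made up, and the right value inside the range still has to be found by measuring.

```python
import os

def thread_range(cores, low_per_core=2, high_per_core=10):
    """Return the (low, high) worker counts worth testing for this many cores."""
    return cores * low_per_core, cores * high_per_core

cores = os.cpu_count() or 1   # cpu_count() can return None on odd platforms
low, high = thread_range(cores)
print(f"{cores} cores: try between {low} and {high} workers")
```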


Example. You have pages that take 20 ms to generate, and you have 100  
requests for those coming up. Let's suppose you have one CPU core.


(Note : if your pages take longer than 10 ms, you have a problem. On the  
previously mentioned website, now running on the cheapest Core 2 we could  
find since the torrent tracker eats lots of CPU, pages take about 2-5 ms  
to generate, even the forum pages with 30 posts on them. We use PHP with  
compiled code caching and SQL is properly optimized). And, yes, it uses  
MySQL. Once I wrote (as an experiment) an extremely simple forum which did  
1400 pages/second (which is huge) with a desktop Core2 as the Postgres 8.2  
server.


- You could use Apache in the old-fashioned way, with 100 threads, so all  
your pages will take 20 ms x 100 = 2 seconds,
But the CPU cache utilisation will suck because of all those context  
switches, you'll have 100 processes eating your RAM (count 8MB for a PHP  
process), 100 database connections, 100 postgres processes, the locks will  
stay on longer, transactions will last longer, you'll get more dead rows  
to vacuum, etc.
And actually, since Apache will not buffer the output of your scripts, the  
PHP or Perl interpreter will stay in memory (and hog a database  
connection) until the client at the o
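The arithmetic of the 100-request example can be sketched in Python as below. It assumes an idealized single core with no context switch cost, and compares running all 100 requests at once against running them one after another, which is an assumption about the alternative since the original message is cut off here. Function names are made up.

```python
def all_at_once(requests, page_ms):
    """Ideal time slicing on one core: every request finishes at the very end."""
    return [requests * page_ms] * requests

def one_at_a_time(requests, page_ms):
    """Back to back: request i finishes after i pages' worth of CPU time."""
    return [page_ms * (i + 1) for i in range(requests)]

shared = all_at_once(100, 20)
queued = one_at_a_time(100, 20)

mean_shared = sum(shared) / len(shared)
mean_queued = sum(queued) / len(queued)
print("mean latency, 100 threads:", mean_shared, "ms")
print("mean latency, 1 thread:   ", mean_queued, "ms")
```

The last request finishes at 2 seconds either way, but with one worker the average request is done in about half that time, and that is before counting the cache and context switch losses described above.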

Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Craig Ringer

Erik Jones wrote:


max_connections = 2400


That is WAY too high.  Get a real pooler, such as pgpool, and drop that 
down to 1000 and test from there.  I see you mentioned 500 concurrent 
connections.  Are each of those connections actually doing something?  
My guess is that once you cut down on the number of actual connections 
you'll find that each connection can get its work done faster and you'll 
see that number drop significantly.


It's not an issue for me - I'm expecting *never* to top 100 concurrent 
connections, and many of those will be idle, with the usual load being 
closer to 30 connections. Big stuff ;-)


However, I'm curious about what an idle backend really costs.

On my system each backend has an RSS of about 3.8MB, and a psql process 
tends to be about 3.0MB. However, much of that will be shared library 
bindings etc. The real cost per psql instance and associated backend 
appears to be 1.5MB (measured with 10 connections using system free RAM 
change). If I use a little Python program to generate 50 connections, 
free system RAM drops by ~45MB and rises by the same amount when the 
Python process exits and the backends die, so the backends presumably 
use less than 1MB each of real unshared RAM.
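Spelled out in Python, the estimate looks like this. The 45MB and 50 connections are the measurements above; the 2300 idle backends come from the original report in this thread, and scaling linearly to that count is an assumption, not something that was measured.

```python
def per_backend_mb(free_ram_drop_mb, connections):
    """Unshared RAM per idle backend, from the change in free system RAM."""
    return free_ram_drop_mb / connections

measured = per_backend_mb(45, 50)

# Assumption: memory use scales linearly with the number of idle backends.
idle_backends = 2300
estimate_mb = measured * idle_backends

print(f"per backend: {measured:.2f} MB")
print(f"{idle_backends} idle backends: about {estimate_mb:.0f} MB")
```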


Presumably the backends will grow if they perform some significant 
queries and are then left idle. I haven't checked that.


At 1MB of RAM per backend that's not a trivial cost, but it's far from 
earth shattering, especially allowing for the OS swapping out backends 
that're idle for extended periods.


So ... what else does an idle backend cost? Is it reducing the amount of 
shared memory available for use on complex queries? Are there some lists 
PostgreSQL must scan for queries that get more expensive to examine as 
the number of backends rises? Are there locking costs?


--
Craig Ringer



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Tom Lane
Bryan Buecking <[EMAIL PROTECTED]> writes:
> On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
>> That is WAY too high.  Get a real pooler, such as pgpool, and drop  
>> that down to 1000 and test from there.

> I agree, but the number of idle connections doesn't seem to affect
> performance, only memory usage.

I doubt that's true (and your CPU load suggests the contrary as well).
There are common operations that have to scan the whole PGPROC array,
which has one entry per open connection.  What's worse, some of them
require exclusive lock on the array.

8.3 has some improvements in this area that will probably let it scale
to more connections than previous releases, but in any case connection
pooling is a good thing.

> I'm trying to lessen the load of
> connection setup. But sounds like this tax is minimal?

Not really.  You're better off reusing a connection over a large number
of queries.

regards, tom lane



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Scott Marlowe
On Tue, Apr 22, 2008 at 10:10 AM, Bryan Buecking <[EMAIL PROTECTED]> wrote:
>
>  I agree, but the number of idle connections doesn't seem to affect
>  performance, only memory usage. I'm trying to lessen the load of
>  connection setup. But sounds like this tax is minimal?

Not entirely true.  There are certain things that happen that require
one backend to notify ALL OTHER backends.  When this happens a lot,
the system will slow to a crawl.



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Bryan Buecking
On Tue, Apr 22, 2008 at 01:21:03PM -0300, Rodrigo Gonzalez wrote:
> Are tables vacuumed often?

How often is often?  Right now the db is vacuumed once a day.
-- 
Bryan Buecking  http://www.starling-software.com



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Rodrigo Gonzalez

Are tables vacuumed often?

Bryan Buecking wrote:

On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
  

On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:



max_connections = 2400
  
That is WAY too high.  Get a real pooler, such as pgpool, and drop  
that down to 1000 and test from there.



I agree, but the number of idle connections doesn't seem to affect
performance, only memory usage. I'm trying to lessen the load of
connection setup. But sounds like this tax is minimal?

When these issues started happening, max_connections was set to 1000 and
I was not using persistent connections.

  

I see you mentioned 500 concurrent connections. Are each of those
connections actually doing something?



Yes out of the 2400 odd connections, 500 are either in SELECT or RESET.

  

My guess is that once you cut down on the number of actual connections
you'll find that each connection can get its work done faster
and you'll see that number drop significantly.



I agree, but not in this case.  I will look at using pooling. 
  







Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Bryan Buecking
On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
> 
> Are you referring to PHP's persistent connections?  Do not use those.   
> Here's a thread that details the issues with why not:  
> http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php .  

Thanks for that article, very informative and persuasive enough that
I've turned off persistent connections.

-- 
Bryan Buecking  http://www.starling-software.com



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Harald Armin Massa
Bryan,

 > > about 2300 connections in idle
>  > > (ps auxwww | grep postgres | idle)

that is about 2300 processes being scheduled by your kernel, each
of them using > 1 MB of RAM and some other resources. Are you sure
that this is what you want?

Usual recommended design for a web application:

start request, rent a connection from connection pool, do query, put
connection back, finish request, wait for next request

so to get 500 connections in parallel, you would need the outside
situation of 500 browsers submitting requests within the time needed to
fulfill one request.
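The rent / query / put back cycle can be sketched in a few lines of Python. This is a toy under stated assumptions: FakeConnection is a hypothetical stand-in for a real database connection, and the pool is just a thread-safe queue, not pgpool or any real pooler.

```python
import queue
import threading
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    def __init__(self, ident):
        self.ident = ident

    def query(self, sql):
        return f"conn {self.ident}: {sql}"

class ConnectionPool:
    """Fixed-size pool: requests rent a connection and hand it back when done."""
    def __init__(self, size):
        self._idle = queue.Queue()
        for ident in range(size):
            self._idle.put(FakeConnection(ident))

    @contextmanager
    def rent(self):
        conn = self._idle.get()       # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)      # put the connection back for the next request

pool = ConnectionPool(size=2)
used = set()
used_lock = threading.Lock()

def handle_request(n):
    with pool.rent() as conn:
        conn.query(f"SELECT {n}")
        with used_lock:
            used.add(conn.ident)

threads = [threading.Thread(target=handle_request, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("requests served: 20, connections used:", len(used))
```

Twenty requests get served over at most two connections, because a request only holds a connection for the duration of its query.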

Harald
-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Bryan Buecking
On Tue, Apr 22, 2008 at 10:55:19AM -0500, Erik Jones wrote:
> On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:
> 
> >max_connections = 2400
> 
> That is WAY too high.  Get a real pooler, such as pgpool, and drop  
> that down to 1000 and test from there.

I agree, but the number of idle connections doesn't seem to affect
performance, only memory usage. I'm trying to lessen the load of
connection setup. But sounds like this tax is minimal?

When these issues started happening, max_connections was set to 1000 and
I was not using persistent connections.

> I see you mentioned 500 concurrent connections. Are each of those
> connections actually doing something?

Yes out of the 2400 odd connections, 500 are either in SELECT or RESET.

> My guess is that once you cut down on the number of actual connections
> you'll find that each connection can get its work done faster
> and you'll see that number drop significantly.

I agree, but not in this case.  I will look at using pooling. 
-- 
Bryan Buecking  http://www.starling-software.com



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Bryan Buecking
On Tue, Apr 22, 2008 at 08:41:09AM -0700, Joshua D. Drake wrote:
> On Wed, 23 Apr 2008 00:31:01 +0900
> Bryan Buecking <[EMAIL PROTECTED]> wrote:
> 
> > at any given time there is about 5-6 postgres in startup 
> > (ps auxwww | grep postgres | grep startup | wc -l)
> > 
> > about 2300 connections in idle 
> > (ps auxwww | grep postgres | idle)
> > 
> > and loads of "FATAL: sorry, too many clients already" being logged.
> > 
> > The server that connects to the db is an apache server using
> > persistent connections. MaxClients is 2048 thus the high number of
> > connections needed. Application was written in PHP using the Pear DB
> > class.
> 
> Sounds like your pooler isn't reusing connections properly.

The persistent connections are working properly. The idle connections
are expected given that the Apache child processes are not closing them
(as they would with non-persistent connections).  The connections do go
away after 1000 requests (MaxRequestsPerChild).

I decided to move towards persistent connections since prior to
persistent connections the idle and startup counts were reversed.

-- 
Bryan Buecking  http://www.starling-software.com



Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Erik Jones


On Apr 22, 2008, at 10:31 AM, Bryan Buecking wrote:


Hi,

I'm running into a performance problem where a Postgres db is running
at 99% CPU (4 cores) with about 500 concurrent connections doing various
queries from a web application. This problem started about a week ago,
and has been steadily going downhill. I have been tweaking the config a
bit, mainly shared_memory but have seen no noticeable improvements.

at any given time there is about 5-6 postgres in startup
(ps auxwww | grep postgres | grep startup | wc -l)

about 2300 connections in idle
(ps auxwww | grep postgres | idle)

and loads of "FATAL: sorry, too many clients already" being logged.

The server that connects to the db is an apache server using persistent
connections. MaxClients is 2048 thus the high number of connections
needed. Application was written in PHP using the Pear DB class.


Are you referring to PHP's persistent connections?  Do not use those.
Here's a thread that details why not:
http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php
Basically, PHP's persistent connections are NOT a pooling solution.
Use pgpool or some such.






max_connections = 2400


That is WAY too high.  Get a real pooler, such as pgpool, and drop  
that down to 1000 and test from there.  I see you mentioned 500  
concurrent connections.  Are each of those connections actually doing  
something?  My guess is that once you cut down on the number of actual  
connections you'll find that each connection can get its work done  
faster and you'll see that number drop significantly.  For example,  
our application does anywhere from 200 - 600 transactions per second,  
dependent on the time of day/week, and we never need more than 150 to  
200 connections (although we do have the max_connections set to 500).




Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com






Re: [PERFORM] CPU bound at 99%

2008-04-22 Thread Joshua D. Drake
On Wed, 23 Apr 2008 00:31:01 +0900
Bryan Buecking <[EMAIL PROTECTED]> wrote:

> at any given time there is about 5-6 postgres in startup 
> (ps auxwww | grep postgres | grep startup | wc -l)
> 
> about 2300 connections in idle 
> (ps auxwww | grep postgres | idle)
> 
> and loads of "FATAL: sorry, too many clients already" being logged.
> 
> The server that connects to the db is an apache server using
> persistent connections. MaxClients is 2048 thus the high number of
> connections needed. Application was written in PHP using the Pear DB
> class.

Sounds like your pooler isn't reusing connections properly.

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate





[PERFORM] CPU bound at 99%

2008-04-22 Thread Bryan Buecking
Hi,

I'm running into a performance problem where a Postgres db is running
at 99% CPU (4 cores) with about 500 concurrent connections doing various
queries from a web application. This problem started about a week ago,
and performance has been steadily getting worse. I have been tweaking the
config a bit, mainly shared_memory, but have seen no noticeable improvements.

At any given time there are about 5-6 postgres processes in startup
(ps auxwww | grep postgres | grep startup | wc -l)

about 2300 connections in idle
(ps auxwww | grep postgres | grep idle | wc -l)

and loads of "FATAL: sorry, too many clients already" being logged.

The server that connects to the db is an apache server using persistent
connections. MaxClients is 2048 thus the high number of connections
needed. Application was written in PHP using the Pear DB class.
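
The ps pipelines above count backends by grepping the process title. One caveat is that a plain grep for "idle" also matches backends that are "idle in transaction". A minimal sketch of a per-state count follows; the sample lines are hypothetical and only mimic the shape of ps output, and the names `classify` and `count_backend_states` are made up for illustration.

```python
from collections import Counter

# Checked in this order so "idle in transaction" is not mistaken for "idle".
KNOWN_STATES = ("idle in transaction", "idle", "startup")

def classify(line):
    """Return the backend state from a ps line, or None if it is not a backend."""
    if "postgres:" not in line:
        return None  # e.g. the grep process itself
    title = line.split("postgres:", 1)[1].strip()
    for state in KNOWN_STATES:
        if title.endswith(state):
            return state
    return "active"  # anything else is running a command

def count_backend_states(ps_lines):
    return Counter(s for s in map(classify, ps_lines) if s is not None)

# Hypothetical sample, not taken from the real server.
sample = [
    "postgres  4101  0.0  0.2  postgres: web mediadb 127.0.0.1(40112) idle",
    "postgres  4102  0.0  0.2  postgres: web mediadb 127.0.0.1(40113) idle in transaction",
    "postgres  4103  9.5  0.4  postgres: web mediadb 127.0.0.1(40114) SELECT",
    "postgres  4104  0.0  0.1  postgres: web mediadb 127.0.0.1(40115) startup",
    "postgres  4105  0.0  0.2  postgres: web mediadb 127.0.0.1(40116) idle",
    "bryan     4200  0.0  0.0  grep postgres",
]
states = count_backend_states(sample)
print(dict(states))
```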

Here are some typical queries taking place

(table media has about 40,000 records and category about 40):

LOG: duration: 66141.530 ms  statement:
SELECT COUNT(*) AS CNT
FROM media m JOIN category ca USING(category_id)
WHERE CATEGORY_ROOT(m.category_id) = '-1'
AND m.deleted_on IS NULL

LOG:  duration: 57828.983 ms  statement:
SELECT COUNT(*) AS CNT
FROM media m JOIN category ca USING(category_id)
WHERE CATEGORY_ROOT(m.category_id) = '-1'
AND m.deleted_on IS NULL AND m.POSTED_ON + interval '7 day'

System
======
cpu     Xeon(R) CPU 5160 @ 3.00GHz stepping 06 x 4
        L1, L2 = 32K, 4096K
mem     8GB
dbms    postgresql-server 8.2.4
disks   scsi0 : LSI Logic SAS based MegaRAID driver
        SCSI device sda: 142082048 512-byte hdwr sectors (72746 MB)

Stats
=====

top - 00:28:40 up 12:43,  1 user,  load average: 46.88, 36.55, 37.65
Tasks: 2184 total,  63 running, 2119 sleeping,   1 stopped,   1 zombie
Cpu0: 99.3% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.2% si
Cpu1: 98.3% us,  1.4% sy,  0.0% ni,  0.2% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2: 99.5% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3: 99.5% us,  0.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8166004k total,  6400368k used,  1765636k free,   112080k buffers
Swap:  1020088k total,        0k used,  1020088k free,  3558764k cached


$ vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4  0      0 559428 109440 3558684    0    0    11    27   31   117 96  2  2  0
 5  0      0 558996 109452 3558672    0    0     0    41 1171   835 93  1  7  0
 4  0      0 558996 109452 3558740    0    0     0    38 1172   497 98  1  1  0
11  0      0 554516 109452 3558740    0    0     0    19 1236   610 97  1  2  0
25  0      0 549860 109452 3558740    0    0     0    32 1228   332 99  1  0  0
12  0      0 555412 109452 3558740    0    0     0     4 1148   284 99  1  0  0
15  0      0 555476 109452 3558740    0    0     0    23 1202   290 99  1  0  0
15  0      0 555476 109452 3558740    0    0     0     1 1125   260 99  1  0  0
16  0      0 555460 109452 3558740    0    0     0    12 1214   278 99  1  0  0
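
The vmstat sample above can be summarized mechanically. The sketch below parses the nine data rows (16 whitespace-separated columns each) and applies a rough rule of thumb for "CPU bound": the run queue exceeds the core count, user CPU is high, and I/O wait is negligible. The thresholds and the function name `summarize_vmstat` are assumptions chosen for illustration, not an established definition.

```python
VMSTAT_ROWS = """
 4  0      0 559428 109440 3558684    0    0    11    27   31   117 96  2  2  0
 5  0      0 558996 109452 3558672    0    0     0    41 1171   835 93  1  7  0
 4  0      0 558996 109452 3558740    0    0     0    38 1172   497 98  1  1  0
11  0      0 554516 109452 3558740    0    0     0    19 1236   610 97  1  2  0
25  0      0 549860 109452 3558740    0    0     0    32 1228   332 99  1  0  0
12  0      0 555412 109452 3558740    0    0     0     4 1148   284 99  1  0  0
15  0      0 555476 109452 3558740    0    0     0    23 1202   290 99  1  0  0
15  0      0 555476 109452 3558740    0    0     0     1 1125   260 99  1  0  0
16  0      0 555460 109452 3558740    0    0     0    12 1214   278 99  1  0  0
"""

def summarize_vmstat(text, cores):
    """Average the r (run queue), us and wa columns of vmstat data rows."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) == 16 and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    avg_run_queue = sum(r[0] for r in rows) / len(rows)   # column "r"
    avg_user = sum(r[12] for r in rows) / len(rows)       # column "us"
    max_iowait = max(r[15] for r in rows)                 # column "wa"
    # Assumed heuristic thresholds, for illustration only.
    cpu_bound = avg_run_queue > cores and avg_user > 90 and max_iowait < 5
    return {"avg_run_queue": avg_run_queue, "avg_user": avg_user,
            "max_iowait": max_iowait, "cpu_bound": cpu_bound}

summary = summarize_vmstat(VMSTAT_ROWS, cores=4)
print(summary)
```

With bi at or near zero and wa at zero throughout, the disks are essentially idle, which points at query execution rather than I/O.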


# -
# PostgreSQL configuration file
# -

#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'   # write an extra PID file
# (change requires restart)


#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

listen_addresses = 'localhost'# what IP address(es) to listen on; 
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
# (change requires restart)
port = 5432 # (change requires restart)
max_connections = 2400  # (change requires restart)
# Note: increasing max_connections costs ~400 bytes of shared memory per 
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
superuser_reserved_connections = 3  # (change requires restart)
#unix_socket_directory = '' # (change requires restart)
#unix_socket_group = '' # (change requires restart)
#uni

Re: [PERFORM] Oddly slow queries

2008-04-22 Thread Scott Marlowe
On Tue, Apr 22, 2008 at 7:42 AM, Thomas Spreng <[EMAIL PROTECTED]> wrote:
>
>  I think I'll upgrade PostgreSQL to the latest 8.3 version in the next
>  few days anyway, along with a memory upgrade (from 1.5GB to 4GB) and a
>  new 2x RAID-1 (instead of RAID-5) disk configuration. I hope that this
>  will already have a noticeable impact on the performance.

Note that if you have a good RAID controller with battery backed cache
and write back enabled, then you're probably better off, or at least as
well off, using four disks in a RAID-10 than two separate RAID-1 sets
(one for xlog and one for data).

Test to see.  I've had better performance in general with the RAID-10 setup.



Re: [PERFORM] Oddly slow queries

2008-04-22 Thread Thomas Spreng

On 19.04.2008, at 19:11, Christopher Browne wrote:
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Thomas  
Spreng) wrote:

On 16.04.2008, at 17:42, Chris Browne wrote:
What I meant is if there are no INSERT's or UPDATE's going on it
shouldn't affect SELECT queries, or am I wrong?


Yes, that's right.  (Caveat: VACUUM would be a form of update, in this
context...)


thanks for pointing that out, at the moment we don't run autovacuum but
VACUUM ANALYZE VERBOSE twice a day.


2.  On the other hand, if you're on 8.1 or so, you may be able to
configure the Background Writer to incrementally flush checkpoint data
earlier, and avoid the condition of 1.

Mind you, you'd have to set BgWr to be pretty aggressive, based on the
"10s periodicity" that you describe; that may not be a nice
configuration to have all the time :-(.


I've just seen that the daily vacuum tasks didn't run,
apparently. The DB has almost doubled its size over the last few
days. I guess I'll have to VACUUM FULL (dump/restore might be faster,
though) and check if that helps anything.


If you're locking out users, then it's probably a better idea to use
CLUSTER to reorganize the tables, as that simultaneously eliminates
empty space on tables *and indices.*

In contrast, after running VACUUM FULL, you may discover you need to
reindex tables, because the reorganization of the *table* leads to
bloating of the indexes.


I don't VACUUM FULL but thanks for the hint.


Pre-8.3 (I *think*), there's a transactional issue with CLUSTER where
it doesn't fully follow MVCC, so that "dead, but still accessible, to
certain transactions" tuples go away.  That can cause surprises
(e.g. - queries missing data) if applications are accessing the
database concurrently with the CLUSTER.  It's safe as long as the DBA
can take over the database and block out applications.  And at some
point, the MVCC bug got fixed.


I think I'll upgrade PostgreSQL to the latest 8.3 version in the next
few days anyway, along with a memory upgrade (from 1.5GB to 4GB) and a
new 2x RAID-1 (instead of RAID-5) disk configuration. I hope that this
will already have a noticeable impact on the performance.


Note that you should check the output of a VACUUM VERBOSE run, and/or
use the contrib function pgstattuple() to check how sparse the
storage usage is.  There may only be a few tables that are behaving
badly, and cleaning up a few tables will be a lot less intrusive than
cleaning up the whole database.


That surely is the case because about 90% of all data is stored in one
big table and most of the rows are deleted and newly INSERT'ed every
night.

cheers,

tom



Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Mark Mielke

Matthew Wakeling wrote:

On Tue, 22 Apr 2008, Mark Mielke wrote:
The poster I responded to said that the memory required for a hash 
join was relative to the number of distinct values, not the number of 
rows. They gave an example of millions of rows, but only a few 
distinct values. Above, you agree with me that it would include 
the rows (or at least references to the rows) as well. If it stores 
rows, or references to rows, then memory *is* relative to the number 
of rows, and millions of records would require millions of rows (or 
row references).


Yeah, I think we're talking at cross-purposes, due to hash tables 
being used in two completely different places in Postgres. Firstly, 
you have hash joins, where Postgres loads the references to the actual 
rows, and puts those in the hash table. For that situation, you want a 
small number of rows. Secondly, you have hash aggregates, where 
Postgres stores an entry for each "group" in the hash table, and does 
not store the actual rows. For that situation, you can have a 
bazillion individual rows, but only a small number of distinct groups.


That makes sense with my reality. :-)

Thanks,
mark

--
Mark Mielke <[EMAIL PROTECTED]>




Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling

On Tue, 22 Apr 2008, Mark Mielke wrote:
The poster I responded to said that the memory required for a hash join was 
relative to the number of distinct values, not the number of rows. They gave 
an example of millions of rows, but only a few distinct values. Above, you 
agree with me that it would include the rows (or at least references to 
the rows) as well. If it stores rows, or references to rows, then memory *is* 
relative to the number of rows, and millions of records would require 
millions of rows (or row references).


Yeah, I think we're talking at cross-purposes, due to hash tables being 
used in two completely different places in Postgres. Firstly, you have 
hash joins, where Postgres loads the references to the actual rows, and 
puts those in the hash table. For that situation, you want a small number 
of rows. Secondly, you have hash aggregates, where Postgres stores an 
entry for each "group" in the hash table, and does not store the actual 
rows. For that situation, you can have a bazillion individual rows, but 
only a small number of distinct groups.
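
To make the distinction concrete, here is a small sketch using plain Python dictionaries as stand-ins for the two kinds of hash table. It is not how Postgres implements either node; the row layout `(id, category)` and the row count are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: 100,000 rows but only 3 distinct category values.
rows = [(i, i % 3) for i in range(100_000)]

# Hash aggregate (GROUP BY / DISTINCT): one entry per group, rows are not kept.
group_counts = {}
for _id, category in rows:
    group_counts[category] = group_counts.get(category, 0) + 1

# Hash join build side: every row (or a reference to it) goes into the table.
join_table = defaultdict(list)
for row in rows:
    join_table[row[1]].append(row)

aggregate_entries = len(group_counts)
join_entries = sum(len(bucket) for bucket in join_table.values())
print("hash aggregate entries:", aggregate_entries)
print("hash join entries:     ", join_entries)
```

The aggregate's memory grows with the number of groups, while the join table's memory grows with the number of rows on the hashed side, which is why the two were being confused in this thread.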


Matthew

--
First law of computing:  Anything can go wro
sig: Segmentation fault.  core dumped.



Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Mark Mielke

Matthew Wakeling wrote:

On Mon, 21 Apr 2008, Mark Mielke wrote:
This surprises me - hash values are lossy, so it must still need to 
confirm against the real list of values, which at a minimum should 
require references to the rows to check against?


Is PostgreSQL doing something beyond my imagination? :-)


Not too far beyond your imagination, I hope.

It's simply your assumption that the hash table is lossy. Sure, hash 
values are lossy, but a hash table isn't. Postgres stores in memory 
not only the hash values, but the rows they refer to as well, having 
checked them all on disc beforehand. That way, it doesn't need to look 
up anything on disc for that branch of the join again, and it has a 
rapid in-memory lookup for each row.


I said hash *values* are lossy. I did not say hash table is lossy.

The poster I responded to said that the memory required for a hash join 
was relative to the number of distinct values, not the number of rows. 
They gave an example of millions of rows, but only a few distinct 
values. Above, you agree with me that it would include the rows (or 
at least references to the rows) as well. If it stores rows, or 
references to rows, then memory *is* relative to the number of rows, and 
millions of records would require millions of rows (or row references).


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>




Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling

On Mon, 21 Apr 2008, Mark Mielke wrote:
This surprises me - hash values are lossy, so it must still need to confirm 
against the real list of values, which at a minimum should require references 
to the rows to check against?


Is PostgreSQL doing something beyond my imagination? :-)


Not too far beyond your imagination, I hope.

It's simply your assumption that the hash table is lossy. Sure, hash 
values are lossy, but a hash table isn't. Postgres stores in memory not 
only the hash values, but the rows they refer to as well, having checked 
them all on disc beforehand. That way, it doesn't need to look up anything 
on disc for that branch of the join again, and it has a rapid in-memory 
lookup for each row.
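
A toy illustration of "lossy hash values, exact hash table": the bucket index throws information away, but each bucket keeps the real key next to the row, and the probe compares real keys. This is a generic sketch of the technique, not Postgres code; the four-bucket table and the sample rows are made up.

```python
def build(rows, key_of, n_buckets=4):
    """Build a hash table whose buckets hold (key, row) pairs."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        key = key_of(row)
        buckets[hash(key) % n_buckets].append((key, row))
    return buckets

def probe(buckets, key):
    """Find the bucket by (lossy) hash, then confirm against the stored keys."""
    bucket = buckets[hash(key) % len(buckets)]
    return [row for stored_key, row in bucket if stored_key == key]

# Keys 1 and 5 land in the same bucket when there are only 4 buckets.
rows = [(1, "a"), (5, "b"), (2, "c")]
table = build(rows, key_of=lambda row: row[0])

print(probe(table, 1))   # only the row whose key is really 1
print(probe(table, 5))   # only the row whose key is really 5
print(probe(table, 9))   # same bucket as 1 and 5, but no matching key
```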


Matthew

--
X's book explains this very well, but, poor bloke, he did the Cambridge Maths 
Tripos...   -- Computer Science Lecturer




Re: [PERFORM] Oddly slow queries

2008-04-22 Thread PFC



that's correct, there are nightly (at least at the moment) processes that
insert around 2-3 million rows and delete about the same amount. I can see
that those 'checkpoints are occurring too frequently' messages are only
logged during that timeframe.


	Perhaps you should increase the quantity of xlog PG is allowed to write
between each checkpoint (this is checkpoint_segments). Checkpointing every
10 seconds is going to slow down your inserts also, because of the need to
fsync() all those pages, not to mention nuking your IO-bound SELECTs.
Increase it till it checkpoints every 5 minutes or something.
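
As a rough guide to how far to raise it: if checkpoints are triggered purely by WAL volume and the write rate stays constant, the checkpoint interval scales linearly with checkpoint_segments. The sketch below assumes the server is still at the default of 3 segments (the thread does not say) and uses the standard 16 MB WAL segment size; note that checkpoint_timeout also forces a checkpoint once it expires, whatever the segment count.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size

def segments_for_interval(current_segments, current_interval_s, target_interval_s):
    """Scale checkpoint_segments linearly to reach the target interval."""
    return math.ceil(current_segments * target_interval_s / current_interval_s)

# Assumption: currently 3 segments, checkpointing every 10 s during the load.
current_segments, current_interval_s = 3, 10
wal_mb_per_s = current_segments * WAL_SEGMENT_MB / current_interval_s
needed = segments_for_interval(current_segments, current_interval_s, 300)

print("approx. WAL rate: %.1f MB/s" % wal_mb_per_s)
print("checkpoint_segments for ~5 min between checkpoints:", needed)
```

Treat the result as a starting point for testing rather than a recommendation, since more segments also mean more disk space for WAL and longer crash recovery.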



I assume that it's normal that so many INSERT's and DELETE's cause the


	Well, also, do you use batch processing or plpgsql, or do you issue a huge
mass of individual INSERTs via some script?
	If you use a script, make sure that each INSERT doesn't have its own
transaction (I think you know that, since with a few million rows it
would take forever... unless you can do 1 commits/s, in which case
either you use 8.3 and have activated the "one fsync every N seconds"
feature, or your battery-backed cache works, or your disk is lying)...

If you use a script and the server is under heavy load, you can:

BEGIN
	Process N rows (use multi-values INSERT and DELETE WHERE .. IN (...)), or
	execute a prepared statement multiple times, or copy to temp table and
	process with SQL (usually much faster)
COMMIT
Sleep
Wash, rinse, repeat
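
The loop above might look roughly like this in a script. To keep the sketch self-contained and runnable it uses Python's built-in sqlite3 module as a stand-in for a PostgreSQL driver; the table name `items`, the batch size and the pause length are all hypothetical and would need tuning against the real workload.

```python
import sqlite3
import time

def load_in_batches(conn, rows, batch_size, pause_s):
    """Insert rows one batch per transaction, pausing between batches."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits the whole batch on success, rolls back on error
            conn.executemany(
                "INSERT INTO items (id, payload) VALUES (?, ?)", batch)
        batches += 1
        time.sleep(pause_s)  # leave some room for other queries
    return batches

# sqlite3 in-memory database used only so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, "row %d" % i) for i in range(10_000)]
n_batches = load_in_batches(conn, rows, batch_size=1000, pause_s=0.01)
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(n_batches, "batches,", total, "rows")
```

The point is the shape: one commit per batch instead of one per row, with a short sleep so the nightly job does not monopolize the server.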

background writer to choke a little bit. I guess I really need to adjust
the processes to INSERT and DELETE rows at a slower pace if I want to do
other queries during the same time.

cheers,

tom







Re: [PERFORM] Oddly slow queries

2008-04-22 Thread Thomas Spreng


On 19.04.2008, at 19:04, Scott Marlowe wrote:

No, that will certainly NOT just affect write performance; if the
postmaster is busy writing out checkpoints, that will block SELECT
queries that are accessing whatever is being checkpointed.



What I meant is if there are no INSERT's or UPDATE's going on it
shouldn't affect SELECT queries, or am I wrong?


But checkpoints only occur every 10 seconds because of a high insert /
update rate.  So, there ARE inserts and updates going on, and a lot of
them, and they are blocking your selects when checkpoint hits.

While adjusting your background writer might be called for, and might
provide you with some relief, you REALLY need to find out what's
pushing so much data into your db at once that it's causing a
checkpoint storm.


that's correct, there are nightly (at least at the moment) processes that
insert around 2-3 million rows and delete about the same amount. I can see
that those 'checkpoints are occurring too frequently' messages are only
logged during that timeframe.

I assume that it's normal that so many INSERT's and DELETE's cause the
background writer to choke a little bit. I guess I really need to adjust
the processes to INSERT and DELETE rows at a slower pace if I want to do
other queries during the same time.

cheers,

tom
