Re: [PERFORM] performance config help

2010-01-12 Thread Matthew Wakeling

On Mon, 11 Jan 2010, Bob Dusek wrote:

How do I learn more about the actual lock contention in my db?   Lock 
contention makes
some sense.  Each of the 256 requests are relatively similar.  So, I don't 
doubt that
lock contention could be an issue.  I just don't know how to observe it or 
correct it. 
It seems like if we have processes that are contending for locks, there's not 
much we can
do about it. 


To me:

1. This doesn't look like an IO bandwidth issue, as the database is small.
2. This doesn't look like a CPU speed issue, as usage is low.
3. This doesn't look like a memory bandwidth issue, as that would count as
   CPU active in top.
4. This doesn't look like a memory size problem either.

So, what's left? It could be a network bandwidth problem, if your client 
is on a separate server. You haven't really given much detail about the 
nature of the queries, so it is difficult for us to tell if the size of 
the results means that you are maxing out your network. However, it 
doesn't sound like it is likely to me that this is the problem.


It could be a client bottleneck problem - maybe your server is performing 
really well, but your client can't keep up. You may be able to determine 
this by switching on logging of long-running queries in Postgres, and 
comparing that with what your client says. Also, look at the resource 
usage on the client machine.
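
For example, something along these lines switches that logging on (the 100 ms 
threshold is only an illustration; the same parameter can be set permanently 
in postgresql.conf):

  -- as a superuser, for the current session; put it in postgresql.conf to
  -- catch queries from every connection
  SET log_min_duration_statement = 100;  -- log any statement slower than 100 ms
  -- 0 logs every statement with its duration, -1 turns the logging off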


It could be a lock contention problem. To me, this feels like the most 
likely. You say that the queries are similar. If you are reading and 
writing from a small set of the same objects in each of the transactions, 
then you will suffer badly from lock contention, as only one backend can 
be in a read-modify-write cycle on a given object at a time. We don't know 
enough about the data and queries to say whether this is the case. 
However, if you have a common object that every request touches (like a 
status line or something), then try to engineer that out of the system.
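
As a purely hypothetical illustration of that read-modify-write serialisation, 
imagine every request touching one shared row in a table called status:

  -- session 1
  BEGIN;
  UPDATE status SET hits = hits + 1 WHERE id = 1;  -- row lock taken, held until commit
  -- ... carries on with the rest of its work while still holding the lock ...

  -- session 2, at the same time: this statement simply blocks until session 1
  -- commits or rolls back
  BEGIN;
  UPDATE status SET hits = hits + 1 WHERE id = 1;

With forty backends doing that, the requests effectively run one at a time.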


Hope this helps. Synchronising forty processes around accessing a single 
object for high performance is really hard, and Postgres does it 
incredibly well, but it is still by far the best option to avoid 
contention completely if possible.



  -Kevin


It'd really help us reading your emails if you could make sure that it is 
easy to distinguish your words from words you are quoting. It can be very 
confusing reading some of your emails, trying to remember which bits I 
have seen before written by someone else. This is one of the few lines 
that I know you didn't write - you're a Bob, not a Kevin. A few '>' 
characters at the beginning of lines, which most mail readers will add 
automatically, make all the difference.


Matthew

--
Me... a skeptic?  I trust you have proof?


Re: [PERFORM] performance config help

2010-01-12 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 
   -Kevin
 
 It'd really help us reading your emails if you could make sure
 that it is easy to distinguish your words from words you are
 quoting. It can be very confusing reading some of your emails,
 trying to remember which bits I have seen before written by
 someone else. This is one of the few lines that I know you didn't
 write - you're a Bob, not a Kevin. A few '>' characters at the
 beginning of lines, which most mail readers will add
 automatically, make all the difference.
 
That took me by surprise, because outside of that one line, where
Bob apparently lost the leading character, I've been seeing his
messages properly quoted.  I went back and looked at Bob's old
messages and found that he's sending them in multiple mime formats,
text/plain with the '>' characters and the following:
 
--0016e6d77e63233088047ce8a128
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
 
I hadn't noticed, because I have my email reader set up to default
to text format if available.  Your reader must be looking at the
html format and not handling this stuff:
 
<blockquote class=3D"gmail_quote" style=3D"border-left: 1px solid=
 rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;"><div><d=
iv class=3D"h5">
 
You might want to adjust your reader.
 
Bob, you might want to just send plain text, to avoid such problems.
 
-Kevin



Re: [PERFORM] performance config help

2010-01-12 Thread Bob Dusek
On Tue, Jan 12, 2010 at 12:12 PM, Matthew Wakeling matt...@flymine.org wrote:

 On Mon, 11 Jan 2010, Bob Dusek wrote:

 How do I learn more about the actual lock contention in my db?   Lock
 contention makes
 some sense.  Each of the 256 requests are relatively similar.  So, I don't
 doubt that
 lock contention could be an issue.  I just don't know how to observe it or
 correct it.
 It seems like if we have processes that are contending for locks, there's
 not much we can
 do about it.


 To me:

 1. This doesn't look like an IO bandwidth issue, as the database is small.
 2. This doesn't look like a CPU speed issue, as usage is low.
 3. This doesn't look like a memory bandwidth issue, as that would count as
   CPU active in top.
 4. This doesn't look like a memory size problem either.

 So, what's left? It could be a network bandwidth problem, if your client is
 on a separate server. You haven't really given much detail about the nature
 of the queries, so it is difficult for us to tell if the size of the results
 means that you are maxing out your network. However, it doesn't sound like
 it is likely to me that this is the problem.


The connections to postgres are happening on the localhost.  Our application
server accepts connections from the network, and the application queries
Postgres using a standard pg_pconnect on the localhost.


 It could be a client bottleneck problem - maybe your server is performing
 really well, but your client can't keep up. You may be able to determine
 this by switching on logging of long-running queries in Postgres, and
 comparing that with what your client says. Also, look at the resource usage
 on the client machine.


We've been logging long-running queries (> 200 ms).  That's how we know
Postgres is degrading.  We don't see any queries showing up when we have 40
clients running.  But, we start seeing quite a bit show up after that.



 It could be a lock contention problem. To me, this feels like the most
 likely. You say that the queries are similar. If you are reading and writing
 from a small set of the same objects in each of the transactions, then you
 will suffer badly from lock contention, as only one backend can be in a
 read-modify-write cycle on a given object at a time. We don't know enough
 about the data and queries to say whether this is the case. However, if you
 have a common object that every request touches (like a status line or
 something), then try to engineer that out of the system.

 Hope this helps. Synchronising forty processes around accessing a single
 object for high performance is really hard, and Postgres does it incredibly
 well, but it is still by far the best option to avoid contention completely
 if possible.


Each of the concurrent clients does a series of selects, inserts, updates,
and deletes.  The requests would generally never update or delete the same
rows in a table.  However, the requests do generally write to the same
tables.  And, they are all reading from the same tables that they're writing
to.  For the inserts, I imagine they are blocking on access to the sequence
that controls the primary keys for the insert tables.

But, I'm not sure about locking beyond that.  When we delete from the
tables, we generally delete where clientid=X, which deletes all of the
rows that a particular client inserted (each client cleans up its own rows
after it finishes what it's doing).  Would that be blocking inserts on that
table for other clients?



   -Kevin


 It'd really help us reading your emails if you could make sure that it is
 easy to distinguish your words from words you are quoting. It can be very
 confusing reading some of your emails, trying to remember which bits I have
 seen before written by someone else. This is one of the few lines that I
 know you didn't write - you're a Bob, not a Kevin. A few '>' characters at
 the beginning of lines, which most mail readers will add automatically, make
 all the difference.


I'm really sorry.  I'm using gmail's interface.  I just saw the "Plain Text"
formatter at the top of this compose message.  But, if I convert it to
Plain Text now, I may lose my portion of the message.  I'll use the "Plain
Text" when posting future messages.

Sorry for the hassle.

Matthew

 --
 Me... a skeptic?  I trust you have proof?


Re: [PERFORM] performance config help

2010-01-12 Thread Bob Dusek
 Bob, you might want to just send plain text, to avoid such problems.

Will do.  Looks like gmail's interface does it nicely.


 -Kevin



Re: [PERFORM] performance config help

2010-01-12 Thread Matthew Wakeling

On Tue, 12 Jan 2010, Bob Dusek wrote:

Each of the concurrent clients does a series of selects, inserts, updates,
and deletes.  The requests would generally never update or delete the same
rows in a table.  However, the requests do generally write to the same
tables.  And, they are all reading from the same tables that they're writing
to.  For the inserts, I imagine they are blocking on access to the sequence
that controls the primary keys for the insert tables.


I'm going to have to bow out at this stage, and let someone else who knows 
more about what gets locked in a transaction help instead. The sequence 
may be significant, but I would have thought it would have to be something 
a bit bigger that is gumming up the works.



I'm really sorry.  I'm using gmail's interface.


Actually, you weren't doing anything particularly wrong as it turns out. 
It is partly a case of alpine being too clever for its own good, just as 
Kevin pointed out. My mail reader is taking the most preferred mime 
alternative, which is the HTML version, and interpreting it to its best 
ability, which isn't very well. It is the email that says which 
alternative is preferred, by the way. I have just forced alpine to view 
the plain text version instead, and it is much better.


I just saw the "Plain Text" formatter at the top of this compose 
message.  But, if I convert it to Plain Text now, I may lose my portion 
of the message.  I'll use the "Plain Text" when posting future messages.


To be honest, that's always a good idea, although you didn't actually do 
anything wrong. I do know people whose spam filters immediately discard emails 
that contain an HTML alternative - that's taking it to the extreme!


Matthew

--
Beware of bugs in the above code; I have only proved it correct, not
tried it.   --Donald Knuth



[PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Eduardo Piombino
Hi list, I'm having a problem when dealing with operations that ask too
much CPU from the server.
The scenario is this:

I have a multithreaded server, each thread with its own connection to the
database. Everything is working fine, actually great, actually
outstandingly, in normal operation.

I've a table named "a" with 1.8 million records, and growing, but I'm ok
with it, at least for the moment. Maybe in the near future we will cut it
down, back up old data, and free it up. But this is not the issue; as I said,
everything is working great. I have a couple of indexes to help some queries,
and that's it.

Now my problem started when I tried to do some model refactoring on this
production table.

First I tried a dumb approach.
I connected from pgadmin, opened a new session.
I tried an ALTER TABLE on this table just to turn a char(255) field into
char(250), and it locked up my system.

No surprise, since I had many threads waiting for this alter table to
finish. What I did not foresee was that this alter table would take up so
much time. Ok, my fault, for not having calculated the time that it would
take the ALTER TABLE to complete.

Now, with this experience, I tried a simple workaround.
I created an empty version of "a" named "a_empty", identical in every sense,
and renamed "a" to "a_full", and "a_empty" to "a". This procedure cost me like
0 seconds of downtime, and everything kept working smoothly. Maybe a couple of
operations could have failed if they tried to write in the very second that
there was actually no table named "a", but since the operation was
transactional, the worst scenario was that if the operation had
failed, the client application would just report the error and ask the
user for a retry. No big deal.
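
For reference, the swap itself amounts to the two renames below, run in a 
single transaction so clients never see a moment without a table named "a" 
(a sketch, assuming the table names above):

  BEGIN;
  ALTER TABLE a RENAME TO a_full;
  ALTER TABLE a_empty RENAME TO a;
  COMMIT;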

Now, this table, which is totally unattached to the system in every way (no
one references this table, it's like a dumpster for old records), is not
being accessed by any other thread in the system, so an ALTER TABLE on it, to
turn a char(255) to char(250), should have no effect on the system.

So, with this in mind, I tried the ALTER TABLE, this time on the "a_full"
(totally unrelated) table.
The system went non-responsive again, and this time it had nothing to do
with threads waiting for the alter table to complete. The pgAdmin GUI went
non-responsive, as well as the application server's GUI, whose threads kept
working in the background, but started to take more and more time for every
client's request (up to 25 seconds, which is just ridiculous and completely
unacceptable in normal conditions).

This resulted in my client applications starting to disconnect after their
operations failed due to timeout, and the system basically went down again,
from a user's point of view.

This time, since I saw no relation between my operation on a totally
unrelated table and the server's BIG slowdown, I blamed the server's memory.

After some tests, I came to the conclusion that any heavy duty operation
on any thread (an ALTER TABLE on the 1.8-million-record table, updates on this
table, or an infinite loop, just to make my point) would affect the whole
server.

Bottom line is, I can't seem to do any heavy processing on the database (or
any operation that would require the server to enter into high CPU usage)
and still expect the server to behave normally. Whatever heavy duty
operation, DDL or DML, on whatever table (related or unrelated), on whatever
thread, would tear down my server's integrity.

My question then is: is there a way to limit the CPU assigned to a specific
connection?
I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

Something like:
pg_set_max_cpu_usage(2/100);

and rest assured that no matter what that thread is asking the database to
do, it just won't affect the other running threads. Obviously, assuming that
the process itself does not involve any locking of the other threads.

Is something like that possible?

Thanks in advance,
Eduardo.


Re: [PERFORM] performance config help

2010-01-12 Thread Craig Ringer

On 13/01/2010 2:01 AM, Bob Dusek wrote:


The connections to postgres are happening on the localhost.  Our
application server accepts connections from the network, and the
application queries Postgres using a standard pg_pconnect on the localhost.


Well, that's a good reason to have all those CPUs - if your app server 
runs on the same host as Pg.



We've been logging long-running queries (> 200 ms).  That's how we know
Postgres is degrading.  We don't see any queries showing up when we have
40 clients running.  But, we start seeing quite a bit show up after that.


It might be informative to see how fast query times are increasing with 
client count. You can probably figure this out by progressively lowering 
your query time logging threshold.



Each of the concurrent clients does a series of selects, inserts,
updates, and deletes.  The requests would generally never update or
delete the same rows in a table.  However, the requests do generally
write to the same tables.  And, they are all reading from the same
tables that they're writing to.


AFAIK none of that should interfere with each other, so long as they're 
not working with the same sets of tuples.



For the inserts, I imagine they are
blocking on access to the sequence that controls the primary keys for
the insert tables.


I doubt it. Sequences are outside transactional rules for that reason. 
It takes an incredibly short time for nextval(...) to obtain the next 
value for the sequence, and after that the sequence is unlocked and 
ready for the next use.
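
For what it's worth, this is easy to see for yourself (a sketch, with a 
hypothetical sequence name):

  -- nextval() briefly locks the sequence, hands out a value and releases it
  -- immediately; it never waits for other transactions, and values given to a
  -- transaction that later rolls back are simply skipped, not reused
  SELECT nextval('some_table_id_seq');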



But, I'm not sure about locking beyond that.  When we delete from the
tables, we generally delete where clientid=X, which deletes all of the
rows that a particular client inserted (each client cleans up its own
rows after it finishes what its doing).  Would that be blocking inserts
on that table for other clients?


Now would be a good time to start watching pg_locks - rather than 
guessing, try to actually observe if there's lock contention.
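
A rough starting point is a query along these lines (a sketch; on 8.2/8.3 the 
pg_stat_activity columns are procpid and current_query, newer releases rename 
them), which lists backends waiting on a lock alongside the backend holding it:

  SELECT bl.pid           AS blocked_pid,
         a.current_query  AS blocked_query,
         kl.pid           AS blocking_pid,
         ka.current_query AS blocking_query
    FROM pg_locks bl
    JOIN pg_stat_activity a  ON a.procpid = bl.pid
    JOIN pg_locks kl         ON kl.locktype = bl.locktype
                            AND kl.relation IS NOT DISTINCT FROM bl.relation
                            AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid
                            AND kl.pid <> bl.pid
                            AND kl.granted
    JOIN pg_stat_activity ka ON ka.procpid = kl.pid
   WHERE NOT bl.granted;

Run it a few times while the 40+ clients are hammering the server and see 
whether the same relation or transaction keeps showing up.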


I'd also consider looking into a connection pool so that as the number 
of clients to your appserver increases you can keep the number of active 
Pg connections at the sweet spot for your server to maximise overall 
throughput.
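
One common way to do that is a lightweight pooler such as pgbouncer sitting 
between the app and Pg. A minimal sketch of its pgbouncer.ini, with made-up 
names and sizes (256 incoming clients funnelled into roughly the 40-backend 
sweet spot you've observed; auth settings omitted):

  [databases]
  appdb = host=127.0.0.1 port=5432 dbname=appdb

  [pgbouncer]
  listen_addr = 127.0.0.1
  listen_port = 6432          ; point the app here instead of 5432
  pool_mode = transaction     ; a real backend is only tied up for the length of a transaction
  max_client_conn = 256
  default_pool_size = 40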


--
Craig Ringer



Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Craig James

Eduardo Piombino wrote:
Hi list, I'm having a problem when dealing with operations that asks too 
much CPU from the server.

The scenario is this:


A nice description below, but ... you give no information about your system: 
number of CPUs, disk types and configuration, how much memory, what have you 
changed in your Postgres configuration?  And what operating system, what 
version of Postgres, etc., etc.  The more information you give, the better the 
answer.

If you're operating on a single disk with a tiny amount of memory, and an old, 
misconfigured Postgres on a laptop computer, that's a whole different problem 
than if you're on a big system with 16 CPUs and a huge RAID 1+0 with 
battery-backed cache.

Craig



I have a multithreaded server, each thread with its own connection to 
the database. Everything is working fine, actually great, actually 
outstandingly, in normal operation.


I've a table named a with 1.8 million records, and growing, but I'm ok 
with it, at least for the moment. Maybe in the near future we will cut 
it down, backup old data, and free it up. But this is not the issue, as 
I said, everything is working great. I have a cpl of indexes to help 
some queries, and that's it.


Now my problem started when I tried to do some model refactoring on this 
production table.


First I tried a dumb approach.
I connected from pgadmin, opened a new session.
I tried an ALTER TABLE on this table just to turn a char(255) field into 
char(250), and it locked up my system.


No surprise, since I had many threads waiting for this alter table to 
finish. What I did not foresee was that this alter table would take up 
so much time. Ok, my fault, for not having calculated the time that it 
would take the ALTER TABLE to complete.


Now, with this experience, I tried a simple workaround.
Created an empty version of a named a_empty, identical in every sense.
renamed a to a_full, and a_empty to a. This procedure costed me 
like 0 seconds of downtime, and everything kept working smoothly. Maybe 
a cpl of operations could have failed if they tried to write in the very 
second that there was actually no table named a, but since the 
operation was transactional, the worst scenario was that if the 
operation should have failed, the client application would just inform 
of the error and ask the user for a retry. No big deal.


Now, this table, that is totally unattached to the system in every way 
(no one references this table, its like a dumpster for old records), is 
not begin accessed by no other thread in the system, so an ALTER table 
on it, to turn a char(255) to char(250), should have no effect on the 
system.


So, with this in mind, I tried the ALTER TABLE this time on the a_full 
(totally unrelated) table.
The system went non-responsive again, and this time it had nothing to do 
with threads waiting for the alter table to complete. The pgAdmin GUI 
went non-responsive, as well as the application's server GUI, whose 
threads kept working on the background, but starting to take more and 
more time for every clients request (up to 25 seconds, which are just 
ridiculous and completely unacceptable in normal conditions).


This resulted in my client applications to start disconnecting after 
their operations failed due to timeout, and the system basically went 
down again, from a users point of view.


This time, since I saw no relation between my operation on a totally 
unrelated table, and the server BIG slowdown, I blamed the servers memory.


After some tests, I came up to the conclusion that any heavy duty 
operation on any thread (ALTER TABLE on 1.8 million records tables, 
updates on this table, or an infinite loop, just to make my point), 
would affect the whole server.


Bottom line is, I can't seem to do any heavy processing on the database 
(or any operation that would require the server to enter into high CPU 
usage), and still expect the server to behave normally. Whatever heavy 
duty operation, DDL, DML, on whatever table (related, or unrelated), on 
whatever thread, would tear down my servers integrity.


My question then is: is there a way to limit the CPU assigned to a 
specific connection?

I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

Something like:
pg_set_max_cpu_usage(2/100);

and rest assured that no matter what that thread is asking the database 
to do, it just wont affect the other running threads. Obviosly, assuring 
that the process itself does not involve any locking of the other threads.


Is something like that possible?

Thanks in advance,
Eduardo.






Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Craig Ringer

On 13/01/2010 12:59 PM, Eduardo Piombino wrote:


My question then is: is there a way to limit the CPU assigned to a
specific connection?
I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

Something like:
pg_set_max_cpu_usage(2/100);


You're assuming the issue is CPU. I think that unlikely. In general, a 
single thread/process that wants as much CPU as it can get won't bring 
any machine with a half-decent OS to its knees. Any UNIX system should 
barely notice - everything else will slow down somewhat, depending on 
its scheduler, but in any sane setup shouldn't slow down by more than 
1/2. Modern Windows tends to be fairly well behaved here too.


What's much more likely is that you're working with a crappy disk setup 
- such as a RAID 5 array without battery-backed cache, or a single slow 
disk. You probably also have quite deep write queuing in the RAID 
controller / disk / OS. This means that your disk-intensive ALTER TABLE 
makes your disk subsystem so busy that it takes ages before any other 
process gets a look-in. It's not unlikely that I/O requests are being 
queued so deeply that it (often) takes several seconds for the 
controller to get around to executing a newly submitted read or write 
request. If your other queries need to do more than a few steps where 
they read some data, think about it, and read other data depending on 
the first read, then they're going to take forever, because they're 
going to have to endure a long delay before disk access each time.


Of course, that's just a guess, since you've provided no information on 
your hardware. Try collecting up some of the information shown here:


http://wiki.postgresql.org/wiki/Guide_to_reporting_problems#Things_you_need_to_mention

In any case, if it *is* I/O related, what to do about it depends on 
exactly what sort of I/O issue it is. Extremely deep queuing? Looks good 
for throughput benchmarks, but is stupid if you care about latency and 
have some I/O that's higher priority than others, so reduce your queue 
depth. Very slow writes hammering reads? Don't use RAID 5. Etc.


--
Craig Ringer



Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Eduardo Piombino
I'm sorry.

The server is a production HP ProLiant server; I don't remember the exact
model, but the key features were:
4 cores, over 2GHz each (I'm sorry I don't remember the actual specs); I
think it had 16G of RAM (if that is possible?)
It has two 320G disks in RAID (mirrored).

I don't even have the emails with the specs here, but I can give you the
exact configuration by tomorrow.

Operating system: Windows 2003 server, with latest patches.
Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.

At any given time, the server is at 0% CPU load, with peaks of 1-2% max,
in normal operation.

I've been digging a little in the archives, and one thing that helped me
realize is that I don't really remember seeing high CPU usage (a fact
that surprised me, but I do remember seeing high IO activity). I'm sorry,
it's pretty late here.
I know this single statement is enough to almost change everything I've just
asked.
Please try re-reading my original mail, considering that when I said "high
CPU usage" it might very well be high IO usage.

The final effect was that the server went non-responsive for all practical
purposes: not even the Task Manager would come up when I hit CTRL-ALT-DEL,
and of course every client would suffer horrific delays (+20 secs) for the
simplest operations like SELECT NOW();

I've just made a little modification to my original questions, to extend them
to the possibility of an IO usage issue, instead of just CPU.

 Bottom line is, I can't seem to do any heavy processing on the database
 (or any operation that would require the server to enter into high CPU
 usage or IO usage), and still expect the server to behave normally. Whatever
 heavy duty operation, DDL or DML, on whatever table (related or unrelated),
 on whatever thread, would tear down my server's integrity.

 My question then is: is there a way to limit the CPU or IO usage assigned
 to a specific connection?

 I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.

 Something like:
 pg_set_max_cpu_or_io_usage(2/100);



On Wed, Jan 13, 2010 at 2:14 AM, Craig James craig_ja...@emolecules.com wrote:

 Eduardo Piombino wrote:

 Hi list, I'm having a problem when dealing with operations that asks too
 much CPU from the server.
 The scenario is this:


 A nice description below, but ... you give no information about your
 system: number of CPUs, disk types and configuration, how much memory, what
 have you changed in your Postgres configuration?  And what operating system,
 what version of Postgres, etc., etc.  The more information you give, the
 better the answer.

 If you're operating on a single disk with a tiny amount of memory, and old,
 misconfigured Postgres on a laptop computer, that's a whole different
 problem than if you're on a big sytem with 16 CPUs and a huge RAID 1+0 with
 battery-backed cache.


Craig



 I have a multithreaded server, each thread with its own connection to the
 database. Everything is working fine, actually great, actually
 outstandingly, in normal operation.

 I've a table named a with 1.8 million records, and growing, but I'm ok
 with it, at least for the moment. Maybe in the near future we will cut it
 down, backup old data, and free it up. But this is not the issue, as I said,
 everything is working great. I have a cpl of indexes to help some queries,
 and that's it.

 Now my problem started when I tried to do some model refactoring on this
 production table.

 First I tried a dumb approach.
 I connected from pgadmin, opened a new session.
 I tried an ALTER TABLE on this table just to turn a char(255) field into
 char(250), and it locked up my system.

 No surprise, since I had many threads waiting for this alter table to
 finish. What I did not foresee was that this alter table would take up so
 much time. Ok, my fault, for not having calculated the time that it would
 take the ALTER TABLE to complete.

 Now, with this experience, I tried a simple workaround.
 Created an empty version of a named a_empty, identical in every sense.
 renamed a to a_full, and a_empty to a. This procedure costed me
 like 0 seconds of downtime, and everything kept working smoothly. Maybe a
 cpl of operations could have failed if they tried to write in the very
 second that there was actually no table named a, but since the operation
 was transactional, the worst scenario was that if the operation should have
 failed, the client application would just inform of the error and ask the
 user for a retry. No big deal.

 Now, this table, that is totally unattached to the system in every way (no
 one references this table, its like a dumpster for old records), is not
 begin accessed by no other thread in the system, so an ALTER table on it, to
 turn a char(255) to char(250), should have no effect on the system.

 So, with this in mind, I tried the ALTER TABLE this time on the a_full
 (totally unrelated) table.
 The system went non-responsive again, and this time it had nothing 

Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Craig Ringer

On 13/01/2010 1:47 PM, Eduardo Piombino wrote:

I'm sorry.

The server is a production server HP Proliant, I don't remember the
exact model, but the key features were:
4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), I
think it had 16G of RAM (if that is possible?)
It has two 320G disks in RAID (mirrored).


Plain 'ol SATA disks in RAID-1?

Hardware RAID (and if so, controller model)? With battery backup? Write 
cache on or off?


Or software RAID? If so, Windows built-in sw raid, or some vendor's 
fakeraid (Highpoint, Promise, Adaptec, etc)?


Anyway, with two disks in RAID-1 I'm not surprised you're seeing some 
performance issues with heavy writes, especially since it seems unlikely 
that you have a BBU hardware RAID controller. In RAID-1 a write must hit 
both disks, so a 1Mb write effectively costs twice as much as a 1Mb 
read. Since many controllers try for high throughput (because it looks 
good in benchmarks) at the expense of latency they also tend to try to 
batch writes into long blocks, which keeps the disks busy in extended 
bursts. That slaughters read latencies.


I had this sort of issue with a 3Ware 8500-8, and landed up modifying 
and recompiling the driver to reduce its built-in queue depth. I also 
increased readahead. It was still pretty awful as I was working with 
RAID 5 on SATA disks, but it made a big difference and more importantly 
meant that my Linux server was able to honour `ionice' priorities and 
feed more important requests to the controller first.


On windows, I really don't know what to do about it beyond getting a 
better I/O subsystem. Google may help - look into I/O priorities, queue 
depths, reducing read latencies, etc.



I don't even have the emails with the specs here, but I can give you the
exact configuration by tomorrow.

Operating system: Windows 2003 server, with latest patches.
Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.


Urk. 8.2 ?

Pg on Windows improves a lot with each release, and that's an old buggy 
version of 8.2 at that. Looking into an upgrade would be a really, 
REALLY good idea.



Please try interpreting again my original mail, considering that when I
said high CPU usage It might very well be high IO usage.

The final effect was that the server went non-responsive, for all
matters, not even the TaskManager would come up when i hit CTRL-ALT-DEL,
and of course, every client would suffer horrific (+20 secs) for the
simplest operations like SELECT NOW();


That sounds a LOT like horrible read latencies caused by total I/O 
overload. It could also be running out of memory and swapping heavily, 
so do keep an eye out for that, but I wouldn't expect to see that with 
an ALTER TABLE - especially on a 16GB server.



My question then is: is there a way to limit the CPU or IO usage
assigned to a specific connection?


In win32 you can set CPU priorities manually in Task Manager, but only 
once you already know the process ID of the Pg backend that's going to 
be hammering the machine. Not helpful.
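
The PID itself is easy enough to find from SQL first; a sketch, using the 8.2 
column names (procpid and current_query):

  SELECT procpid, current_query
    FROM pg_stat_activity
   WHERE current_query LIKE 'ALTER TABLE%';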


I don't know of any way to do per-process I/O priorities in Win32, but I 
only use win32 reluctantly and don't use it for anything I care about 
(like a production Pg server) so I'm far from a definitive source.


--
Craig Ringer



Re: [PERFORM] [PERFORMANCE] work_mem vs temp files issue

2010-01-12 Thread Jaime Casanova
On Mon, Jan 11, 2010 at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Hmm.  Not clear where the temp files are coming from, but it's *not* the
 sort --- the "internal sort ended" line shows that that sort never went
 to disk.  What kind of plan is feeding the sort node?


some time ago, you said:

It might be useful to turn on trace_sort to see if the small files
are coming from sorts.  If they're from hashes I'm afraid there's
no handy instrumentation ...


and that is clearly what was bothering me... because most of the temp files
are coming from hashes...

why don't we show some of that info in EXPLAIN? for example: we could
show memory used, no? or whether the hash went to disk... if I remove
#ifdef HJDEBUG it seems we even know how many batches the hash
used...

the reason I say "most of" the temp files is that when I removed
#ifdef HJDEBUG it said that in total I was using 10 batches, but there
were 14 temp files created (I guess we use 1 file per batch, no?)


nbatch = 1, nbuckets = 1024
nbatch = 1, nbuckets = 1024
nbatch = 8, nbuckets = 2048
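
For anyone else chasing the same thing, the session settings involved are 
roughly the following (trace_sort needs a server built with TRACE_SORT, which 
is the default, and log_temp_files only exists from 8.3 on; a sketch):

  SET trace_sort = on;      -- logs each sort, including the "internal sort ended"
                            -- lines Tom refers to above
  SET log_temp_files = 0;   -- 8.3+: log every temp file (and its size) when it is removed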


-- 
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157



Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Eduardo Piombino
Excellent, lots of useful information in your message.
I will follow your advice, and keep you posted on any progress. I have yet
to confirm some technical details of my setup, but I'm pretty sure
you hit the nail on the head in any case.

One last question, this IO issue I'm facing, do you think it is just a
matter of RAID configuration speed, or a matter of queue gluttony (and not
leaving time for other processes to get into the IO queue in a reasonable
time)?

Because if it was just a matter of speed, ok: with my actual RAID
configuration let's say it takes 10 minutes to process the ALTER TABLE
(leaving no room for other IOs until the ALTER TABLE is done). Let's say I
then put in the fastest possible RAID setup, or even remove RAID for the sake
of speed, and it completes in, let's say again, 10 seconds (an unreal
assumption). But if my table now grows 60 times, I would be facing the very
same problem again, even with the best RAID configuration.

The problem would seem to be in the way the OS (or hardware, or someone
else, or all of them) is/are inserting the IO requests into the queue.
What can I do to control the order in which these IO requests are finally
entered into the queue?
I mean .. what i would like to obtain is:

Considering the ALTER TABLE as a sequence of 100,000 READ/WRITE OPERATIONS,
and the SELECT * FROM xxx as a sequence of 100 READ OPERATIONS
(totally unrelated on disk):

First I run the ALTER TABLE on a thread...
Let's say by the time it generates 1,000 READ/WRITE OPERATIONS, the other
thread starts with the SELECT * FROM xxx ...
I would expect the IO system to give those 100 READ OPERATIONS a chance
to execute immediately (with no need to wait for the remaining 99,000
READ/WRITE OPERATIONS to finish), that is, to enter the queue at *almost* the
very same moment the IO requests were issued.

If I can not guarantee that, I'm kinda doomed, because the larger the
amount of IO operations requested by a heavy duty operation, the longer
it will take any other thread to start doing anything.

What cards do I have to manipulate the order in which the IO requests are
entered into the queue?
Can I disable this queue?
Should I turn the disks' IO operation caches off?
Should I avoid some specific disk/RAID vendor, for instance?

I think I have some serious reading to do on this matter; google will help
of course, but as always, every piece of advice, however small it may seem,
will be very much appreciated.

Nonetheless, thanks a lot for all the light you already brought me on this
matter.
I really appreciate it.

Eduardo.



On Wed, Jan 13, 2010 at 3:02 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:

 On 13/01/2010 1:47 PM, Eduardo Piombino wrote:

 I'm sorry.

 The server is a production server HP Proliant, I don't remember the
 exact model, but the key features were:
 4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), I
 think it had 16G of RAM (if that is possible?)
 It has two 320G disks in RAID (mirrored).


 Plain 'ol SATA disks in RAID-1?

 Hardware RAID (and if so, controller model)? With battery backup? Write
 cache on or off?

 Or software RAID? If so, Windows build-in sw raid, or some vendor's
 fakeraid (Highpoint, Promise, Adaptec, etc) ?

 Anyway, with two disks in RAID-1 I'm not surprised you're seeing some
 performance issues with heavy writes, especially since it seems unlikely
 that you have a BBU hardware RAID controller. In RAID-1 a write must hit
 both disks, so a 1Mb write effectively costs twice as much as a 1Mb read.
 Since many controllers try for high throughput (because it looks good in
 benchmarks) at the expense of latency they also tend to try to batch writes
 into long blocks, which keeps the disks busy in extended bursts. That
 slaughters read latencies.

 I had this sort of issue with a 3Ware 8500-8, and landed up modifying and
 recompiling the driver to reduce its built-in queue depth. I also increased
 readahead. It was still pretty awful as I was working with RAID 5 on SATA
 disks, but it made a big difference and more importantly meant that my Linux
 server was able to honour `ionice' priorities and feed more important
 requests to the controller first.

 On windows, I really don't know what to do about it beyond getting a better
 I/O subsystem. Google may help - look into I/O priorities, queue depths,
 reducing read latencies, etc.


  I don't even have the emails with the specs here, but I can give you the
 exact configuration by tomorrow.

 Operating system: Windows 2003 server, with latest patches.
 Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.


 Urk. 8.2 ?

 Pg on Windows improves a lot with each release, and that's an old buggy
 version of 8.2 at that. Looking into an upgrade would be a really, REALLY
 good idea.


  Please try interpreting again my original mail, considering that when I
 said high CPU usage It might very well be high IO usage.

 The final effect was that the server went non-responsive, for all
 matters, not even 

Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-12 Thread Greg Smith

Eduardo Piombino wrote:

Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.


Ugh...there are several features in PostgreSQL 8.3 and later 
specifically to address the sort of issue you're running into.  If you 
want to get good write performance out of this system, you may need to 
upgrade to at least that version.  It's impossible to resolve several of 
the common problems with overly intense write operations using any 8.2 
version.

The final effect was that the server went non-responsive, for all 
matters, not even the TaskManager would come up when i hit 
CTRL-ALT-DEL, and of course, every client would suffer horrific (+20 
secs) for the simplest operations like SELECT NOW();


The thing that you have to realize is that altering a table is basically 
making a new copy of that table, which is a really heavy amount of 
writing.  It's quite easy for an I/O heavy operation like that to fill 
up a lot of RAM with data to be written out, and when the database 
periodically needs to force all that data out to disk, the whole system 
grinds to a halt.  There's no way I'm aware of to 
throttle that writing down to a reasonable amount under Windows either, 
to achieve your goal of just making the ALTER run using less resources.


Some reading:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over 
basic tuning of the database server.  If you haven't already increased 
the checkpoint_segments parameters of your system, that's the first 
thing to try--increase it *a lot* (32 or more, default is 3) because it 
can really help with this problem.  A moderate increase to 
shared_buffers is in order too; since you're on Windows, increasing it 
to 256MB is a reasonable change.  The rest of the changes in there 
aren't likely to help out with this specific problem.
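
In postgresql.conf terms, those two changes look roughly like this 
(checkpoint_segments takes effect on a reload, shared_buffers needs a restart):

  checkpoint_segments = 32        # default is 3; raise it a lot for write-heavy work
  shared_buffers = 256MB          # a moderate bump; keep it modest on Windows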


http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm : 
covers the most likely cause of the issue you're running into.  
Unfortunately, most of the solutions you'll see there are things changed 
in 8.3.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com

