Re: [PERFORM] performance config help
On Mon, 11 Jan 2010, Bob Dusek wrote:
> How do I learn more about the actual lock contention in my db? Lock contention makes some sense. Each of the 256 requests are relatively similar. So, I don't doubt that lock contention could be an issue. I just don't know how to observe it or correct it. It seems like if we have processes that are contending for locks, there's not much we can do about it.

To me:

1. This doesn't look like an IO bandwidth issue, as the database is small.
2. This doesn't look like a CPU speed issue, as usage is low.
3. This doesn't look like a memory bandwidth issue, as that would count as CPU active in top.
4. This doesn't look like a memory size problem either.

So, what's left?

It could be a network bandwidth problem, if your client is on a separate server. You haven't really given much detail about the nature of the queries, so it is difficult for us to tell if the size of the results means that you are maxing out your network. However, it doesn't sound likely to me that this is the problem.

It could be a client bottleneck problem - maybe your server is performing really well, but your client can't keep up. You may be able to determine this by switching on logging of long-running queries in Postgres, and comparing that with what your client says. Also, look at the resource usage on the client machine.

It could be a lock contention problem. To me, this feels like the most likely. You say that the queries are similar. If you are reading and writing from a small set of the same objects in each of the transactions, then you will suffer badly from lock contention, as only one backend can be in a read-modify-write cycle on a given object at a time. We don't know enough about the data and queries to say whether this is the case. However, if you have a common object that every request touches (like a status line or something), then try to engineer that out of the system.

Hope this helps.
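[Matthew's suggestion about logging long-running queries maps onto a couple of server settings. A minimal sketch of a postgresql.conf fragment — the 250 ms threshold is purely illustrative, and log_lock_waits only exists from PostgreSQL 8.3 onward:]

```
# postgresql.conf fragment (illustrative values)
log_min_duration_statement = 250   # log any statement that runs longer than 250 ms
log_lock_waits = on                # 8.3+: log when a backend waits on a lock
                                   # longer than deadlock_timeout (1s by default)
```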
Synchronising forty processes around accessing a single object for high performance is really hard, and Postgres does it incredibly well, but it is still by far the best option to avoid contention completely if possible.

> -Kevin

It'd really help us reading your emails if you could make sure that it is easy to distinguish your words from words you are quoting. It can be very confusing reading some of your emails, trying to remember which bits I have seen before written by someone else. This is one of the few lines that I know you didn't write - you're a Bob, not a Kevin. A few characters at the beginning of lines, which most mail readers will add automatically, make all the difference.

Matthew

--
Me... a skeptic? I trust you have proof?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] performance config help
Matthew Wakeling matt...@flymine.org wrote:
>> -Kevin
>
> It'd really help us reading your emails if you could make sure that it is easy to distinguish your words from words you are quoting. It can be very confusing reading some of your emails, trying to remember which bits I have seen before written by someone else. This is one of the few lines that I know you didn't write - you're a Bob, not a Kevin. A few characters at the beginning of lines, which most mail readers will add automatically, make all the difference.

That took me by surprise, because outside of that one line, where Bob apparently lost the leading character, I've been seeing his messages properly quoted. I went back and looked at Bob's old messages and found that he's sending them in multiple MIME formats: text/plain with the '>' characters, and the following:

--0016e6d77e63233088047ce8a128
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I hadn't noticed, because I have my email reader set up to default to text format if available. Your reader must be looking at the html format and not handling this stuff:

blockquote class=3Dgmail_quote style=3Dborder-left: 1px solid= rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;divd= iv class=3Dh5

You might want to adjust your reader. Bob, you might want to just send plain text, to avoid such problems.

-Kevin
Re: [PERFORM] performance config help
On Tue, Jan 12, 2010 at 12:12 PM, Matthew Wakeling matt...@flymine.org wrote:
> On Mon, 11 Jan 2010, Bob Dusek wrote:
>> How do I learn more about the actual lock contention in my db? Lock contention makes some sense. Each of the 256 requests are relatively similar. So, I don't doubt that lock contention could be an issue. I just don't know how to observe it or correct it. It seems like if we have processes that are contending for locks, there's not much we can do about it.
>
> To me:
> 1. This doesn't look like an IO bandwidth issue, as the database is small.
> 2. This doesn't look like a CPU speed issue, as usage is low.
> 3. This doesn't look like a memory bandwidth issue, as that would count as CPU active in top.
> 4. This doesn't look like a memory size problem either.
>
> So, what's left?
>
> It could be a network bandwidth problem, if your client is on a separate server. You haven't really given much detail about the nature of the queries, so it is difficult for us to tell if the size of the results means that you are maxing out your network. However, it doesn't sound like it is likely to me that this is the problem.

The connections to postgres are happening on the localhost. Our application server accepts connections from the network, and the application queries Postgres using a standard pg_pconnect on the localhost.

> It could be a client bottleneck problem - maybe your server is performing really well, but your client can't keep up. You may be able to determine this by switching on logging of long-running queries in Postgres, and comparing that with what your client says. Also, look at the resource usage on the client machine.

We've been logging long-running queries (200 ms). That's how we know Postgres is degrading. We don't see any queries showing up when we have 40 clients running. But, we start seeing quite a bit show up after that.

> It could be a lock contention problem. To me, this feels like the most likely. You say that the queries are similar.
> If you are reading and writing from a small set of the same objects in each of the transactions, then you will suffer badly from lock contention, as only one backend can be in a read-modify-write cycle on a given object at a time. We don't know enough about the data and queries to say whether this is the case. However, if you have a common object that every request touches (like a status line or something), then try to engineer that out of the system. Hope this helps. Synchronising forty processes around accessing a single object for high performance is really hard, and Postgres does it incredibly well, but it is still by far the best option to avoid contention completely if possible.

Each of the concurrent clients does a series of selects, inserts, updates, and deletes. The requests would generally never update or delete the same rows in a table. However, the requests do generally write to the same tables. And, they are all reading from the same tables that they're writing to.

For the inserts, I imagine they are blocking on access to the sequence that controls the primary keys for the insert tables. But, I'm not sure about locking beyond that. When we delete from the tables, we generally delete where clientid=X, which deletes all of the rows that a particular client inserted (each client cleans up its own rows after it finishes what it's doing). Would that be blocking inserts on that table for other clients?

>> -Kevin
>
> It'd really help us reading your emails if you could make sure that it is easy to distinguish your words from words you are quoting. It can be very confusing reading some of your emails, trying to remember which bits I have seen before written by someone else. This is one of the few lines that I know you didn't write - you're a Bob, not a Kevin. A few characters at the beginning of lines, which most mail readers will add automatically, make all the difference.

I'm really sorry. I'm using gmail's interface.
I just saw the Plain Text formatter at the top of this compose message. But, if I convert it to Plain Text now, I may lose my portion of the message. I'll use the Plain Text when posting future messages. Sorry for the hassle.

> Matthew
>
> --
> Me... a skeptic? I trust you have proof?
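[Bob's guess about the primary-key sequence is easy to rule in or out, because sequence values are handed out outside transactional rules: nextval() holds its internal lock only for an instant, and a rolled-back transaction never gives its value back. A minimal psql sketch — the sequence name is made up for illustration:]

```sql
CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');   -- returns 1; the value is consumed immediately
ROLLBACK;

SELECT nextval('demo_seq');   -- returns 2: the rollback did not give 1 back
```

[This is why concurrent inserters rarely serialize on the sequence itself; contention, if any, is elsewhere.]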
Re: [PERFORM] performance config help
> Bob, you might want to just send plain text, to avoid such problems.

Will do. Looks like gmail's interface does it nicely.

-Kevin
Re: [PERFORM] performance config help
On Tue, 12 Jan 2010, Bob Dusek wrote:
> Each of the concurrent clients does a series of selects, inserts, updates, and deletes. The requests would generally never update or delete the same rows in a table. However, the requests do generally write to the same tables. And, they are all reading from the same tables that they're writing to. For the inserts, I imagine they are blocking on access to the sequence that controls the primary keys for the insert tables.

I'm going to have to bow out at this stage, and let someone else who knows more about what gets locked in a transaction help instead. The sequence may be significant, but I would have thought it would have to be something a bit bigger that is gumming up the works.

> I'm really sorry. I'm using gmail's interface.

Actually, you weren't doing anything particularly wrong as it turns out. It is partly a case of alpine being too clever for its own good, just as Kevin pointed out. My mail reader is taking the most preferred mime alternative, which is the HTML version, and interpreting it to its best ability, which isn't very well. It is the email that says which alternative is preferred, by the way. I have just forced alpine to view the plain text version instead, and it is much better.

> I just saw the Plain Text formatter at the top of this compose message. But, if I convert it to Plain Text now, I may lose my portion of the message. I'll use the Plain Text when posting future messages.

To be honest, that's always a good idea, although you didn't actually do wrong. I do know people whose spam filters immediately discard emails that contain a HTML alternative - that's taking it to the extreme!

Matthew

--
Beware of bugs in the above code; I have only proved it correct, not tried it. --Donald Knuth
[PERFORM] a heavy duty operation on an unused table kills my server
Hi list, I'm having a problem when dealing with operations that ask too much CPU from the server. The scenario is this:

I have a multithreaded server, each thread with its own connection to the database. Everything is working fine, actually great, actually outstandingly, in normal operation. I have a table named "a" with 1.8 million records, and growing, but I'm ok with it, at least for the moment. Maybe in the near future we will cut it down, back up old data, and free it up. But this is not the issue; as I said, everything is working great. I have a couple of indexes to help some queries, and that's it.

Now, my problem started when I tried to do some model refactoring on this production table. First I tried a dumb approach. I connected from pgAdmin and opened a new session. I tried an ALTER TABLE on this table just to turn a char(255) field into char(250), and it locked up my system. No surprise, since I had many threads waiting for this ALTER TABLE to finish. What I did not foresee was that this ALTER TABLE would take up so much time. Ok, my fault, for not having calculated the time that it would take the ALTER TABLE to complete.

Now, with this experience, I tried a simple workaround. I created an empty version of "a" named "a_empty", identical in every sense, then renamed "a" to "a_full", and "a_empty" to "a". This procedure cost me something like 0 seconds of downtime, and everything kept working smoothly. Maybe a couple of operations could have failed if they tried to write in the very second that there was actually no table named "a", but since the operation was transactional, the worst scenario was that if an operation had failed, the client application would just report the error and ask the user for a retry. No big deal.
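[The swap Eduardo describes is just a pair of renames; done inside one transaction it is effectively atomic for other sessions, which never see a moment with no table named "a". A sketch, using the table names from the message:]

```sql
BEGIN;
ALTER TABLE a RENAME TO a_full;    -- park the big table under a new name
ALTER TABLE a_empty RENAME TO a;   -- the empty clone takes its place
COMMIT;                            -- both renames become visible at once
```

[Other sessions block briefly on the table lock during the renames rather than erroring out, which matches the near-zero downtime observed.]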
Now, this table, which is totally unattached to the system in every way (no one references it; it's like a dumpster for old records), is not being accessed by any other thread in the system, so an ALTER TABLE on it, to turn a char(255) into char(250), should have no effect on the system. So, with this in mind, I tried the ALTER TABLE, this time on the "a_full" (totally unrelated) table.

The system went non-responsive again, and this time it had nothing to do with threads waiting for the ALTER TABLE to complete. The pgAdmin GUI went non-responsive, as well as the application server's GUI, whose threads kept working in the background, but started to take more and more time for every client request (up to 25 seconds, which is just ridiculous and completely unacceptable in normal conditions). This resulted in my client applications starting to disconnect after their operations failed due to timeout, and the system basically went down again, from a user's point of view.

This time, since I saw no relation between my operation on a totally unrelated table and the server's BIG slowdown, I blamed the server's memory. After some tests, I came to the conclusion that any heavy duty operation on any thread (an ALTER TABLE on a 1.8-million-record table, updates on this table, or an infinite loop, just to make my point) would affect the whole server.

Bottom line is, I can't seem to do any heavy processing on the database (or any operation that would require the server to enter into high CPU usage), and still expect the server to behave normally. Whatever heavy duty operation, DDL, DML, on whatever table (related or unrelated), on whatever thread, would tear down my server's integrity. My question then is: is there a way to limit the CPU assigned to a specific connection? I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours.
Something like: pg_set_max_cpu_usage(2/100); and rest assured that no matter what that thread is asking the database to do, it just won't affect the other running threads. Obviously, assuming that the process itself does not involve any locking of the other threads. Is something like that possible?

Thanks in advance,
Eduardo.
Re: [PERFORM] performance config help
On 13/01/2010 2:01 AM, Bob Dusek wrote:
> The connections to postgres are happening on the localhost. Our application server accepts connections from the network, and the application queries Postgres using a standard pg_pconnect on the localhost.

Well, that's a good reason to have all those CPUs - if your app server runs on the same host as Pg.

> We've been logging long-running queries (200 ms). That's how we know Postgres is degrading. We don't see any queries showing up when we have 40 clients running. But, we start seeing quite a bit show up after that.

It might be informative to see how fast query times are increasing with client count. You can probably figure this out by progressively lowering your query time logging threshold.

> Each of the concurrent clients does a series of selects, inserts, updates, and deletes. The requests would generally never update or delete the same rows in a table. However, the requests do generally write to the same tables. And, they are all reading from the same tables that they're writing to.

AFAIK none of that should interfere with each other, so long as they're not working with the same sets of tuples.

> For the inserts, I imagine they are blocking on access to the sequence that controls the primary keys for the insert tables.

I doubt it. Sequences are outside transactional rules for that reason. It takes an incredibly short time for nextval(...) to obtain the next value for the sequence, and after that the sequence is unlocked and ready for the next use.

> But, I'm not sure about locking beyond that. When we delete from the tables, we generally delete where clientid=X, which deletes all of the rows that a particular client inserted (each client cleans up its own rows after it finishes what it's doing). Would that be blocking inserts on that table for other clients?

Now would be a good time to start watching pg_locks - rather than guessing, try to actually observe if there's lock contention.
I'd also consider looking into a connection pool so that as the number of clients to your appserver increases you can keep the number of active Pg connections at the sweet spot for your server to maximise overall throughput.

--
Craig Ringer
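[Craig's "watch pg_locks" advice can start with something as simple as listing ungranted lock requests while the system is under load. A sketch — column names per the pg_locks system view; note that in releases before 9.2 the matching pg_stat_activity column for joining is procpid rather than pid:]

```sql
-- Show every lock request that is currently waiting, and on what.
SELECT pid, locktype, mode, relation::regclass AS object, granted
FROM pg_locks
WHERE NOT granted;
```

[If this stays empty while throughput degrades, lock contention is probably not the bottleneck.]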
Re: [PERFORM] a heavy duty operation on an unused table kills my server
Eduardo Piombino wrote:
> Hi list, I'm having a problem when dealing with operations that asks too much CPU from the server. The scenario is this:

A nice description below, but ... you give no information about your system: number of CPUs, disk types and configuration, how much memory, what have you changed in your Postgres configuration? And what operating system, what version of Postgres, etc., etc. The more information you give, the better the answer. If you're operating on a single disk with a tiny amount of memory, and an old, misconfigured Postgres on a laptop computer, that's a whole different problem than if you're on a big system with 16 CPUs and a huge RAID 1+0 with battery-backed cache.

Craig

> I have a multithreaded server, each thread with its own connection to the database. Everything is working fine, actually great, actually outstandingly, in normal operation. I've a table named a with 1.8 million records, and growing, but I'm ok with it, at least for the moment. Maybe in the near future we will cut it down, backup old data, and free it up. But this is not the issue, as I said, everything is working great. I have a cpl of indexes to help some queries, and that's it.
>
> Now my problem started when I tried to do some model refactoring on this production table. First I tried a dumb approach. I connected from pgadmin, opened a new session. I tried an ALTER TABLE on this table just to turn a char(255) field into char(250), and it locked up my system. No surprise, since I had many threads waiting for this alter table to finish. What I did not foresee was that this alter table would take up so much time. Ok, my fault, for not having calculated the time that it would take the ALTER TABLE to complete. Now, with this experience, I tried a simple workaround. Created an empty version of a named a_empty, identical in every sense. renamed a to a_full, and a_empty to a. This procedure costed me like 0 seconds of downtime, and everything kept working smoothly.
> Maybe a cpl of operations could have failed if they tried to write in the very second that there was actually no table named a, but since the operation was transactional, the worst scenario was that if the operation should have failed, the client application would just inform of the error and ask the user for a retry. No big deal.
>
> Now, this table, that is totally unattached to the system in every way (no one references this table, its like a dumpster for old records), is not begin accessed by no other thread in the system, so an ALTER table on it, to turn a char(255) to char(250), should have no effect on the system. So, with this in mind, I tried the ALTER TABLE this time on the a_full (totally unrelated) table. The system went non-responsive again, and this time it had nothing to do with threads waiting for the alter table to complete. The pgAdmin GUI went non-responsive, as well as the application's server GUI, whose threads kept working on the background, but starting to take more and more time for every clients request (up to 25 seconds, which are just ridiculous and completely unacceptable in normal conditions). This resulted in my client applications to start disconnecting after their operations failed due to timeout, and the system basically went down again, from a users point of view.
>
> This time, since I saw no relation between my operation on a totally unrelated table, and the server BIG slowdown, I blamed the servers memory. After some tests, I came up to the conclusion that any heavy duty operation on any thread (ALTER TABLE on 1.8 million records tables, updates on this table, or an infinite loop, just to make my point), would affect the whole server. Bottom line is, I can't seem to do any heavy processing on the database (or any operation that would require the server to enter into high CPU usage), and still expect the server to behave normally.
> Whatever heavy duty operation, DDL, DML, on whatever table (related, or unrelated), on whatever thread, would tear down my servers integrity. My question then is: is there a way to limit the CPU assigned to a specific connection? I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours. Something like: pg_set_max_cpu_usage(2/100); and rest assured that no matter what that thread is asking the database to do, it just wont affect the other running threads. Obviosly, assuring that the process itself does not involve any locking of the other threads. Is something like that possible? Thanks in advance, Eduardo.
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On 13/01/2010 12:59 PM, Eduardo Piombino wrote:
> My question then is: is there a way to limit the CPU assigned to a specific connection? I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours. Something like: pg_set_max_cpu_usage(2/100);

You're assuming the issue is CPU. I think that unlikely. In general, a single thread/process that wants as much CPU as it can get won't bring any machine with a half-decent OS to its knees. Any UNIX system should barely notice - everything else will slow down somewhat, depending on its scheduler, but in any sane setup shouldn't slow down by more than 1/2. Modern Windows tends to be fairly well behaved here too.

What's much more likely is that you're working with a crappy disk setup - such as a RAID 5 array without battery-backed cache, or a single slow disk. You probably also have quite deep write queuing in the RAID controller / disk / OS. This means that your disk-intensive ALTER TABLE makes your disk subsystem so busy that it takes ages before any other process gets a look-in. It's not unlikely that I/O requests are being queued so deeply that it (often) takes several seconds for the controller to get around to executing a newly submitted read or write request.

If your other queries need to do more than a few steps where they read some data, think about it, and read other data depending on the first read, then they're going to take forever, because they're going to have to endure a long delay before disk access each time.

Of course, that's just a guess, since you've provided no information on your hardware. Try collecting up some of the information shown here: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems#Things_you_need_to_mention

In any case, if it *is* I/O related, what to do about it depends on exactly what sort of I/O issue it is. Extremely deep queuing?
Looks good for throughput benchmarks, but is stupid if you care about latency and have some I/O that's higher priority than others, so reduce your queue depth. Very slow writes hammering reads? Don't use RAID 5. Etc.

--
Craig Ringer
Re: [PERFORM] a heavy duty operation on an unused table kills my server
I'm sorry. The server is a production server, an HP ProLiant. I don't remember the exact model, but the key features were: 4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), and I think it had 16G of RAM (if that is possible?). It has two 320G disks in RAID (mirrored). I don't even have the emails with the specs here, but I can give you the exact configuration by tomorrow.

Operating system: Windows 2003 Server, with latest patches.
Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.

At any given time, the server is on 0% CPU load, with peaks of 1%, 2% max, in normal operation. I've been digging a little in the archives, and one thing it helped me come up with is that I don't really remember seeing high CPU usage (a fact that surprised me, but I do remember seeing high IO activity). I'm sorry, it's pretty late here. I know this single statement is enough to almost change everything I've just asked. Please try interpreting my original mail again, considering that when I said "high CPU usage" it might very well be "high IO usage". The final effect was that the server went non-responsive for all matters; not even the Task Manager would come up when I hit CTRL-ALT-DEL, and of course, every client would suffer horrific delays (+20 secs) for the simplest operations like SELECT NOW().

I've just made a little modification to my original questions, to extend them to the possibility of an IO usage issue, instead of just CPU:

Bottom line is, I can't seem to do any heavy processing on the database (or any operation that would require the server to enter into high CPU usage or IO USAGE), and still expect the server to behave normally. Whatever heavy duty operation, DDL, DML, on whatever table (related, or unrelated), on whatever thread, would tear down my server's integrity.

My question then is: is there a way to limit the CPU or IO USAGE assigned to a specific connection?
I mean, I don't care if my ALTER TABLE takes 4 days instead of 4 hours. Something like: pg_set_max_cpu_or_io_usage(2/100);

On Wed, Jan 13, 2010 at 2:14 AM, Craig James craig_ja...@emolecules.com wrote:
> Eduardo Piombino wrote:
>> Hi list, I'm having a problem when dealing with operations that asks too much CPU from the server. The scenario is this:
>
> A nice description below, but ... you give no information about your system: number of CPUs, disk types and configuration, how much memory, what have you changed in your Postgres configuration? And what operating system, what version of Postgres, etc., etc. The more information you give, the better the answer. If you're operating on a single disk with a tiny amount of memory, and old, misconfigured Postgres on a laptop computer, that's a whole different problem than if you're on a big sytem with 16 CPUs and a huge RAID 1+0 with battery-backed cache.
>
> Craig
>
>> I have a multithreaded server, each thread with its own connection to the database. Everything is working fine, actually great, actually outstandingly, in normal operation. I've a table named a with 1.8 million records, and growing, but I'm ok with it, at least for the moment. Maybe in the near future we will cut it down, backup old data, and free it up. But this is not the issue, as I said, everything is working great. I have a cpl of indexes to help some queries, and that's it. Now my problem started when I tried to do some model refactoring on this production table. First I tried a dumb approach. I connected from pgadmin, opened a new session. I tried an ALTER TABLE on this table just to turn a char(255) field into char(250), and it locked up my system. No surprise, since I had many threads waiting for this alter table to finish. What I did not foresee was that this alter table would take up so much time. Ok, my fault, for not having calculated the time that it would take the ALTER TABLE to complete. Now, with this experience, I tried a simple workaround.
>> Created an empty version of a named a_empty, identical in every sense. renamed a to a_full, and a_empty to a. This procedure costed me like 0 seconds of downtime, and everything kept working smoothly. Maybe a cpl of operations could have failed if they tried to write in the very second that there was actually no table named a, but since the operation was transactional, the worst scenario was that if the operation should have failed, the client application would just inform of the error and ask the user for a retry. No big deal. Now, this table, that is totally unattached to the system in every way (no one references this table, its like a dumpster for old records), is not begin accessed by no other thread in the system, so an ALTER table on it, to turn a char(255) to char(250), should have no effect on the system. So, with this in mind, I tried the ALTER TABLE this time on the a_full (totally unrelated) table. The system went non-responsive again, and this time it had nothing
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On 13/01/2010 1:47 PM, Eduardo Piombino wrote:
> I'm sorry. The server is a production server HP Proliant, I don't remember the exact model, but the key features were: 4 cores, over 2GHz each (I'm sorry I don't remember the actual specs), I think it had 16G of RAM (if that is possible?) It has two 320G disks in RAID (mirrored).

Plain ol' SATA disks in RAID-1? Hardware RAID (and if so, controller model)? With battery backup? Write cache on or off? Or software RAID? If so, Windows built-in sw raid, or some vendor's fakeraid (Highpoint, Promise, Adaptec, etc)?

Anyway, with two disks in RAID-1 I'm not surprised you're seeing some performance issues with heavy writes, especially since it seems unlikely that you have a BBU hardware RAID controller. In RAID-1 a write must hit both disks, so a 1Mb write effectively costs twice as much as a 1Mb read. Since many controllers try for high throughput (because it looks good in benchmarks) at the expense of latency, they also tend to try to batch writes into long blocks, which keeps the disks busy in extended bursts. That slaughters read latencies.

I had this sort of issue with a 3Ware 8500-8, and landed up modifying and recompiling the driver to reduce its built-in queue depth. I also increased readahead. It was still pretty awful, as I was working with RAID 5 on SATA disks, but it made a big difference and, more importantly, meant that my Linux server was able to honour `ionice' priorities and feed more important requests to the controller first. On Windows, I really don't know what to do about it beyond getting a better I/O subsystem. Google may help - look into I/O priorities, queue depths, reducing read latencies, etc.

> I don't even have the emails with the specs here, but I can give you the exact configuration by tomorrow. Operating system: Windows 2003 server, with latest patches. Postgres version: 8.2.4, with all defaults, except DateStyle and TimeZone.

Urk. 8.2?
Pg on Windows improves a lot with each release, and that's an old, buggy version of 8.2 at that. Looking into an upgrade would be a really, REALLY good idea.

> Please try interpreting again my original mail, considering that when I said high CPU usage it might very well be high IO usage. The final effect was that the server went non-responsive, for all matters, not even the TaskManager would come up when I hit CTRL-ALT-DEL, and of course, every client would suffer horrific delays (+20 secs) for the simplest operations like SELECT NOW();

That sounds a LOT like horrible read latencies caused by total I/O overload. It could also be running out of memory and swapping heavily, so do keep an eye out for that, but I wouldn't expect to see that with an ALTER TABLE - especially on a 16GB server.

> My question then is: is there a way to limit the CPU or IO USAGE assigned to a specific connection?

In win32 you can set CPU priorities manually in Task Manager, but only once you already know the process ID of the Pg backend that's going to be hammering the machine. Not helpful. I don't know of any way to do per-process I/O priorities in Win32, but I only use win32 reluctantly and don't use it for anything I care about (like a production Pg server) so I'm far from a definitive source.

--
Craig Ringer
Re: [PERFORM] [PERFORMANCE] work_mem vs temp files issue
On Mon, Jan 11, 2010 at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Hmm. Not clear where the temp files are coming from, but it's *not* the
> sort --- the "internal sort ended" line shows that that sort never went
> to disk. What kind of plan is feeding the sort node?

Some time ago, you said:

> It might be useful to turn on trace_sort to see if the small files are
> coming from sorts. If they're from hashes I'm afraid there's no handy
> instrumentation ...

and that is clearly what was bothering me... because most of the temp files are coming from hashes... Why don't we show some of that info in EXPLAIN? For example: we could show memory used, no? Or whether the hash goes to disk... If I remove #ifdef HJDEBUG, it seems we even know how many batches the hash used...

The reason I say most of the temp files come from hashes is that when I removed #ifdef HJDEBUG it said that in total I was using 10 batches, but there were 14 temp files created (I guess we use 1 file per batch, no?)

nbatch = 1, nbuckets = 1024
nbatch = 1, nbuckets = 1024
nbatch = 8, nbuckets = 2048

--
Regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
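[Editor's note: for readers chasing the same question, the two knobs mentioned in this exchange can be set per-session. A sketch, with the caveats that trace_sort is a developer option (available when the server was built with TRACE_SORT defined, which is the default) and that log_temp_files only exists from 8.3 onward; the final query is a placeholder for whatever statement is spilling to disk:]

```sql
-- Log details of each sort (memory vs. disk, batch/tape stats)
-- to the server log, for this session only.
SET trace_sort = on;

-- 8.3+: log every temporary file of at least this size (in kB) when
-- it is deleted; 0 logs them all, including hash-join batch files.
SET log_temp_files = 0;

-- Run the suspect query, then inspect the server log.
EXPLAIN ANALYZE SELECT ...;
```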
Re: [PERFORM] a heavy duty operation on an unused table kills my server
Excellent, lots of useful information in your message. I will follow your advice, and keep you posted on any progress. I have yet to confirm some technical details of my setup, but I'm pretty sure you hit the nail on the head in any case.

One last question: this IO issue I'm facing, is it just a matter of RAID configuration speed, or a matter of queue gluttony (not leaving time for other processes to get into the IO queue in a reasonable time)? Because if it were just a matter of speed: ok, with my actual RAID configuration let's say it takes 10 minutes to process the ALTER TABLE (leaving no room for other IOs until the ALTER TABLE is done); let's say then I put in the fastest possible RAID setup, or even remove RAID for the sake of speed, and it completes in, let's say, 10 seconds (an unreal assumption). But if my table then grows 60 times, I would be facing the very same problem again, even with the best RAID configuration.

The problem would seem to be in the way the OS (or hardware, or someone else, or all of them) inserts the IO requests into the queue. What can I do to control the order in which these IO requests are finally entered into the queue? I mean, what I would like to obtain is this:

Consider the ALTER TABLE as a sequence of 1,000,000 READ/WRITE OPERATIONS, and the SELECT * FROM xxx as a sequence of 100 READ OPERATIONS (totally unrelated on disk). First I run the ALTER TABLE on a thread... Let's say by the time it has generated 10,000 READ/WRITE OPERATIONS, the other thread starts with the SELECT * FROM xxx... I would expect the IO system to give those 100 READ OPERATIONS a chance to execute immediately (with no need to wait for the remaining 990,000 READ/WRITE OPERATIONS to finish), that is, to enter the queue at *almost* the very same moment the IO requests were issued.

If I cannot guarantee that, I'm kind of doomed, because the larger the number of IO operations requested by a heavy-duty operation, the longer it will take any other thread to start doing anything. What cards do I have to play to manipulate the order in which the IO requests are entered into the queue? Can I disable this queue? Should I turn the disks' IO operation caches off? Not use some specific disk/RAID vendor, for instance?

I think I have some serious reading to do on this matter - Google will help, of course - but as always, every piece of advice, however small it may seem, will be very much appreciated. Nonetheless, thanks a lot for all the light you have already shed on this matter. I really appreciate it.

Eduardo.

On Wed, Jan 13, 2010 at 3:02 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
> [Craig's full reply, quoted in the message above, snipped]
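[Editor's note: on Linux, the queue-ordering question Eduardo raises is governed by the block-layer I/O scheduler (the "elevator"), which can be inspected and changed per device at runtime. A hedged sketch, era-appropriate for 2010 kernels; the device name sda and the value 64 are illustrative assumptions, and none of this applies to his Windows 2003 box:]

```shell
# Show the schedulers available for a device; the active one is
# bracketed, e.g. "noop anticipatory deadline [cfq]".
cat /sys/block/sda/queue/scheduler

# CFQ is the scheduler that honours per-process ionice priorities;
# select it at runtime (as root).
echo cfq > /sys/block/sda/queue/scheduler

# Shrink the request queue so a bulk writer cannot stack up as many
# requests ahead of an interactive reader; trades throughput for
# worst-case read latency.
echo 64 > /sys/block/sda/queue/nr_requests
```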
Re: [PERFORM] a heavy duty operation on an unused table kills my server
Eduardo Piombino wrote:
> Postgres version: 8.2.4, with all defaults, except DateStyle and
> TimeZone.

Ugh... there are several features in PostgreSQL 8.3 and later specifically to address the sort of issue you're running into. If you want to get good write performance out of this system, you may need to upgrade to at least that version. It's impossible to resolve several of the common problems with write operations being too intense using any 8.2 version.

> The final effect was that the server went non-responsive, for all
> matters; not even the Task Manager would come up when I hit
> CTRL-ALT-DEL, and of course every client would suffer horrific
> latencies (+20 secs) for the simplest operations, like SELECT NOW();

The thing you have to realize is that altering a table basically makes a new copy of that table, which is a really heavy amount of writing. It's quite easy for an I/O-heavy operation like that to fill up a lot of RAM with data to be written out, and when the database periodically needs to force all that data out to disk, the whole system grinds to a halt while it happens. There's no way I'm aware of to throttle that writing down to a reasonable amount under Windows, either, to achieve your goal of just making the ALTER run using fewer resources.

Some reading:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over basic tuning of the database server. If you haven't already increased the checkpoint_segments parameter on your system, that's the first thing to try - increase it *a lot* (32 or more; the default is 3), because it can really help with this problem. A moderate increase to shared_buffers is in order too; since you're on Windows, increasing it to 256MB is a reasonable change. The rest of the changes in there aren't likely to help with this specific problem.

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm covers the most likely cause of the issue you're running into.
Unfortunately, most of the solutions you'll see there are things changed in 8.3.

--
Greg Smith   2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.com
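[Editor's note: concretely, the two changes Greg suggests would look like this in postgresql.conf. The values are the ones he hedges above, as a starting point rather than a universal recommendation; changing shared_buffers requires a server restart, and on 8.3+ checkpoint_completion_target helps further by spreading checkpoint writes out:]

```
# postgresql.conf -- starting points from the advice above
checkpoint_segments = 32        # default 3; fewer, less violent checkpoints
shared_buffers = 256MB          # moderate value for Pg 8.x on Windows
```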