Re: [PERFORM] Help with performance on current status column
Chris Kratz wrote:
> Hello All, We are struggling with a specific query that is killing us. When
> doing explain analyze on the entire query, we *seem* to be getting killed by
> the estimated number of rows on a case statement calculation. I've included
> a snippet from the explain analyze of the much larger query. The line in
> question,
>
>   (cost=0.00..106.52 rows=1 width=16) (actual time=0.048..67.728 rows=4725 loops=1)
>
> shows that it returned 4700 rows instead of 1, which, when coupled with a
> later join, causes the statement to run over 3 minutes.[1] It seems that it
> thinks the scan on role_id is going to return 1 row, but in reality it
> returns 4725 rows. The case statement causing the problem uses today's date
> to see if a particular row is still active. Here is a test case showing how
> far off the estimate is from the reality. [2]
>
> [2] A much simpler statement triggers the incorrect row counts here.
>
> explain analyze select * from roles rol where
>   CASE WHEN rol.role_id IS NULL THEN NULL
>        WHEN rol.begin IS NOT NULL AND rol.end IS NOT NULL THEN
>          CASE WHEN TIMESTAMP 'now' >= rol.begin AND TIMESTAMP 'now' <= rol.end
>               THEN 'Active' ELSE 'Inactive' END
>        WHEN rol.begin IS NOT NULL THEN
>          CASE WHEN TIMESTAMP 'now' >= rol.begin THEN 'Active' ELSE 'Inactive' END
>        WHEN rol.end IS NOT NULL THEN
>          CASE WHEN TIMESTAMP 'now' <= rol.end THEN 'Active' ELSE 'Inactive' END
>        ELSE 'Active'
>   END = 'Active'

Aside #1 - I'm not entirely clear how role_id can be null, since you seemed to be joining against it in the real query.

Aside #2 - You're probably better off with CURRENT_DATE, since begin/end seem to be dates, rather than TIMESTAMP 'now' - and in any case you wanted timestamp with time zone.

OK, I think the root of your problem is your use of NULL to mean "not ended" or "not started" (whatever 'not started' means). PostgreSQL has the handy timestamptz value "infinity", but only for timestamps and not for dates. I'd probably cheat a little and use an end date of '9999-12-31' or similar to simulate infinity.
Then your test is simply:

  WHERE ...
  AND (rol.begin <= CURRENT_DATE AND rol.end >= CURRENT_DATE)

That should estimate simply enough.

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
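Richard's sentinel-date trick can be sketched outside the database. Below is a hypothetical Python model (the function name and sentinel choices are illustrative, not from the thread) showing how mapping a NULL begin/end to a far-past/far-future date collapses the whole nested CASE expression into a single range test:

```python
from datetime import date

# Sentinel dates standing in for "no begin" / "no end". PostgreSQL's
# 'infinity' exists only for timestamps, not dates, hence the cheat.
DISTANT_PAST = date.min      # assumption: plays the role of a NULL begin
DISTANT_FUTURE = date.max    # assumption: plays the role of a NULL end

def is_active(begin, end, today=None):
    """Equivalent of the nested CASE: a role is active when today falls
    between begin and end, with NULLs mapped to sentinels so the test
    is one plain range comparison."""
    today = today or date.today()
    begin = begin or DISTANT_PAST
    end = end or DISTANT_FUTURE
    return begin <= today <= end

# A role with no end date is open-ended and therefore still active.
print(is_active(date(2005, 1, 1), None))  # True
# A role whose end date has passed is inactive.
print(is_active(date(2005, 1, 1), date(2005, 6, 1), date(2005, 9, 14)))  # False
```

With sentinel dates stored in the table itself, the SQL reduces to the simple `begin <= CURRENT_DATE AND end >= CURRENT_DATE` test above, which the planner can estimate from ordinary column statistics.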
Re: [PERFORM] Low performance on Windows problem
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote:
> 5. do select array_accum(q::text) from generate_series(1,1) q;

I made the tests you suggested and the pattern is clear. The difference between local and remote command execution is caused by moving data over the network. E.g. the command above takes 700 ms locally and 1500 ms remotely. A remote explain analyze takes exactly the 700 ms.

I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm - and the measured throughput between the two machines is over 1 kB/s. PCATTCP allows setting TCP_NODELAY, but it had no effect on the transfer speed. So the difference between local and remote execution should IMHO stay in the 10 ms range. Definitely not 800 ms. PostgreSQL 8.1 has the same problem.

Just for the record: the server PC is a Dell Precision 330 with a 3Com 3C920 integrated network card. OS is MS Windows Professional 2002 with service pack 2. There is Symantec Antivirus installed - which I have (hopefully) completely disabled.

Thanks for any help,
Dalibor Sramek

--
Dalibor Sramek    http://www.insula.cz/dali      \ In the eyes of cats /
[EMAIL PROTECTED]                                  \    all things    /
H blog http://www.transhumanismus.cz/blog.php      \ belong to cats. /

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help with performance on current status column
Hello Richard,

Thank you for the response. I did forget to mention that the columns have the following meanings: if a begin or end date is null, it means that the role is open-ended in that direction. For example, if there is no end date, the role will go on forever beginning with the start date. Your idea of using minimal and maximum dates is an interesting one and not one that I had considered. I will do some testing later today and see if that makes a difference. The other option I am toying with is simply having a status column which is updated nightly via a cron job. This will probably be the most efficient and can be indexed.

I also forgot to say that we have seen this behavior on 2 boxes, both on Linux (Red Hat ES and Mandrake), and both running Postgres 8.0 (8.0.1 and 8.0.3). Strangely, after playing with statistics some yesterday (setting from 10 to 100 to 1000 and back to 10, and analyzing), the 8.0.1 machine picks a different plan and runs in 101.104 ms. The larger machine (dual-proc Opteron, 6-disk RAID 10, etc.) with 8.0.3 still takes 3-5 minutes to run the same query with the same data set, even after playing with statistics and repeated analyzes on the same table. It just seems odd. It seems it is picking the incorrect plan based on an overly optimistic estimate of the rows returned from the calculation.

The other frustration is that this SQL is machine-generated, which is why we have some of the awkwardness in the calculation. That calc gets used for a lot of different things, including column definitions when people want to see the column on screen.

Thanks,
-Chris

On Wednesday 14 September 2005 05:13 am, Richard Huxton wrote:
> Chris Kratz wrote:
>> Hello All, We are struggling with a specific query that is killing us.
>> When doing explain analyze on the entire query, we *seem* to be getting
>> killed by the estimated number of rows on a case statement calculation.
> [snip - rest of Richard's message quoted in full above]

--
Chris Kratz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Low performance on Windows problem
(1) Latency and throughput don't necessarily correlate well. When blasting quantities of data to test throughput, TCP_NODELAY might not matter much -- a full buffer will be sent without a delay anyway. What do you get on a ping while running the throughput test?

(2) Besides the TCP_NODELAY issue, another issue which has caused similar problems is a mismatch between half duplex and full duplex in the configuration of the switch and the server. Sometimes auto-negotiate doesn't work as advertised; you might want to try setting the configuration explicitly, if you aren't already doing so.

-Kevin

Dalibor Sramek [EMAIL PROTECTED] 09/14/05 8:02 AM:
> I made the tests you suggested and the pattern is clear. The difference
> between local and remote command execution is caused by moving data over
> the network. E.g. the command above takes 700 ms locally and 1500 ms
> remotely. [snip - quoted in full above]

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Low performance on Windows problem
> ... in the 10 ms range. Definitely not 800 ms. The 8.1 has the same
> problem. Just for the record: the server PC is Dell Precision 330 with
> 3Com 3C920 integrated network card. OS MS Windows Professional 2002 with
> service pack 2. There is Symantec Antivirus installed - which I have
> (hopefully) completely disabled.

Try throwing in another network card and see if it helps. Next step is to try tweaking TCP settings (http://support.microsoft.com/default.aspx?scid=kb;en-us;314053) and see if that helps. Beyond that, try playing the update-driver game.

If you are still having problems, try receiving bigger and bigger results to see where the problem occurs. Trouble in the 1-2k range suggests an MTU problem; the 4-8k range suggests a TCP receive window problem. Beyond that, I'm stumped. Uh, buy an Opteron? :)

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Battery Backed Cache for RAID
Folks,

I'm getting a new server for our database, and I have a quick question about RAID controllers with a battery-backed cache. I understand that the battery will allow the cache to be written out if the power fails to the box, which allows it to report a write as committed safely when it's not actually committed.

My question is: if the power goes off, and the drives stop, how does the battery-backed cache save things out to the dead drives? Is there another implied component that will provide power to the drives, which I should be looking into as well?

Thanks,
Peter Darley

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Battery Backed Cache for RAID
On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
> I'm getting a new server for our database, and I have a quick question
> about RAID controllers with a battery backed cache. I understand that the
> cache will allow the cache to be written out if the power fails to the
> box, which allows it to report a write as committed safely when it's not
> actually committed.

Actually the cache will just hold its contents while the power is out. When the power is restored, the RAID controller will complete the writes to disk. If the battery does not last through the outage, the data is lost.

> My question is, if the power goes off, and the drives stop, how does the
> battery backed cache save things out to the dead drives? Is there another
> component that is implied that will provide power to the drives that I
> should be looking into as well?

A UPS would allow you to do an orderly shutdown and write contents to disk during a power failure. However, a UPS can be an extra point of failure.

-jwb

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Battery Backed Cache for RAID
On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
> Actually the cache will just hold its contents while the power is out.
> When the power is restored, the RAID controller will complete the writes
> to disk. If the battery does not last through the outage, the data is
> lost.

Just curious: how long are the batteries supposed to last?

--
Alvaro Herrera -- Valdivia, Chile
Architect, www.EnterpriseDB.com
Hi! I'm a .signature virus! cp me into your .signature file to help me spread!

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Battery Backed Cache for RAID
> Just curious: how long are the batteries supposed to last?

For the length of time it will take for you to get fired for not getting the server running, plus one hour :).

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Battery Backed Cache for RAID
Alvaro Herrera wrote:
> Just curious: how long are the batteries supposed to last?

The recent *cheap* version of a ramdisk had battery backup for 16 hours. (Very expensive ramdisks actually have enough battery power to power a small hard drive to dump the contents into.) I'm guessing for a RAID controller the time would be in the max-1-day range.

John
=:->
Re: [PERFORM] Battery Backed Cache for RAID
John A Meinel wrote: The recent *cheap* version of a ramdisk had battery backup for 16 hours. (Very expensive ramdisks actually have enough battery power to power a small hard-drive to dump the contents into). I'm guessing for a RAID controller, the time would be in the max 1 day range. i think some will go a bit longer. i have seen an IBM ServeRaid (rebranded mylex in this particular case) keep its memory after being pulled for a remarkably long period of time. no guarantees, though, so i'm not actually going to say how long so that nobody gets unreasonable expectations. richard ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Battery Backed Cache for RAID
On 14-9-2005 22:03, Alvaro Herrera wrote:
> Just curious: how long are the batteries supposed to last?

For the LSI-Logic MegaRaid 320-2e it's about 72 hours for the standard 128MB version. Their SATA2 solution offers 32- and 72-hour options. So I assume it's on the order of days for most RAID controllers.

Best regards,
Arjen van der Meijden

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Battery Backed Cache for RAID
Bear in mind you will lose data if the RAID controller itself fails (or the cache memory module). Many solutions have mirrored cache for this reason. But that's more $$, depending on the risks you want to take.

Quoting Arjen van der Meijden [EMAIL PROTECTED]:
> For the LSI-Logic MegaRaid 320-2e its about 72 hours for the standard
> 128MB version. Their SATA2-solution offers 32 and 72 hour-options. So I
> assume its in the order of days for most RAID controllers.

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How many tables is too many tables?
[EMAIL PROTECTED] wrote:
> One machine is simply not going to be able to scale with the quantities of
> links we hope to store information about, and we want to move to some kind
> of cluster. Because of the quantities of data, it seems to make sense to
> go for a cluster setup such that in a 4-machine cluster, each machine has
> a quarter of the data (is this shared-nothing, or shared-everything?). To
> that end, we figured a good first step was to partition the data on one
> machine into multiple tables, defining the logic which would find the
> appropriate table given a piece of data. Then, we assumed, adding the
> logic to find the appropriate machine and database in our cluster would
> only be an incremental upgrade.

So you set up 4 separate copies of PG on 4 machines? That is neither SN nor SE. Partitioning is good for performance if you distribute IOs and CPUs. In your design, I believe IO is distributed (to 4 machines), but since you sliced the data into too-small pieces, you will pay penalties elsewhere. For example, each table has to maintain separate indices (an index becomes a useless burden when its table is too small), so there will be very many btree roots. The system tables (pg_class, pg_attribute, etc.) have to contain many rows to record your tables, and although system table rows are cached, the memory space is limited. In short: too many tables.

To design your new partition method, just keep in mind that a database accesses data with page-wise IO.

Regards,
Qingqing

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
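Qingqing's advice amounts to keeping the partition count small enough that each table stays large. A hypothetical routing function (all names, hash choice, and counts below are illustrative assumptions, not from the thread) shows how the "find the appropriate machine and table" step can stay a cheap, stable computation even with few, large partitions:

```python
import hashlib

MACHINES = 4             # one PG instance per box, as in the poster's plan
TABLES_PER_MACHINE = 8   # deliberately coarse: few, large partitions

def route(link_url: str) -> tuple[int, str]:
    """Map a link to (machine, table name) with a stable hash. Keeping
    TABLES_PER_MACHINE small means each table stays big enough that its
    btree index - and its pg_class/pg_attribute bookkeeping - pays for
    itself, avoiding the too-many-tables penalty."""
    h = int(hashlib.md5(link_url.encode()).hexdigest(), 16)
    machine = h % MACHINES
    table = (h // MACHINES) % TABLES_PER_MACHINE
    return machine, f"links_{table}"

# The same URL always routes to the same place, so the application can
# compute the destination without consulting any central directory.
print(route("http://example.com/page"))
```

The point of the sketch is the shape, not the numbers: the hash makes the machine lookup O(1) and stateless, so scaling from one box to four really is an incremental change, while the partition count per box stays small as Qingqing suggests.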