Re: [PERFORM] UNSUBSCRIBE
On May 10, 2006, at 14:42, Tom Lane wrote:

> Chris [EMAIL PROTECTED] writes:
>> Maybe :) The php-general list has "To unsubscribe, visit: http://www.php.net/unsub.php" at the bottom of every email, and there are still random unsubscribe requests.
>
> That will *always* happen. Just human nature and the numbers of subscribers.

However, a one-liner that either points to the webpage for unsubscribing (probably easiest) or a brief description of how to unsubscribe ("To unsubscribe, send an email to [EMAIL PROTECTED] with body 'unsub pgsql-performance' (without quotes)") may intercept a few more.

Is there a way to configure Majordomo to make it even easier to unsubscribe? Just sending to pgsql-[EMAIL PROTECTED] or some such? I've seen other mailing lists that do this. Requiring a specific command (what's the command? in the subject or the body?) is one more place a person can make a mistake. (I've recently switched mail accounts and unsubbed/subbed from the lists I'm on. This latter style does make it a lot easier.)

(And are there mail readers out there that can pick up those subscribe/unsubscribe headers from the list emails? Now *that'd* be sweet.)

Michael Glaesemann
grzm seespotcode net

---(end of broadcast)---
TIP 6: explain analyze is your friend
[PERFORM] in memory views
Is there a possibility of creating views or temp tables in memory, to avoid disk I/O when a user runs select operations?

regards
tom
Re: [PERFORM] in memory views
Thomas Vatter schrieb:
> Is there a possibility of creating views or temp tables in memory, to avoid disk I/O when a user runs select operations?

No need. The data will be available in OS and database caches if it is really required often. If not, tune up the caches and do a regular pre-select.

Regards
Tino
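For reference, "tune up the caches" usually comes down to a couple of postgresql.conf settings. The values below are purely illustrative placeholders, not recommendations; the right numbers depend entirely on available RAM and workload:

```
# postgresql.conf -- illustrative values only
shared_buffers = 20000           # PostgreSQL's own buffer cache, in 8 kB pages (~160 MB here)
effective_cache_size = 100000    # planner hint: how much the OS file cache can hold (~800 MB here)
```

Note that changing shared_buffers requires a server restart to take effect.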
Re: [PERFORM] in memory views
Tino Wildenhain wrote:
> Thomas Vatter schrieb:
>> Is there a possibility of creating views or temp tables in memory, to avoid disk I/O when a user runs select operations?
>
> No need. The data will be available in OS and database caches if it is really required often. If not, tune up the caches and do a regular pre-select.

Hmm, I am selecting a result set with 1300 rows joined from 12 tables. With JDBC I am waiting 40 seconds until the first row appears. The following rows appear really fast, but the 40 seconds are a problem.

regards
tom
Re: [PERFORM] in memory views
Thomas Vatter schrieb:
> Hmm, I am selecting a result set with 1300 rows joined from 12 tables. With JDBC I am waiting 40 seconds until the first row appears. The following rows appear really fast, but the 40 seconds are a problem.

Well, you will need the same 40 seconds to fill your hypothetical in-memory table (even a bit more, due to the creation of a data structure). So you can take the approach of semi-materialized views (which in fact write into a shadow table), or just prefetch your data ahead of time, at the times you would refill your in-memory tables if they existed. A cronjob with select/fetch should do.

Regards
Tino
Re: [PERFORM] in memory views
Tino Wildenhain wrote:
> Well, you will need the same 40 seconds to fill your hypothetical in-memory table (even a bit more, due to the creation of a data structure). So you can take the approach of semi-materialized views (which in fact write into a shadow table), or just prefetch your data ahead of time, at the times you would refill your in-memory tables if they existed. A cronjob with select/fetch should do.

If the in-memory table is created at bootup time of the DBMS, it is already present when the user selects the data. Of course the challenge is to keep the in-memory table up to date if data are changed. What do you mean by semi-materialized views? I have tried select * from this_view with the same result. Also, if I repeat the query it does not run faster.

regards
tom

--
Mit freundlichen Grüßen / Regards
Vatter

Network Inventory Software
Sun Microsystems Principal Partner
www.network-inventory.de
Tel. 030-79782510
E-Mail [EMAIL PROTECTED]
Re: [PERFORM] in memory views
Hi,

there was a similar discussion involving a ramdisk:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01058.php

You need to populate the data on server start, of course. But as Tino mentioned, it's maybe not worth the trouble. Maybe there is a way to speed up the query itself. To analyze this, you should post the query and table definitions and the output of explain analyze of the offending query.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Vatter
Sent: Wednesday, May 10, 2006 12:43 PM
To: Tino Wildenhain
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] in memory views

[quoted message snipped; it repeats Thomas Vatter's reply above verbatim]
[PERFORM] Question about explain-command...
Hello,

I just discovered the explain command and, well... have some (for you of course very stupid) questions. I do a quite large (for my taste) join; the query looks like the following:

SELECT DISTINCT customer.email AS cemail, customer.key AS ckey,
       customer.anrede AS canrede, customer.strasse AS cstrasse,
       customer.plz AS cplz, customer.ort AS cort,
       customer.vorname AS cvorname, customer.nachname AS cnachname,
       custtype.name AS tname, customer.land AS cland,
       customer.datanotvalid AS cdatanvalid
FROM customer
LEFT JOIN sells ON customer.key=sells.custid
LEFT JOIN goods ON sells.goodsid=goods.key
LEFT JOIN custtype ON customer.custgroup=custtype.key
LEFT JOIN prodtype ON prodtype.key=goods.prodgroup
WHERE customer.nachname LIKE '%name%';

All primary keys are indexed, and this is what explain tells me:

Unique  (cost=15.67..16.69 rows=34 width=115)
  ->  Sort  (cost=15.67..15.75 rows=34 width=115)
        Sort Key: customer.email, customer.key, customer.anrede, customer.strasse, customer.plz, customer.ort, customer.vorname, customer.nachname, custtype.name, customer.land, customer.datanotvalid
        ->  Hash Left Join  (cost=6.16..14.80 rows=34 width=115)
              Hash Cond: ("outer".prodgroup = "inner".key)
              ->  Hash Left Join  (cost=4.97..13.10 rows=34 width=119)
                    Hash Cond: ("outer".custgroup = "inner".key)
                    ->  Hash Left Join  (cost=3.88..11.49 rows=34 width=111)
                          Hash Cond: ("outer".goodsid = "inner".key)
                          ->  Hash Left Join  (cost=1.98..9.08 rows=34 width=111)
                                Hash Cond: ("outer".key = "inner".custid)
                                ->  Seq Scan on customer  (cost=0.00..6.10 rows=34 width=107)
                                      Filter: ((nachname)::text ~~ '%au%'::text)
                                ->  Hash  (cost=1.78..1.78 rows=78 width=8)
                                      ->  Seq Scan on sells  (cost=0.00..1.78 rows=78 width=8)
                          ->  Hash  (cost=1.72..1.72 rows=72 width=8)
                                ->  Seq Scan on goods  (cost=0.00..1.72 rows=72 width=8)
                    ->  Hash  (cost=1.08..1.08 rows=8 width=16)
                          ->  Seq Scan on custtype  (cost=0.00..1.08 rows=8 width=16)
              ->  Hash  (cost=1.15..1.15 rows=15 width=4)
                    ->  Seq Scan on prodtype  (cost=0.00..1.15 rows=15 width=4)

What do the hash lines mean? Does that mean my query does not use the indices at all? Why are some table names and some column names surrounded by quotes? Are they treated as text columns? I have to admit that the tables are just filled with test data, so the planner may take a very simple way since almost no data is in...

lg Clemens
Re: [PERFORM] in memory views
Thomas Vatter schrieb:
> If the in-memory table is created at bootup time of the DBMS, it is already present when the user selects the data. Of course the challenge is to keep the in-memory table up to date if data are changed. What do you mean by semi-materialized views? I have tried select * from this_view with the same result. Also, if I repeat the query it does not run faster.

Semi-materialized views are just views with additional rules and some triggers which copy data to another table. There are several recipes if you google accordingly.

I do not know what you mean by bootup time; do you really reboot your database server? *hehe* just kidding ;)

In your first email you told me your query indeed runs faster the 2nd time (due to the caching); now you are telling me that it is not. Btw, judging from your analyze output you are using very cryptic table and column names; you can use aliasing in the query and don't have to resort to tiny tags when you actually name the objects ;) Maybe others have comments on your query. Btw, better use explain analyze to get realistic results.

Regards
Tino
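One way to read Tino's "semi-materialized view" suggestion is a shadow table kept current by triggers on the base tables. A minimal sketch of the idea follows; all object names are invented for illustration, and the full refresh shown here is the crudest possible variant (real recipes update the shadow table incrementally instead of rebuilding it):

```sql
-- Shadow table filled once from the expensive view (names are invented).
CREATE TABLE report_cache AS SELECT * FROM report_view;

-- Rebuild the cache whenever a base table changes.
CREATE FUNCTION refresh_report_cache() RETURNS trigger AS '
BEGIN
    DELETE FROM report_cache;
    INSERT INTO report_cache SELECT * FROM report_view;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

-- One such trigger per base table feeding the view.
CREATE TRIGGER customer_report_sync
    AFTER INSERT OR UPDATE OR DELETE ON customer
    FOR EACH STATEMENT
    EXECUTE PROCEDURE refresh_report_cache();
```

Selects then read from report_cache instead of report_view and pay none of the join cost; the price is moved to write time, which is the trade-off Tino describes.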
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Greg Stark [EMAIL PROTECTED] writes:
> Douglas McNaught [EMAIL PROTECTED] writes:
>> Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.
>
> Well, dollar for dollar you would get the best performance from slower drives anyway, since it would give you more spindles. 15kRPM drives are *expensive*.

Depends on your power, heat and rack space budget too... If you need max performance out of a given rack space (rather than max density), SCSI is still the way to go. I'll definitely agree that SATA is becoming much more of a player in the server storage market, though.

-Doug
Re: [PERFORM] in memory views
Tino Wildenhain wrote:
> In your first email you told me your query indeed runs faster the 2nd time (due to the caching); now you are telling me that it is not. [...] Btw, better use explain analyze to get realistic results.

The subsequent rows are shown faster, not the subsequent queries, if you really read my first e-mail ;-). Yes, I did analyze yesterday; the database has not changed. AFAIK it is only necessary when the database contents are changing.

regards
tom
Re: [PERFORM] Arguments Pro/Contra Software Raid
* Hannes Dorbath:

> + Hardware RAIDs might be a bit easier to manage, if you never spend a few hours to learn software RAID tools.

I disagree. RAID management is complicated, and once there is a disk failure, all kinds of oddities can occur which can make it quite a challenge to get back a non-degraded array. With some RAID controllers, monitoring is difficult because they do not use the system's logging mechanism for reporting. In some cases, it is not possible to monitor the health status of individual disks.

> + Using SATA drives is always a bit of a risk, as some drives are lying about whether they are caching or not.

You can usually switch off caching.

> + Using hardware controllers, the array becomes locked to a particular vendor. You can't switch controller vendors, as the array meta information is stored in a proprietary format. In case the RAID is broken to a level the controller can't recover automatically, this might complicate manual recovery by specialists.

It's even more difficult these days. 3ware controllers enable drive passwords, so you can't access the drive from other controllers at all (even if you could interpret the on-disk data).

> + Even battery-backed controllers can't guarantee that data written to the drives is consistent after a power outage, nor that the drive does not corrupt something during the involuntary shutdown / power irregularities. (This is theoretical, as any server will be UPS-backed.)

UPS failures are not unheard of. 8-/ Apart from that, you can address a large class of shutdown failures if you replay a log stored in the BBU on the next reboot (partial sector writes come to mind). It is very difficult to check if the controller does this correctly, though.

A few other things to note: You can't achieve significant port density with non-RAID controllers, at least with SATA. You need to buy a RAID controller anyway. And you can't quite achieve what a BBU does (even if you've got a small, fast persistent storage device) because there's no host software support for such a configuration.
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, PFC,

PFC wrote:
> The problem is that you need a set-returning function to retrieve the values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$
LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR
$$ ... estimation code ... $$;

Internally, this could create two functions, foo(para, meters) and estimate_foo(para, meters), that are in the same language and coupled together (just like a SERIAL column and its sequence). The estimator functions have an implicit return parameter of int8. Parameters may be NULL when they are not known at query planning time.

What do you think about this idea? The same scheme could be used to add a CPUCOST_ESTIMATOR to expensive functions.

HTH,
Markus

--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Scott & all,

Scott Lamb wrote:
> I don't know the answer to this question, but have you seen this tool? http://brad.livejournal.com/2116715.html

We had a simpler tool in-house, which wrote a file byte for byte and called fsync() after every byte. If the number of fsyncs/min is higher than the rotations-per-minute value of your disks, they must be lying.

It does not find as many liars as the script above, but it is less intrusive (it can be run on every low-I/O machine without crashing it), and it found some liars in-house (some notebook disks, one external USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation, IIRC).

If you're interested, I can dig for the C source...

HTH,
Markus
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Markus Schaber wrote:
> Hi, Scott & all,
>
> Scott Lamb wrote:
>> I don't know the answer to this question, but have you seen this tool? http://brad.livejournal.com/2116715.html
>
> We had a simpler tool in-house, which wrote a file byte for byte and called fsync() after every byte. If the number of fsyncs/min is higher than the rotations-per-minute value of your disks, they must be lying.
>
> It does not find as many liars as the script above, but it is less

Why does it find fewer liars?

> intrusive (it can be run on every low-I/O machine without crashing it), and it found some liars in-house (some notebook disks, one external USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation, IIRC).

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
On May 10, 2006, at 12:41 AM, Greg Stark wrote:
> Well, dollar for dollar you would get the best performance from slower drives anyway, since it would give you more spindles. 15kRPM drives are *expensive*.

Personally, I don't care that much for dollar-for-dollar; I just need performance. If it is within a factor of 2 or 3 in price, then I'll go for absolute performance over bang for the buck.
Re: [PERFORM] PostgreSQL VACCUM killing CPU
On Tue, May 09, 2006 at 03:19:08AM -0700, [EMAIL PROTECTED] wrote:
> I have got such a problem. I'm running PostgreSQL 7.3.2 on Linux 2.6.13. What I see when VACUUM is running and killing my CPU is:
>
> Cpu(s): 3.2% us, 0.0% sy, 0.0% ni, 0.0% id, 96.8% wa, 0.0% hi, 0.0% si
>
> What I am worried about is the 96.8% wa. Why is it like that?

It's killing your disk drives instead of your CPU (which is mostly _idle_, waiting for I/O completion). Run this command to get an idea of the I/O activity:

iostat -x 3 3

[AD] Running a kernel patched with adaptive read-ahead may help:
http://www.vanheusden.com/ara/adaptive-readahead-11.1-2.6.16.5.patch.gz
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Vivek Khera wrote:
> On May 10, 2006, at 12:41 AM, Greg Stark wrote:
>> Well, dollar for dollar you would get the best performance from slower drives anyway, since it would give you more spindles. 15kRPM drives are *expensive*.
>
> Personally, I don't care that much for dollar-for-dollar; I just need performance. If it is within a factor of 2 or 3 in price, then I'll go for absolute performance over bang for the buck.

That is really the issue. You can buy lots of consumer-grade stuff and work just fine if your performance/reliability tolerance is high enough. However, don't fool yourself that consumer and server-grade hardware are internally the same, or get the same testing.

I just had a Toshiba laptop drive replaced last week (new, not refurbished), only to have it fail this week. Obviously there isn't sufficient burn-in done by Toshiba, and I don't fault them, because it is a consumer laptop: it fails, they replace it. For servers, the downtime usually can't be tolerated, while consumers usually can tolerate significant downtime. I have always purchased server-grade hardware for my home server, and I think I have had one day of hardware downtime in the past ten years. Consumer hardware just couldn't do that.

As one data point, most consumer-grade IDE drives are designed to be run only 8 hours a day. The engineering doesn't anticipate 24-hour operation, and that trade-off passes all the way through the selection of components for the drive, which generates significant cost savings.

--
Bruce Momjian
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
>> The problem is that you need a set-returning function to retrieve the values. SRFs don't have rowcount estimates, so the plans suck.
>
> What about adding some way of rowcount estimation to SRFs, in the way of:
>
> CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
> $$ ... function code ... $$
> LANGUAGE plpgsql
> ROWCOUNT_ESTIMATOR
> $$ ... estimation code ... $$;
>
> Internally, this could create two functions, foo(para, meters) and estimate_foo(para, meters), that are in the same language and coupled together (just like a SERIAL column and its sequence). The estimator functions have an implicit return parameter of int8. Parameters may be NULL when they are not known at query planning time.
>
> What do you think about this idea?

It would be very useful. A few thoughts...

You need to do some processing to know how many rows the function would return. Often, this processing will be repeated in the function itself. Sometimes it's very simple (e.g. the function will RETURN NEXT each element in an array; you know the array length...). Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator. So, it might be a bit hairy to find a good compromise.

Ideas on how to do this (clueless hand-waving mode):

1. Add new attributes to set-returning functions; basically a list of functions, each returning an estimation parameter (rowcount, CPU tuple cost, etc.). This is just like you said.

2. Add an estimator to a function, which would just be another function, returning one row, a record, containing the estimations in several columns (rowcount, CPU tuple cost, etc.). Pros: only one function call to estimate, easier and faster; the estimator just leaves the unknown columns NULL. The estimator need not be in the same language as the function itself; it's just another function.

3. The estimator could be a set-returning function itself, which would return rows mimicking pg_statistics. Pros: planner-friendly; the planner would SELECT from the SRF instead of looking in pg_statistics, and the estimator could tell the planner that, for instance, the function will return unique values. Cons: complex, maybe slow.

4. Add simple flags to a function, like:
   - returns unique values
   - returns sorted values (no need to sort my results)
   - please execute me, store my results in temporary storage, count the rows returned, and plan the outer query accordingly
   - etc.
Re: [PERFORM] Question about explain-command...
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clemens Eisserer
Sent: Wednesday, May 10, 2006 6:50 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question about explain-command...

> What do the hash lines mean? Does that mean my query does not use the indices at all? Why are some table names and some column names surrounded by quotes? Are they treated as text columns? I have to admit that the tables are just filled with test data, so the planner may take a very simple way since almost no data is in...

For small tables, it is faster to do a sequential scan than an index scan. You probably don't have enough test data to make the planner choose an index scan. I don't think the quotes really mean anything; they are just used as delimiters. The hash lines mean your tables are being joined by hash joins. You should read this page for more info:

http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
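To see the small-table effect for yourself, compare plans before and after loading data. A toy sketch (table name and values invented for illustration):

```sql
-- The planner picks a seq scan while the table fits in a single page.
CREATE TABLE t (key serial PRIMARY KEY, nachname text);
INSERT INTO t (nachname) VALUES ('Mueller');
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE key = 1;   -- expect: Seq Scan on t

-- After bulk-loading realistic row counts and re-running ANALYZE,
-- the same query should switch to an Index Scan using t_pkey.
```

Note also that the original query filters with LIKE '%name%': a pattern with a leading wildcard cannot use an ordinary btree index no matter how much data is loaded, so the seq scan on customer would remain even with realistic volumes.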
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Bruce,

Bruce Momjian wrote:
>> It does not find as many liars as the script above, but it is less
> Why does it find fewer liars?

It won't find liars that have a small lie-queue length: their internal buffers get full, so they have to block. After a small burst at start, which usually hides in other latencies, they don't get more throughput than spindle turns.

It won't find liars that first acknowledge to the host and then immediately write the block before accepting other commands. This improves latency (which is measured in some benchmarks), but not the sync/write rate.

Both of them can be caught by the other script, but not by my tool.

HTH,
Markus
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote:
> Scott Marlowe wrote:
>> Actually, in the case of the Escalades at least, the answer is yes. Last year (maybe a bit more) someone was testing an IDE Escalade controller with drives that were known to lie, and it passed the power-plug-pull test repeatedly. Apparently, the Escalades tell the drives to turn off their cache. While almost all IDE and a fair number of SATA drives lie about cache fsyncing, they all seem to turn off the cache when you ask. And, since a hardware RAID controller with BBU cache has its own cache, it's not like it really needs the one on the drives anyway.
>
> You do if the controller thinks the data is already on the drives and removes it from its cache.

Bruce, re-read what I wrote. The Escalades tell the drives to TURN OFF THEIR OWN CACHE.
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Scott Marlowe [EMAIL PROTECTED] writes:
> On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote:
>> You do if the controller thinks the data is already on the drives and removes it from its cache.
>
> Bruce, re-read what I wrote. The Escalades tell the drives to TURN OFF THEIR OWN CACHE.

Some ATA drives would lie about that too, IIRC. Hopefully they've stopped doing it in the SATA era.

-Doug
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
On Wed, May 10, 2006 at 04:38:31PM +0200, PFC wrote:
> You need to do some processing to know how many rows the function would return. Often, this processing will be repeated in the function itself. Sometimes it's very simple (e.g. the function will RETURN NEXT each element in an array; you know the array length...). Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator.

I think the best would probably be to assign a constant. An SRF will generally return somewhere in one of the ranges 1-10, 10-100, 100-1000, etc. You don't need the exact number; you just need to get within an order of magnitude, and a constant will work fine for that. How many functions sometimes return one row and sometimes a million?

Have a nice day,
--
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
Hi, Bruce,

Markus Schaber wrote:
>>> It does not find as many liars as the script above, but it is less
>> Why does it find fewer liars?
> It won't find liars that have a small lie-queue length: their internal buffers get full, so they have to block. After a small burst at start, which usually hides in other latencies, they don't get more throughput than spindle turns.

I just reread my mail, and must admit that I would not understand what I wrote above, so I'll explain a little more:

My test program writes byte for byte. Let's say our FS/OS has 4k page and block size; that means 4096 writes that all hit the same disk block. Intelligent liars will see that the 2nd and all further writes obsolete the former writes still residing in the internal cache, and will drop those former writes from the cache, effectively going up to 4k writes per spindle turn. Dumb liars will keep the obsolete writes in the write cache / queue, and so won't be caught by my program.

(Note that I have no proof that such disks actually exist, but I have enough experience with hardware that I won't be surprised.)

HTH,
Markus
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, PFC, PFC wrote: You need to do some processing to know how many rows the function would return. Often, this processing will be repeated in the function itself. Sometimes it's very simple (i.e. the function will RETURN NEXT each element in an array, you know the array length...) Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator. I know, but we only have to estimate the number of rows to give a hint to the query planner, so we can use lots of simplifications. E.g. for generate_series we return ($2-$1)/$3, and for some functions even constant estimates will be good enough. - please execute me and store my results in a temporary storage, count the rows returned, and plan the outer query accordingly That's an interesting idea. Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
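[Editor's note: the constant-estimate idea discussed above later became a first-class feature; since PostgreSQL 8.3, CREATE FUNCTION accepts COST and ROWS clauses. A minimal sketch, assuming such a release and a made-up function name:]

```sql
-- first_n is a hypothetical set-returning function, used only to
-- illustrate the ROWS clause; generate_series does the real work.
CREATE FUNCTION first_n(integer) RETURNS SETOF integer AS $$
    SELECT * FROM generate_series(1, $1);
$$ LANGUAGE sql
   ROWS 100;  -- constant, order-of-magnitude row estimate for the planner
```

With this, the planner assumes 100 result rows wherever first_n() appears in a query, instead of the generic SRF default.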
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
On Wed, 2006-05-10 at 09:51, Douglas McNaught wrote: Scott Marlowe [EMAIL PROTECTED] writes: On Tue, 2006-05-09 at 20:02, Bruce Momjian wrote: You do if the controller thinks the data is already on the drives and removes it from its cache. Bruce, re-read what I wrote. The escalades tell the drives to TURN OFF THEIR OWN CACHE. Some ATA drives would lie about that too IIRC. Hopefully they've stopped doing it in the SATA era. Ugh. Now that would make for a particularly awful bit of firmware implementation. I'd think that if I found a SATA drive doing that I'd be likely to strike the manufacturer off of the list for possible future purchases... ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] in memory views
On Wed, 2006-05-10 at 04:55, Thomas Vatter wrote: Tino Wildenhain wrote: Thomas Vatter schrieb: is there a possibility for creating views or temp tables in memory to avoid disk io when user makes select operations? No need. The data will be available in OS and database caches if they are really required often. If not, tune up the caches and do a regular pre select. Regards Tino hmm, I am selecting a resultset with 1300 rows joined from 12 tables. with jdbc I am waiting 40 seconds until the first row appears. The following rows appear really fast but the 40 seconds are a problem. Are you selecting the whole set at once? Or are you placing it into a cursor? What happens if you do this by declaring it as a cursor and then fetching the first row? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Question about explain-command...
I will try answering your questions. Please note that I am a newbie myself. Clemens Eisserer wrote All primary keys are indexed, and this is what explain tells me:

Unique (cost=15.67..16.69 rows=34 width=115)
  -> Sort (cost=15.67..15.75 rows=34 width=115)
        Sort Key: customer.email, customer.key, customer.anrede, customer.strasse, customer.plz, customer.ort, customer.vorname, customer.nachname, custtype.name, customer.land, customer.datanotvalid
        -> Hash Left Join (cost=6.16..14.80 rows=34 width=115)
              Hash Cond: ("outer".prodgroup = "inner".key)
              -> Hash Left Join (cost=4.97..13.10 rows=34 width=119)
                    Hash Cond: ("outer".custgroup = "inner".key)
                    -> Hash Left Join (cost=3.88..11.49 rows=34 width=111)
                          Hash Cond: ("outer".goodsid = "inner".key)
                          -> Hash Left Join (cost=1.98..9.08 rows=34 width=111)
                                Hash Cond: ("outer".key = "inner".custid)
                                -> Seq Scan on customer (cost=0.00..6.10 rows=34 width=107)
                                      Filter: ((nachname)::text ~~ '%au%'::text)
                                -> Hash (cost=1.78..1.78 rows=78 width=8)
                                      -> Seq Scan on sells (cost=0.00..1.78 rows=78 width=8)
                          -> Hash (cost=1.72..1.72 rows=72 width=8)
                                -> Seq Scan on goods (cost=0.00..1.72 rows=72 width=8)
                    -> Hash (cost=1.08..1.08 rows=8 width=16)
                          -> Seq Scan on custtype (cost=0.00..1.08 rows=8 width=16)
              -> Hash (cost=1.15..1.15 rows=15 width=4)
                    -> Seq Scan on prodtype (cost=0.00..1.15 rows=15 width=4)

What do the hash-lines mean, does that mean my query does not use the indices at all? Yes. Probably each table fits nicely into a single disk read, so reading both the index AND the table is going to be twice as expensive. Why are some table-names and some column-names surrounded by quotes? Are they treated as text-columns? They are either names generated by postgres ("outer" and "inner") or field names which are also reserved words in SQL (key). You can always use double quotes around a field name - you have to in some cases if they are reserved words, and always if they contain special characters (not sure from memory exactly which these are - at least spaces). 
I recommend not using either of these, even if a reserved word is the best description of your field. Postgres seems to be a bit better than some other DBMSs in allowing unquoted reserved words as field names if there is no ambiguity. This may mean that you get a problem if your application is ever ported to a different DBMS. I have to admit that the tables are just filled with test-data so the analyzer may take just a very simple way since almost no data is in... Try loading your tables with a realistic number of customers, and you should see a change in the query plan to use your precious indexes. /Nis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
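[Editor's note: to illustrate the quoting rule discussed above — double quotes make an identifier explicit, which is needed when a column name collides with a reserved word. A small sketch using the column names from the plan:]

```sql
-- "key" collides with a reserved word in some SQL dialects, so quote it;
-- ordinary names like email need no quotes.
SELECT c."key", c.email
FROM customer AS c
WHERE c.nachname LIKE '%au%';
```

Note that quoted identifiers are also case-sensitive, which is one more reason to pick names that never need quoting.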
Re: [PERFORM] in memory views
Scott Marlowe wrote: On Wed, 2006-05-10 at 04:55, Thomas Vatter wrote: Tino Wildenhain wrote: Thomas Vatter schrieb: is there a possibility for creating views or temp tables in memory to avoid disk io when user makes select operations? No need. The data will be available in OS and database caches if they are really required often. If not, tune up the caches and do a regular "pre select". Regards Tino hmm, I am selecting a resultset with 1300 rows joined from 12 tables. with jdbc I am waiting 40 seconds until the first row appears. The following rows appear really fast but the 40 seconds are a problem. Are you selecting the whole set at once? Or are you placing it into a cursor? What happens if you do this by declaring it as a cursor and then fetching the first row? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org I do executeQuery(), for the resultSet I do next() and return one row, but wait, I have to review the logic in this area, I can tell you tomorrow regards tom
Re: [PERFORM] in memory views
On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row? I do executeQuery(), for the resultSet I do next() and return one row, but wait, I have to review the logic in this area, I can tell you tomorrow A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
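[Editor's note: the suggested test can be run directly in psql; the table and column names below are placeholders standing in for the actual 12-table join:]

```sql
-- If this reports a runtime far below what the application sees, the
-- missing time is spent transferring and buffering the full result set
-- in the client, not executing the query on the server.
EXPLAIN ANALYZE
SELECT c.*, o.*
FROM customer c
JOIN orders o ON o.customer_id = c.id;
```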
Re: [PERFORM] in memory views
is there a possibility for creating views or temp tables in memory to avoid disk io when user makes select operations? you might also want to look into materialized views: http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html http://www.varlena.com/varlena/GeneralBits/64.php this helped us a lot when we had slow queries involving many tables. cheers, thomas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
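[Editor's note: at the time of this thread, materialized views had to be maintained by hand as the linked articles describe (a regular table refreshed by triggers or a scheduled job). Later releases (PostgreSQL 9.3+) gained built-in support; a sketch with hypothetical table names:]

```sql
-- Precompute the expensive many-table join once...
CREATE MATERIALIZED VIEW big_join_mv AS
SELECT c.id, c.email, o.total
FROM customer c
JOIN orders o ON o.customer_id = c.id;

-- ...then query the stored result, and refresh it when the data changes.
SELECT * FROM big_join_mv WHERE total > 100;
REFRESH MATERIALIZED VIEW big_join_mv;
```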
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Martijn van Oosterhout wrote: On Wed, May 10, 2006 at 04:38:31PM +0200, PFC wrote: You need to do some processing to know how many rows the function would return. Often, this processing will be repeated in the function itself. Sometimes it's very simple (i.e. the function will RETURN NEXT each element in an array, you know the array length...) Sometimes, for functions returning few rows, it might be faster to compute the entire result set in the cost estimator. I think the best would probably be to assign a constant. An SRF will generally return within one of 1-10, 10-100, 100-1000, etc. You don't need an exact number, you just need to get within an order of magnitude, and a constant will work fine for that. How many functions sometimes return one and sometimes a million rows? It will probably be quite common for the number to depend on the number of rows in other tables. Even if this is fairly constant within one db (some assumption), it is likely to be different in others using the same function definition. Perhaps a better solution would be to cache the result of the estimator function. /Nis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Hi, Nis, Nis Jorgensen wrote: It will probably be quite common for the number to depend on the number of rows in other tables. Even if this is fairly constant within one db (some assumption), it is likely to be different in others using the same function definition. Perhaps a better solution would be to cache the result of the estimator function. Sophisticated estimator functions are free to use the pg_statistic catalog (or the pg_stats view) for their row count estimation. HTH, Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] UNSUBSCRIBE
On Wed, May 10, 2006 at 01:15:11 -0400, Tom Lane [EMAIL PROTECTED] wrote: Maybe the real problem is at the other end of the process, ie we should require some evidence of a greater-than-room-temp IQ to subscribe in the first place? I suspect it is more laziness than smarts. They had to at least figure out how to respond to the confirm message in the first place in order to get subscribed. My theory is that they don't want to take the trouble to figure out how to unsubscribe when they (think that they) can just send a message to the list (not even the admin) asking to be unsubscribed and it will (well, actually won't on these lists) happen. Maybe posts with unsubscribe in the subject could be held for moderation and/or get an automated reply with instructions for unsubscribing. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 11:33:42AM +0200, PFC wrote: - Repeating the query might yield different results if records were added or deleted in the meantime. BTW, SET TRANSACTION ISOLATION LEVEL SERIALIZABLE or BEGIN ISOLATION LEVEL SERIALIZABLE would cure that. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
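[Editor's note: spelled out, the fix is just a transaction-level setting; the query below is hypothetical, reusing the bookmarks table from the thread:]

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM bookmarks;  -- first read
-- inserts/deletes committed by other sessions are not visible from here on,
SELECT count(*) FROM bookmarks;  -- so repeating the query gives the same answer
COMMIT;
```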
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 01:29:56PM +0200, PFC wrote:

 0.101 ms BEGIN
 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
 0.443 ms ANALYZE tmp
 0.365 ms SELECT * FROM tmp
 0.310 ms DROP TABLE tmp
32.918 ms COMMIT

CREATING the table is OK, but what happens on COMMIT? I hear the disk seeking frantically. With fsync=off, I get this:

 0.090 ms BEGIN
 1.103 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
 0.439 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
 0.528 ms ANALYZE tmp
 0.364 ms SELECT * FROM tmp
 0.313 ms DROP TABLE tmp
 0.688 ms COMMIT

Getting closer? I'm betting on system catalog updates. I get the same timings with ROLLBACK instead of COMMIT. Temp tables have a row in pg_class... Have you tried getting a profile of what exactly PostgreSQL is doing that takes so long when creating a temp table? BTW, I suspect catalogs might be the answer, which is why Oracle has you define a temp table once (which does all the work of putting it in the catalog) and then you just use it accordingly in each individual session. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
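[Editor's note: PostgreSQL cannot predefine a temp table across sessions the way Oracle does, but within one session the catalog churn can be paid only once: create the table with ON COMMIT DELETE ROWS and reuse it in every transaction. A sketch based on the table from the timings above:]

```sql
-- Created once per session; each COMMIT merely empties it instead of
-- dropping and re-creating it, so pg_class is not touched per transaction.
CREATE TEMPORARY TABLE tmp (
    a integer   NOT NULL,
    b integer   NOT NULL,
    c timestamp NOT NULL,
    d integer   NOT NULL
) ON COMMIT DELETE ROWS;

BEGIN;
INSERT INTO tmp VALUES (1, 2, now(), 3);
SELECT * FROM tmp;
COMMIT;  -- rows are gone, the definition stays for the session
```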
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote: You mean the cursors' storage is in fact the same internal machinery as a temporary table? Use the source, Luke... See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and heap_create_with_catalog in backend/catalog/heap.c. You'll find that creating a tuplestore is far easier than creating a temp table. Perhaps it would be worth creating a class of temporary tables that used a tuplestore, although that would greatly limit what could be done with that temp table. Something else worth considering is not using the normal catalog methods for storing information about temp tables, but hacking that together would probably be a rather large task. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
On Tue, May 09, 2006 at 06:29:31PM +0200, PFC wrote: You mean the cursors' storage is in fact the same internal machinery as a temporary table? Use the source, Luke... LOL, yeah, I should have, sorry. See tuplestore_begin_heap in backend/utils/sort/tuplestore.c and heap_create_with_catalog in backend/catalog/heap.c. You'll find that creating a tuplestore is far easier than creating a temp table. I had used intuition (instead of the source) to come to the same conclusion regarding the level of complexity of these two... But I'll look at the source ;) Perhaps it would be worth creating a class of temporary tables that used a tuplestore, although that would greatly limit what could be done with that temp table. Just selecting from it I guess, but that's all that's needed. Any more would duplicate the functionality of a temp table. I find cursors awkward. The application can FETCH from them, but postgres itself can't do it in SQL, unless using FOR .. IN in plpgsql... It would be a powerful addition to be able to split queries, factor out common parts between multiple queries, etc., using this system; it can even be used to execute an inner part of a query, then plan the rest according to the results and execute it... without the overhead of a temp table. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] in memory views
Scott Marlowe wrote: On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row? I do executeQuery(), for the resultSet I do next() and return one row, but wait, I have to review the logic in this area, I can tell you tomorrow A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough; do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? regards tom
Re: [PERFORM] in memory views
On Wed, 2006-05-10 at 15:54, Thomas Vatter wrote: Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? Not sure. I don't use a lot of prepared statements. I tend to build queries and throw them at the database. In that instance, it's done like: DECLARE cursorname CURSOR FOR SELECT (rest of query here); FETCH FROM cursorname; You can find more on cursors here: http://www.postgresql.org/docs/8.1/interactive/sql-declare.html Not sure if you can use them with prepared statements, or if prepared statements have their own kind of implementation. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] in memory views
Are you using the Postgres JDBC driver? Or are you using a JDBC-ODBC bridge driver? The Postgres-specific driver is usually faster. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Vatter Sent: Wednesday, May 10, 2006 3:54 PM To: Scott Marlowe Cc: Tino Wildenhain; pgsql-performance@postgresql.org Subject: Re: [PERFORM] in memory views Scott Marlowe wrote: On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row? I do executeQuery(), for the resultSet I do next() and return one row, but wait, I have to review the logic in this area, I can tell you tomorrow A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? regards tom
Re: [PERFORM] in memory views
Dave Dutcher wrote: Are you using the Postgres JDBC driver? Or are you using a JDBC-ODBC bridge driver? The Postgres-specific driver is usually faster. I'm using the postgres driver regards tom -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Thomas Vatter Sent: Wednesday, May 10, 2006 3:54 PM To: Scott Marlowe Cc: Tino Wildenhain; pgsql-performance@postgresql.org Subject: Re: [PERFORM] in memory views Scott Marlowe wrote: On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row? I do executeQuery(), for the resultSet I do next() and return one row, but wait, I have to review the logic in this area, I can tell you tomorrow A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? regards tom -- Mit freundlichen Grüßen / Regards Vatter Network Inventory Software Sun Microsystems Principal Partner www.network-inventory.de Tel. 030-79782510 E-Mail [EMAIL PROTECTED]
Re: [PERFORM] in memory views
Scott Marlowe wrote: On Wed, 2006-05-10 at 15:54, Thomas Vatter wrote: Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? Not sure. I don't use a lot of prepared statements. I tend to build queries and throw them at the database. In that instance, it's done like: DECLARE cursorname CURSOR FOR SELECT (rest of query here); FETCH FROM cursorname; You can find more on cursors here: http://www.postgresql.org/docs/8.1/interactive/sql-declare.html Not sure if you can use them with prepared statements, or if prepared statements have their own kind of implementation. ---(end of broadcast)--- TIP 6: explain analyze is your friend Yes, I have used embedded sql and declare cursor, fetch before I started with jdbc; it seems that I have to find out if the new jdbc has a better way than simply resultSet = statement.executeQuery(). regards tom
Re: [PERFORM] Lot'sa joins - performance tip-up, please?
On Thu, May 04, 2006 at 04:45:57PM +0200, Mario Splivalo wrote: Well, here's the problem...

Nested Loop (cost=0.00..176144.30 rows=57925 width=26) (actual time=1074.984..992536.243 rows=57925 loops=1)
  -> Seq Scan on ticketing_codes_played (cost=0.00..863.25 rows=57925 width=8) (actual time=74.479..2047.993 rows=57925 loops=1)
  -> Index Scan using ticketing_codes_pk on ticketing_codes (cost=0.00..3.01 rows=1 width=18) (actual time=17.044..17.052 rows=1 loops=57925)
        Index Cond: (ticketing_codes.code_id = "outer".code_id)

Anyone have any idea why on earth it's doing that instead of a hash or merge join? In any case, try swapping the order of ticketing_codes_played and ticketing_codes. Actually, that'd probably make it worse. Try SET enable_nestloop = off; -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
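[Editor's note: the planner override suggested above is per-session and easy to test and undo; a sketch using the tables from the plan:]

```sql
SET enable_nestloop = off;   -- steer the planner away from the nested loop
EXPLAIN ANALYZE
SELECT *
FROM ticketing_codes_played p
JOIN ticketing_codes c ON c.code_id = p.code_id;
RESET enable_nestloop;       -- restore the default afterwards
```

If the hash or merge join this produces is much faster, the row estimates feeding the nested-loop costing deserve a closer look.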
Re: [PERFORM] in memory views
On Wed, 10 May 2006, Thomas Vatter wrote: Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor Kris Jurka ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
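[Editor's note: server-side, the JDBC driver's cursor mode described at the link is conceptually equivalent to wrapping the query in a cursor. The driver actually uses a protocol-level portal rather than literal SQL, and the names below are made up:]

```sql
BEGIN;  -- cursors only live inside a transaction
DECLARE jdbc_portal CURSOR FOR
    SELECT * FROM generate_series(1, 1300) AS row_id;  -- stand-in for the 12-table join
FETCH FORWARD 50 FROM jdbc_portal;  -- first rows return without materializing all 1300
CLOSE jdbc_portal;
COMMIT;
```

This is also why the JDBC documentation requires autocommit to be off before setFetchSize() takes effect.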
Re: [PERFORM] UNSUBSCRIBE
On Wed, May 10, 2006 at 11:10:37AM -0400, Tom Lane wrote: Michael Glaesemann [EMAIL PROTECTED] writes: (And are there mail readers out there that can pick those subscribe/unsubscribe headers from the list emails? Now *that'd* be sweet.) Well, in my fairly ancient copy of exmh, any message with such headers causes an additional menu to appear: Based on the constantly broken threading in the lists, I'd bet that less than 20% of posters use something more sophisticated than MS LookOut!, and I'm sure that the stats for subscribers are far worse. Does majordomo have an option to automagically handle such posts that are sent to the post address instead of the admin address? I know mailman can do that... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Same query - Slow in production
I'm trying to determine why an identical query is running approximately 500 to 1000 times slower on our production database compared to our backup database server. Both database servers are dual 2.3 GHz G5 Xserves running PostgreSQL 8.1.3; both are configured with 8GB of RAM with identical shared memory settings; both postgresql.conf files are identical; both databases have identical indexes defined. The three relevant tables are all clustered the same, although I'm not sure when clustering was last performed on either server. All three tables have recently been analyzed on both servers. The different explain plans for this query seem to be consistent on both servers regardless of category and the production server is consistently and drastically slower than the backup server. If anyone has any ideas on how to have the production server generate the same explain plan as the backup server, or can suggest anything I might want to try, I would greatly appreciate it. Brian Wipf ClickSpace Interactive Inc. 
[EMAIL PROTECTED] Here's the query:

SELECT ac.attribute_id
FROM   attribute_category ac
WHERE  is_browsable = 'true'
AND    category_id = 1000962
AND    EXISTS (
         SELECT 'X'
         FROM   product_attribute_value pav, category_product cp
         WHERE  pav.attribute_id = ac.attribute_id
         AND    pav.status_code is null
         AND    pav.product_id = cp.product_id
         AND    cp.category_id = ac.category_id
         AND    cp.product_is_active = 'true'
         AND    cp.product_status_code = 'complete'
       )

Explain plans:

Fast (backup server):

Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..47943.34 rows=7 width=4) (actual time=0.110..0.263 rows=5 loops=1)
  Index Cond: (category_id = 1000962)
  Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
  SubPlan
    -> Nested Loop (cost=0.00..7983.94 rows=3 width=0) (actual time=0.043..0.043 rows=1 loops=5)
          -> Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5)
                Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))
          -> Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..3.27 rows=1 width=4) (actual time=0.016..0.016 rows=1 loops=8)
                Index Cond: (pav.product_id = "outer".product_id)
                Filter: ((attribute_id = $0) AND (status_code IS NULL))
Total runtime: 0.449 ms
(11 rows)

Slow (production server):

Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..107115.90 rows=7 width=4) (actual time=1.472..464.437 rows=5 loops=1)
  Index Cond: (category_id = 1000962)
  Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
  SubPlan
    -> Nested Loop (cost=18.33..23739.70 rows=4 width=0) (actual time=92.870..92.870 rows=1 loops=5)
          -> Bitmap Heap Scan on product_attribute_value pav (cost=18.33..8764.71 rows=2549 width=4) (actual time=10.191..45.672 rows=5869 loops=5)
                Recheck Cond: (attribute_id = $0)
                Filter: (status_code IS NULL)
                -> Bitmap Index Scan on product_attribute_value__attribute_id_fk_idx (cost=0.00..18.33 rows=2952 width=0) (actual time=9.160..9.160 rows=0 loops=5)
                      Index Cond: (attribute_id = $0)
          -> Index Scan using x_category_product_pk on category_product cp (cost=0.00..5.86 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=29345)
                Index Cond: ((cp.category_id = $1) AND ("outer".product_id = cp.product_id))
                Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))
Total runtime: 464.667 ms
(14 rows)

Table Descriptions:

\d attribute_category;
       Table "public.attribute_category"
     Column      |         Type         | Modifiers
-----------------+----------------------+-----------
 attribute_id    | integer              | not null
 category_id     | integer              | not null
 is_browsable    | character varying(5) |
 is_required     | character varying(5) |
 sort_order      | integer              |
 default_unit_id | integer              |
Indexes:
    "attribute_category_pk" PRIMARY KEY, btree (attribute_id, category_id)
    "attribute_category__attribute_id_fk_idx" btree (attribute_id)
    "attribute_category__category_id_fk_idx" btree (category_id) CLUSTER
Foreign-key constraints:
    "attribute_category_attribute_fk" FOREIGN KEY (attribute_id)
Re: [PERFORM] in memory views
Kris Jurka wrote: On Wed, 10 May 2006, Thomas Vatter wrote: Yes, the difference between psql command line and application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough, do I have to use createStatement(resultSetType, resultSetConcurrency) respectively prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? http://jdbc.postgresql.org/documentation/81/query.html#query-with-cursor Kris Jurka I was just returning to my mailbox to report success; I was just a bit faster than your e-mail. I have found the fetchSize function, it reduces the delay to 6 seconds. thanks a lot to all who helped, this was really great support, I am glad that the problem is solved tom ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Same query - Slow in production
I added to the exists query qualifier: AND cp.category_id = 1000962 (in addition to the cp.category_id = ac.category_id) Now I am getting a much better query plan on our production server: Index Scan using attribute_category__category_id_fk_idx on attribute_category ac (cost=0.00..485.71 rows=7 width=4) (actual time=0.104..0.351 rows=5 loops=1) Index Cond: (category_id = 1000962) Filter: (((is_browsable)::text = 'true'::text) AND (subplan)) SubPlan - Nested Loop (cost=0.00..24.77 rows=1 width=0) (actual time=0.058..0.058 rows=1 loops=5) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..6.01 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=5) Index Cond: ((category_id = $1) AND (category_id = 1000962)) Filter: (((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text)) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..18.75 rows=1 width=4) (actual time=0.041..0.041 rows=1 loops=5) Index Cond: (pav.product_id = outer.product_id) Filter: ((attribute_id = $0) AND (status_code IS NULL)) Total runtime: 0.558 ms (12 rows) It is using the x_category_product__category_id_fk_idx on category_product instead of the category_product__category_id_is_active_and_status_idx index as on our backup server. Still not sure what's causing the differences in query execution between the servers, but at least the query is fast again. Brian On 10-May-06, at 4:39 PM, Brian Wipf wrote: I'm trying to determine why an identical query is running approximately 500 to 1000 times slower on our production database compared to our backup database server. Both database servers are dual 2.3 GHz G5 Xserves running PostgreSQL 8.1.3; both are configured with 8GB of RAM with identical shared memory settings; both postgresql.conf files are identical; both databases have identical indexes defined. 
> The three relevant tables are all clustered the same, although I'm not sure when clustering was last performed on either server. All three tables have recently been analyzed on both servers. The different explain plans for this query seem to be consistent on both servers regardless of category, and the production server is consistently and drastically slower than the backup server. If anyone has any ideas on how to have the production server generate the same explain plan as the backup server, or can suggest anything I might want to try, I would greatly appreciate it.
>
> Brian Wipf
> ClickSpace Interactive Inc.
> [EMAIL PROTECTED]
>
> Here's the query:
>
> SELECT ac.attribute_id
> FROM   attribute_category ac
> WHERE  is_browsable = 'true'
>   AND  category_id = 1000962
>   AND  EXISTS (
>          SELECT 'X'
>          FROM   product_attribute_value pav,
>                 category_product cp
>          WHERE  pav.attribute_id = ac.attribute_id
>            AND  pav.status_code IS NULL
>            AND  pav.product_id = cp.product_id
>            AND  cp.category_id = ac.category_id
>            AND  cp.product_is_active = 'true'
>            AND  cp.product_status_code = 'complete'
>        )
>
> Explain plans:
>
> Fast (backup server):
>
> Index Scan using attribute_category__category_id_fk_idx on attribute_category ac  (cost=0.00..47943.34 rows=7 width=4) (actual time=0.110..0.263 rows=5 loops=1)
>   Index Cond: (category_id = 1000962)
>   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
>   SubPlan
>     ->  Nested Loop  (cost=0.00..7983.94 rows=3 width=0) (actual time=0.043..0.043 rows=1 loops=5)
>           ->  Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5)
>                 Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))
>           ->  Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav  (cost=0.00..3.27 rows=1 width=4) (actual time=0.016..0.016 rows=1 loops=8)
>                 Index Cond: (pav.product_id = outer.product_id)
>                 Filter: ((attribute_id = $0) AND (status_code IS NULL))
> Total runtime: 0.449 ms
> (11 rows)
>
> Slow (production server):
>
> Index Scan using attribute_category__category_id_fk_idx on attribute_category ac  (cost=0.00..107115.90 rows=7 width=4) (actual time=1.472..464.437 rows=5 loops=1)
>   Index Cond: (category_id = 1000962)
>   Filter: (((is_browsable)::text = 'true'::text) AND (subplan))
>   SubPlan
>     ->  Nested Loop  (cost=18.33..23739.70 rows=4 width=0) (actual time=92.870..92.870 rows=1
Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal
Jim C. Nasby [EMAIL PROTECTED] writes:
> On Tue, May 09, 2006 at 03:13:01PM -0400, Tom Lane wrote:
>> PFC [EMAIL PROTECTED] writes:
>>> Fun thing is, the rowcount from a temp table (which is the problem here) should be available without ANALYZE; as the temp table is not concurrent, it would be simple to inc/decrement a counter on INSERT/DELETE...
>>
>> No, because MVCC rules still apply.
>
> But can anything ever see more than one version of what's in the table?

Yes, because there can be more than one active snapshot within a single transaction (think about volatile functions in particular).

> Speaking of which, if a temp table is defined as ON COMMIT DROP or DELETE ROWS, there shouldn't be any need to store xmin/xmax, only cmin/cmax, correct?

No; you forgot about subtransactions.

			regards, tom lane
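Tom's point that a simple insert/delete counter cannot replace MVCC can be sketched with a toy visibility model. This is purely illustrative: real PostgreSQL visibility also involves commit status, cmin/cmax, subtransactions, and hint bits, none of which are modeled here.

```python
# Toy model of MVCC visibility (illustrative only; real PostgreSQL
# visibility is considerably more involved). Each row records the
# transaction id that created it (xmin) and, if deleted, the one that
# deleted it (xmax); a snapshot sees a row iff it existed as of that
# snapshot's transaction id.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    xmin: int                   # txid that inserted the row
    xmax: Optional[int] = None  # txid that deleted it, if any

def visible(row: Row, snapshot_xid: int) -> bool:
    if row.xmin > snapshot_xid:
        return False  # inserted after this snapshot was taken
    if row.xmax is not None and row.xmax <= snapshot_xid:
        return False  # already deleted as of this snapshot
    return True

def count_visible(table, snapshot_xid):
    return sum(visible(r, snapshot_xid) for r in table)

# One temp table, two snapshots active within the same transaction
# (e.g. a volatile function taking a fresh snapshot mid-query):
table = [Row(xmin=10), Row(xmin=20), Row(xmin=22), Row(xmin=5, xmax=15)]
print(count_visible(table, 12))  # older snapshot sees 2 rows
print(count_visible(table, 25))  # newer snapshot sees 3 rows
```

Because the two concurrently active snapshots legitimately see different row counts, a single maintained counter has no one correct value to hold, which is why "MVCC rules still apply" even for a non-concurrent temp table.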
Re: [PERFORM] Same query - Slow in production
Brian Wipf [EMAIL PROTECTED] writes:
> I'm trying to determine why an identical query is running approximately 500 to 1000 times slower on our production database compared to our backup database server.

It looks to me like it's pure luck that the query is fast on the backup server. The outer side of the EXISTS's join is being badly misestimated:

>           ->  Index Scan using category_product__category_id_is_active_and_status_idx on category_product cp  (cost=0.00..4362.64 rows=1103 width=4) (actual time=0.013..0.015 rows=2 loops=5)
>                 Index Cond: ((category_id = $1) AND ((product_is_active)::text = 'true'::text) AND ((product_status_code)::text = 'complete'::text))

If there actually had been 1100 matching rows instead of 2, the query would have run 550 times slower, putting it in the same ballpark as the other plan. So what I'm guessing is that the planner sees these two plans as being nearly the same cost, and small differences in the stats between the two databases are enough to tip its choice in one direction or the other.

So what you want, of course, is to improve that rowcount estimate. I suppose the reason it's so bad is that we don't have multicolumn statistics... is there a strong correlation between product_is_active and product_status_code? If so, it might be worth your while to find a way to merge them into one column.

			regards, tom lane
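The 550x figure in Tom's reply falls straight out of the estimated-versus-actual row counts in the plan. A quick back-of-the-envelope check, using only the numbers shown above:

```python
# Back-of-the-envelope check of the misestimate in the plan above.
# The planner estimated 1103 matching category_product rows where only
# 2 actually existed; since the inner index scan runs once per outer
# row, the inner-loop work scales roughly with that row count.
estimated_rows = 1103
actual_rows = 2
slowdown_if_estimate_were_real = estimated_rows / actual_rows
print(slowdown_if_estimate_were_real)  # prints 551.5, roughly the "550 times slower" figure
```

This is why the plan is fast only "by luck": the planner priced the scan as if ~1100 rows would come back, and the backup server merely happened to hit a case where only 2 did.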