Re: [PERFORM] How much expensive are row level statistics?
On Thu, 2005-12-15 at 19:06 -0500, Tom Lane wrote:
> Michael Fuhr [EMAIL PROTECTED] writes:
> > Does the backend support, or could it be easily modified to support, a mechanism that would post the command string after a configurable amount of time had expired, and then continue processing the query?
> Not really, unless you want to add the overhead of setting a timer interrupt for every query. Which is sort of counterproductive when the motivation is to reduce overhead ... (It might be more or less free if you have statement_timeout set, since there would be a setitimer call anyway. But I don't think that's the norm.)

We could do the deferred send fairly easily. You need only set a timer when stats_command_string = on, so we'd only do that when requested by the admin. Overall, that would be a cheaper way of doing it than now.

However, I'm more inclined to the idea of a set of functions that allow an administrator to retrieve the full SQL text executing in a backend, with an option to return an EXPLAIN of the currently executing plan. Right now, stats only gives you the first 1000 chars, so you're always stuck if it's a big query. Plus we don't yet have a way of getting the exact EXPLAIN of a running query (you can get close, but it could differ).

Pull is better than push. Asking specific backends what they're doing when you need to know will be efficient; asking them to send their command strings all of the time, deferred or not, will always be more wasteful. Plus if you forgot to turn on stats_command_string before execution, then you've no way of knowing anyhow.

Best Regards, Simon Riggs

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
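A sketch of the pull model Simon prefers, as it already exists in the stats views: poll only when you need the information, rather than having every backend push its command string continuously. This assumes stats_command_string = on; the truncation of current_query is the 1000-char limit he mentions.

```sql
-- Ask what each backend is doing right now (column names as of the 8.x
-- pg_stat_activity view; idle backends show '<IDLE>' as their query).
SELECT procpid, usename, current_query
FROM   pg_stat_activity
WHERE  current_query <> '<IDLE>';
```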
Re: [PERFORM] Crashing DB or Server?
On 12/16/05, Moritz Bayer [EMAIL PROTECTED] wrote:
> This is really weird, just a few hours ago the machine ran very smoothly, serving the data for a big portal.

Can you log the statements that are taking a long time and post them to the list with the table structures and indexes for the tables being used? To do this, turn on logging for statements taking a long time: edit the postgresql.conf file and change the following parameter.

log_min_duration_statement = 2000 # 2 seconds

Your log should now be catching the statements that are slow. Then use the statements to get the explain plan, i.e.

dbname=# explain [sql that's taking a long time]

We would also need to see the table structures.

dbname=# \d [table name of each table in above explain plan]

> Has anybody an idea what might have happened here? I need a quick solution, since I'm talking about a live server that should be running 24 hours a day.

It may be that the planner has started to pick a bad plan. This can happen if the database is regularly changing and the stats are not up to date. I believe it can happen even if the stats are up to date, but it is much less likely to do so. It might also be an idea to vacuum the database.

dbname=# VACUUM ANALYZE;

This will load the server up for a while, though.

-- http://www.hjackson.org http://www.uklug.co.uk
Re: [PERFORM] Overriding the optimizer
Kevin Brown wrote:
> Craig A. James wrote:
> > Hints are dangerous, and I consider them a last resort.
> If you consider them a last resort, then why do you consider them to be a better alternative than a workaround such as turning off enable_seqscan, when all the other tradeoffs are considered?

I would like a bit finer degree of control on this - I'd like to be able to tell PG that for my needs, it is never OK to scan an entire table of more than N rows. I'd typically set N to 1,000,000 or so. What I would really like is for my DBMS to give me a little more pushback - I'd like to ask it to run a query, and have it either find a good way to run the query, or politely refuse to run it at all. Yes, I know that is an unusual request :-)

The context is this - in a busy OLTP system, sometimes a query comes through that, for whatever reason (foolishness on my part as a developer, unexpected use by a user, imperfection of the optimizer, etc.), takes a really long time to run, usually because it table-scans one or more large tables. If several of these happen at once, it can grind an important production system effectively to a halt. I'd like to have a few users/operations get a "sorry, I couldn't find a good way to do that" message, rather than all the users find that their system has effectively stopped working.

Kyle Cordes www.kylecordes.com
[PERFORM] ALTER TABLE SET TABLESPACE and pg_toast
We're storing tif images in a table as bytea. We were running low on our primary space and moved several tables, including the one with the images, to a second tablespace using ALTER TABLE SET TABLESPACE. This cleaned out quite a bit of space on the original tablespace, but not as much as it should have. It does not appear that the corresponding pg_toast tables were moved. So, my questions are:

1) Is there a way to move pg_toast tables to new tablespaces (or at least assure that new ones are created there)?

2) Also, is there a good way to determine which pg_toast tables are associated with any particular table and column?

Thank you for your help, Martin
Re: [PERFORM] How much expensive are row level statistics?
Now there goes Tom with his skeptical eye again, and here comes me saying "oops" again. Further tests show that for this application I made the same mistake, fwiw. The big hit comes with stats_command_string. However, row level stats bring a big enough penalty (~10% on my usage) that I keep them turned off. The penalty is not just run time either, but increased CPU time. It just isn't an essential feature, so unless it causes near zero extra load it will stay off on my servers.

Additionally, back when I was testing the win32/pg platform I was getting random restarts of the stats collector when the server was under high load and row_level stats were on. This was a while back, so this issue may or may not be resolved... it was really nasty because it cleared out pg_stat_activity, which in turn ruined my admin tools. I should probably give that another look.

Merlin
Re: [PERFORM] Overriding the optimizer
On 12/16/05, Kyle Cordes [EMAIL PROTECTED] wrote:
> What I would really like is for my DBMS to give me a little more pushback - I'd like to ask it to run a query, and have it either find a good way to run the query, or politely refuse to run it at all. Yes, I know that is an unusual request :-)
>
> The context is this - in a busy OLTP system, sometimes a query comes through that, for whatever reason (foolishness on my part as a developer, unexpected use by a user, imperfection of the optimizer, etc.), takes a really long time to run, usually because it table-scans one or more large tables. If several of these happen at once, it can grind an important production system effectively to a halt. I'd like to have a few users/operations get a "sorry, I couldn't find a good way to do that" message, rather than all the users find that their system has effectively stopped working.

set statement_timeout in postgresql.conf

--
Atentamente, Jaime Casanova (DBA: DataBase Aniquilator ;)
Re: [PERFORM] Overriding the optimizer
On Fri, 2005-12-16 at 16:16 +1300, Mark Kirkwood wrote:
> Craig A. James wrote:
> > What would be cool would be some way the developer could alter the plan, but the way of doing so would strongly encourage the developer to send the information to this mailing list. Postgres would essentially say, "Ok, you can do that, but we want to know why!"
> Yeah it would - an implementation I have seen that I like is where the developer can supply the *entire* execution plan with a query. This is complex enough to make casual use unlikely :-), but provides the ability to try out other plans, and also fix that vital query that must run today.

I think you could use SPI for that. There is the function SPI_prepare, which prepares a plan, and SPI_execute_plan, which executes it. These functions are defined in src/backend/executor/spi.c. I think (someone please correct me if I'm wrong) you could prepare the plan yourself, instead of taking it from SPI_prepare, and give it to SPI_execute_plan. SPI_prepare calls _SPI_prepare_plan, which parses the query and calls pg_analyze_and_rewrite. In your version, don't call this function, but provide PostgreSQL with your own plan (not optimised according to PostgreSQL, but meeting your criteria).

--
Tomasz Rybak [EMAIL PROTECTED]
Re: [PERFORM] ALTER TABLE SET TABLESPACE and pg_toast
PostgreSQL [EMAIL PROTECTED] writes:
> We're storing tif images in a table as bytea. We were running low on our primary space and moved several tables, including the one with the images, to a second tablespace using ALTER TABLE SET TABLESPACE. This cleaned out quite a bit of space on the original tablespace, but not as much as it should have. It does not appear that the corresponding pg_toast tables were moved.

I think you're mistaken; at least, the SET TABLESPACE code certainly intends to move a table's toast table and index along with the table. What's your evidence for saying it didn't happen, and which PG version are you using exactly?

> 2) Also, is there a good way to determine which pg_toast tables are associated with any particular table and column?

pg_class.reltoastrelid and reltoastidxid. See http://www.postgresql.org/docs/8.1/static/storage.html http://www.postgresql.org/docs/8.1/static/catalog-pg-class.html

regards, tom lane
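As a concrete sketch, the catalog columns Tom mentions can be joined back to pg_class to find the toast table for a given relation (the table name 'images' below is a placeholder):

```sql
-- Find the toast table (and its index) backing a given table.
-- reltoastrelid is 0 if the table has no toast table.
SELECT c.relname                  AS main_table,
       t.relname                  AS toast_table,
       t.reltoastidxid::regclass  AS toast_index
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'images';
```

Note that toast tables are per-table, not per-column: all toastable columns of a table share the one toast table, so there is no finer-grained per-column answer to question 2.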
Re: [PERFORM] Overriding the optimizer
Jaime Casanova wrote:
> > The context is this - in a busy OLTP system, sometimes a query comes through that, for whatever reason (foolishness on my part as a developer, unexpected use by a user, imperfection of the optimizer, etc.), takes a really long time to run, usually because it table-scans one or more large tables. If several of these happen at once, it can grind an important production system effectively to a halt. I'd like to have a few users/operations get a "sorry, I couldn't find a good way to do that" message, rather than all the users find that their system has effectively stopped working.
> set statement_timeout in postgresql.conf

I found it's better to use "set statement_timeout" in the code, rather than setting it globally. Someone else pointed out to me that setting it in postgresql.conf makes it apply to ALL transactions, including VACUUM, ANALYZE and so forth. I put it in my code just around the queries that are user generated -- queries that are from users' input. I expect any SQL that I write to finish in a reasonable time ;-).

Craig
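A minimal sketch of the per-query approach Craig describes; the timeout value and the query itself are made-up examples, not recommendations:

```sql
-- Guard only the user-generated query; the global default is untouched.
SET statement_timeout = 5000;                    -- milliseconds; 0 disables the timeout
SELECT * FROM orders WHERE customer_id = 42;     -- hypothetical user-driven query
RESET statement_timeout;                         -- restore the session default afterwards
```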
Re: [PERFORM] Overriding the optimizer
On Thu, 2005-12-15 at 18:23 -0800, Craig A. James wrote:
> So, "you still have no problem" is exactly wrong, because Postgres picked the wrong plan. Postgres decided that applying myfunc() to 10,000,000 rows was a better plan than an index scan of 50,000 row_nums. So I'm screwed.

FWIW, the cost_functionscan procedure in costsize.c has the following comment:

/*
 * For now, estimate function's cost at one operator eval per function
 * call. Someday we should revive the function cost estimate columns in
 * pg_proc...
 */

I recognize that you're trying to talk about the issue in general rather than about this particular example. However, the example does seem to me to be exactly the case where the effort might be better spent improving the optimizer (reviving the function cost estimate columns), rather than implementing a general hinting facility. Which one is more effort? I don't really know for sure, but cost_functionscan does seem pretty straightforward.

What percentage of problems raised on this list can be fixed by setting configuration parameters, adding indexes, increasing statistics, or re-architecting a crazy schema? I've only been lurking for a few months, but it seems like a pretty large fraction. Of the remainder, what percentage represent actual useful feedback about what needs improvement in the optimizer? A pretty large fraction, I think. Including your example.

Personally, I think whoever was arguing for selectivity hints in -hackers recently made a pretty good point, so I'm partly on your side. Actually, function cost hints don't really seem that much different from selectivity hints, and both seem to me to be slicker solutions (closer to the right level of abstraction) than a general hint facility.

Mitch
[PERFORM] 8.1 - pg_autovacuum question
In PostgreSQL 8.1, is the pg_autovacuum daemon affected by the vacuum_cost_* variables? I need to make sure that if we turn autovacuuming on when we upgrade to 8.1, we don't cause any i/o issues. Thanks, Chris
Re: [PERFORM] 8.1 - pg_autovacuum question
Chris Hoover wrote:
> In PostgreSQL 8.1, is the pg_autovacuum daemon affected by the vacuum_cost_* variables? I need to make sure that if we turn autovacuuming on when we upgrade to 8.1, we don't cause any i/o issues.

What pg_autovacuum daemon? The contrib one? I don't know. The integrated one? Yes it is; and you can set autovacuum-specific values in postgresql.conf, and table-specific values (used for autovacuum only) in pg_autovacuum.

--
Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
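As a sketch of what Alvaro describes, the 8.1 postgresql.conf exposes autovacuum-specific cost settings that override the plain vacuum_cost_* values when set; the numbers below are illustrative, not recommendations:

```
# postgresql.conf (8.1) -- throttle the integrated autovacuum's I/O
autovacuum = on
autovacuum_vacuum_cost_delay = 10    # ms; -1 falls back to vacuum_cost_delay
autovacuum_vacuum_cost_limit = 200   # -1 falls back to vacuum_cost_limit
```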
Re: [PERFORM] Simple Join
David Lang wrote:
> On Fri, 16 Dec 2005, Mark Kirkwood wrote:
> > Right on. Some of these coerced plans may perform much better. If so, we can look at tweaking your runtime config: e.g. effective_cache_size, random_page_cost, default_statistics_target, to see if said plans can be chosen naturally.
> Mark, I've seen these config options listed as tweaking targets fairly frequently; has anyone put any thought or effort into creating a test program that could analyse the actual system and set the defaults based on the measured performance?

I am sure this has been discussed before. I found this thread - http://archives.postgresql.org/pgsql-performance/2004-07/msg00189.php - but I seem to recall others (offhand I can't find any of them).

I think that the real difficulty here is that the construction of the test program is non trivial - for instance, the best test program for tuning *my* workload is my application with its collection of data, but it is probably not a good test program for *anyone else's* workload.

cheers Mark
Re: [PERFORM] Lots of postmaster processes (fwd)
Dunno if this has gotten a reply elsewhere, but during a checkpoint the database can become quite busy. If that happens and performance slows down, other queries will slow down as well. If you have an app with a high rate of incoming requests (like a busy website) and existing backends can't keep up with demand, incoming connections will end up spawning more connections to the database.

On Tue, Dec 13, 2005 at 11:25:40AM -0600, Ameet Kini wrote:
> Resending it here as it may be more relevant here... Ameet
>
> -- Forwarded message --
> Date: Tue, 13 Dec 2005 11:24:26 -0600 (CST)
> From: Ameet Kini [EMAIL PROTECTED]
> To: pgsql-admin@postgresql.org
> Subject: Lots of postmaster processes
>
> In our installation of postgres 7.4.7, we are seeing a lot of the following postmaster processes (around 50) being spawned by the initial postmaster process once in a while:
>
> postgres  3977     1  1 Nov03 ?  15:11:38 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7
> ..
> postgres 31985  3977  0 10:08 ?  00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7
> postgres 31986  3977  0 10:08 ?  00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7
> postgres 31987  3977  0 10:08 ?  00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7
> postgres 31988  3977  0 10:08 ?  00:00:00 /s/postgresql-7.4.7/bin/postmaster -D /scratch.1/postgres/condor-db-7.4.7
> ..
>
> At the same time as these processes are being spawned, sometimes there is also the checkpoint subprocess. I am not sure if that is related. The documentation doesn't provide any information. The other activity going on at the same time is a 'COPY' statement from a client application. These extra processes put a considerable load on the machine and cause it to hang up.
> Thanks, Ameet

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [PERFORM] SAN/NAS options
On Wed, Dec 14, 2005 at 08:28:56PM +1300, Mark Kirkwood wrote:
> Another interesting thing to try is rebuilding the database ufs filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 16K/2K - can't recall the default on 4.x). I found this to give a factor of 2 speedup on random disk access (specifically queries doing indexed joins).

Even if you're doing a lot of random IO? I would think that random IO would perform better if you use smaller (8K) blocks, since there's less data being read in and then just thrown away that way.

Is it mainly your 2-disk machines that are IOPS bound? If so, a cheap option may be to buy 2 more Cheetahs for them! If it's the 4s, well, how about a 2U U320 diskpack from whomever supplies you the Supermicro boxes?

Also, on the 4-drive machines, if you can spare the room you might see a big gain by putting the tables on one mirror and the OS and transaction logs on the other.
Re: [PERFORM] SAN/NAS options
On Wed, Dec 14, 2005 at 01:56:10AM -0500, Charles Sprickman wrote:
> You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. Unlike Solaris or other commercial offerings, there is no nice volume management available. While I'd love to keep managing a dozen or so FreeBSD boxes, I could be persuaded to go to Solaris x86 if the volume management really shines and Postgres performs well on it.

Have you looked at vinum? It might not qualify as a true volume manager, but it's still pretty handy.
Re: [PERFORM] Overriding the optimizer
On Thu, Dec 15, 2005 at 09:48:55PM -0800, Kevin Brown wrote:
> Craig A. James wrote:
> > Kevin Brown wrote:
> > > Hints are dangerous, and I consider them a last resort.
> > If you consider them a last resort, then why do you consider them to be a better alternative than a workaround such as turning off enable_seqscan, when all the other tradeoffs are considered?
> If I understand enable_seqscan, it's an all-or-nothing affair. Turning it off turns it off for the whole database, right? The same is true of all of the planner-tuning parameters in the postgres conf file.

Nope. What's in the conf file are the defaults. You can change them on a per-connection basis, via the SET command. Thus, before doing your problematic query:

SET enable_seqscan = off;

and then, after your query is done,

SET enable_seqscan = on;

You can also turn it off inside a transaction and have it only affect that transaction, so that you can't accidentally forget to turn it back on (which could seriously hose things up if you're doing this in conjunction with a connection pool).
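A sketch of the transaction-local variant Jim mentions, using SET LOCAL so the setting reverts automatically on COMMIT or ROLLBACK; the query itself is a placeholder:

```sql
BEGIN;
SET LOCAL enable_seqscan = off;                  -- affects only this transaction
SELECT * FROM bigtable WHERE indexed_col = 123;  -- hypothetical problem query
COMMIT;                                          -- enable_seqscan reverts here
```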
Re: [PERFORM] Overriding the optimizer
On Fri, Dec 16, 2005 at 03:31:03PM +1300, Mark Kirkwood wrote:
> After years of using several other database products (some supporting hint type constructs and some not), I have come to believe that hinting (or similar) actually *hinders* the development of a great optimizer.

I don't think you can assume that would hold true for an open-source database. Unlike a commercial database, it's trivially easy to notify developers about a bad query plan. With a commercial database you'd have to open a support ticket and hope they actually use that info to improve the planner. Here you need just send an email to this list and the developers will at least see it, and will usually try and fix the issue.
Re: [PERFORM] Overriding the optimizer
On Fri, Dec 16, 2005 at 04:16:58PM +1300, Mark Kirkwood wrote:
> Craig A. James wrote:
> > What would be cool would be some way the developer could alter the plan, but the way of doing so would strongly encourage the developer to send the information to this mailing list. Postgres would essentially say, "Ok, you can do that, but we want to know why!"
> Yeah it would - an implementation I have seen that I like is where the developer can supply the *entire* execution plan with a query. This is complex enough to make casual use unlikely :-), but provides the ability to try out other plans, and also fix that vital query that must run today.

Being able to specify an exact plan would also provide for query plan stability, something that is critically important in certain applications. If you have to meet a specific response time requirement for a query, you can't afford to have the optimizer suddenly decide that some other plan might be faster when in fact it's much slower.
Re: [PERFORM] SAN/NAS options
Jim C. Nasby wrote:
> On Wed, Dec 14, 2005 at 08:28:56PM +1300, Mark Kirkwood wrote:
> > Another interesting thing to try is rebuilding the database ufs filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 16K/2K - can't recall the default on 4.x). I found this to give a factor of 2 speedup on random disk access (specifically queries doing indexed joins).
> Even if you're doing a lot of random IO? I would think that random IO would perform better if you use smaller (8K) blocks, since there's less data being read in and then just thrown away that way.

Yeah, that's what I would have expected too! But the particular queries I tested do a ton of random IO (correlation of 0.013 on the join column for the big table). I did wonder if the gain has something to do with the underlying RAID stripe size (64K or 256K in my case), as I have only tested the 32K vs 8K/16K on RAIDed systems.

I guess for a system where the number of concurrent users gives rise to memory pressure, it will cause more thrashing of the file buffer cache, so it could be a net loss. Still worth trying out, I think; you will know soon enough if it is a win or a loss!

Note that I did *not* alter the Postgres page/block size (BLCKSZ) from 8K, so no dump/reload is required to test this out.

cheers Mark
Re: [PERFORM] SAN/NAS options
On Fri, Dec 16, 2005 at 04:18:01PM -0600, Jim C. Nasby wrote:
> Even if you're doing a lot of random IO? I would think that random IO would perform better if you use smaller (8K) blocks, since there's less data being read in and then just thrown away that way.

The overhead of reading an 8k block instead of a 32k block is too small to measure on modern hardware. The seek is what dominates; leaving the read head on a little longer and then transmitting a little more over a 200 megabyte channel is statistical fuzz.

Mike Stone
Re: [PERFORM] SAN/NAS options
On Fri, Dec 16, 2005 at 05:51:03PM -0500, Michael Stone wrote:
> The overhead of reading an 8k block instead of a 32k block is too small to measure on modern hardware. The seek is what dominates; leaving the read head on a little longer and then transmitting a little more over a 200 megabyte channel is statistical fuzz.

True, but now you've got 4x the amount of data in your cache that you probably don't need. Looks like time to do some benchmarking...
Re: [PERFORM] SAN/NAS options
On Fri, Dec 16, 2005 at 06:25:25PM -0600, Jim C. Nasby wrote:
> True, but now you've got 4x the amount of data in your cache that you probably don't need.

Or you might be 4x more likely to have data cached that's needed later. If you're hitting disk either way, that's probably more likely than the extra IO pushing something critical out--if *all* the important stuff were cached you wouldn't be doing the seeks in the first place. This will obviously be heavily dependent on the amount of RAM you've got and your workload, so (as always) you'll have to benchmark it to get past the hand-waving stage.

Mike Stone
Re: [PERFORM] Overriding the optimizer
Jaime Casanova wrote:
> > What I would really like is for my DBMS to give me a little more pushback - I'd like to ask it to run a query, and have it either find a good way to run the query, or politely refuse to run it at all.
> set statement_timeout in postgresql.conf

That is what I am doing now, and it is much better than nothing. But it's not really sufficient, in that it is still quite possible for users repeatedly trying an operation that unexpectedly causes excessive DB usage to load down the system to the point of failure. In other words, I'd ideally like it to give up right away, not after N seconds of table-scanning my 100-million-row tables... and not with a timeout, but with an explicit throwing up of its hands, exasperated, that it could not come up with an efficient way to run my query.

Kyle Cordes www.kylecordes.com
Re: [PERFORM] How much expensive are row level statistics?
Tom Lane wrote:
> Michael Fuhr [EMAIL PROTECTED] writes:
> > Further tests show that for this application the killer is stats_command_string, not stats_block_level or stats_row_level. I tried it with pgbench -c 10, and got these results: 41% reduction in TPS rate for stats_command_string

Whoa, 41%. That's just off the charts! What are we doing internally that would cause that?

--
Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive,  | 13 Roberts Road
+ Christ can be your backup.     | Newtown Square, Pennsylvania 19073
Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex
Bruce Momjian wrote:
> How are star joins different from what we do now?

Recall that a star query with n tables means a query where there are (n - 1) supposedly small tables (dimensions) and 1 large table (fact) - which has foreign keys to each dimension.

As I understand it, the classic star join is planned like this:

1) The result of the restriction clauses on each of the (small) dimension tables is computed.
2) The cartesian product of all the results of 1) is formed.
3) The fact (big) table is joined to the pseudo relation formed by 2).

From what I have seen, most vendor implementations do not (always) perform the full cartesian product of the dimensions, but do some of them, join to the fact, then join to the remaining dimensions afterwards.

There is another join strategy called the "star transformation" where some of the dimension joins get rewritten as subqueries, then the above method is used again. This tends to be useful when the cartesian products would be stupidly large (e.g. sparse facts, or few restriction clauses).

regards Mark

P.S.: Luke or Simon might have a better definition... but thought I'd chuck in my 2c... :-)
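To make the shape concrete, here is a sketch of a star query over hypothetical tables (fact table `sales`, dimensions `dates` and `stores` - all names invented for illustration). Under the classic plan above, the restrictions on `dates` and `stores` would be evaluated first, their cartesian product formed, and `sales` joined to that:

```sql
-- Hypothetical star schema: one fact table with FKs to two small dimensions.
SELECT d.month, s.region, sum(f.amount)
FROM   sales  f
JOIN   dates  d ON f.date_id  = d.id    -- dimension 1
JOIN   stores s ON f.store_id = s.id    -- dimension 2
WHERE  d.year   = 2005                  -- restriction on dimension 1
AND    s.region = 'EU'                  -- restriction on dimension 2
GROUP  BY d.month, s.region;
```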
Re: [PERFORM] Overriding the optimizer
Craig A. James [EMAIL PROTECTED] writes:
> How about this: Instead of arguing in the abstract, tell me in concrete terms how you would address the very specific example I gave, where myfunc() is a user-written function. To make it a little more challenging, try this: myfunc() can behave very differently depending on the parameters, and sometimes (but not always), the application knows how it will behave and could suggest a good execution plan.

A word to the wise:

regression=# explain select * from tenk1 where ten > 5 and ten < 9
regression-# and myfunc(unique1,unique2);
                         QUERY PLAN
--------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..533.00 rows=982 width=244)
   Filter: ((ten > 5) AND (ten < 9) AND myfunc(unique1, unique2))
(2 rows)

regression=# explain select * from tenk1 where myfunc(unique1,unique2)
regression-# and ten > 5 and ten < 9;
                         QUERY PLAN
--------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..533.00 rows=982 width=244)
   Filter: (myfunc(unique1, unique2) AND (ten > 5) AND (ten < 9))
(2 rows)

I might have taken your original complaint more seriously if it weren't so blatantly bogus. Your query as written absolutely would not have evaluated myfunc() first, because there was no reason for the planner to reorder the WHERE list.

regards, tom lane