Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Tomas Vondra wrote: Stats about access to the data (index/seq scans, cache hit ratio etc.) are stored in pg_stat_* and pg_statio_* catalogs, and are updated after running each query. AFAIK it's not a synchronous process, but when a backend finishes a query, it sends the stats to the postmaster (and postmaster updates the catalogs). Description in the docs goes over this in a little more detail http://www.postgresql.org/docs/current/static/monitoring-stats.html : "The statistics collector communicates with the backends needing information (including autovacuum) through temporary files. These files are stored in the pg_stat_tmp subdirectory...When using the statistics to monitor current activity, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals. Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 unless altered while building the server). So the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date." It's not synchronous at all. The clients create a temporary file for the statistics collector and move on. The actual statistics don't get updated until the statistics collector decides enough time has passed to bother, which defaults to at most every 500ms. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
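The reporting lag described above can be observed directly. A minimal sketch, assuming a table named my_table (the name is a placeholder):

```sql
-- Run a query against the table, then check the counters immediately:
SELECT seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';

-- The counters typically do not reflect the scan until the backend goes
-- idle and the collector emits a new report (PGSTAT_STAT_INTERVAL, 500 ms
-- by default), so wait briefly and check again:
SELECT pg_sleep(1);

SELECT seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';
```

If the second query is run from the same session that did the scan, the counters may still lag until that session goes idle.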
Re: [PERFORM] Checkpoint execution overrun impact?
drvillo wrote:
> -given the configuration attached (which is basically a vanilla one) and
> the number of buffers written at each execution, are these execution
> times normal or above average?

Given the configuration attached, most of them are normal. One problem may be that your vanilla configuration has checkpoint_segments set to 3. There is some logic in the checkpoint code to try to spread checkpoint writes out over a longer period of time. The intention is for a slower write spread to disrupt concurrent client activity less. It doesn't work all that well unless you give it some more segments to work with. Also, with the default setting for shared_buffers, you are doing a lot more redundant writes than you should be. The following postgresql.conf changes should improve things for you:

shared_buffers=256MB
checkpoint_segments=10
wal_buffers=16MB

You may have to adjust your kernel shared memory settings for that to work. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an intro to these and the other common parameters you should consider adjusting.

> -in the case of the execution that overruns past the timeout, what are
> the implications wrt the client application?

There really aren't any in the database. The server will immediately begin another checkpoint. Some additional disk space is used. So long as the server doesn't run out of disk space from that, clients shouldn't care.

> -AFAIU client connections are basically stalled during checkpoints. Is it
> reasonable to infer that the fact that the application blocking on a
> getConnection() might be related to checkpoints being executed?

It can be.
What I suspect is happening during the bad one:

2011-04-22 06:51:41 CEST LOG: checkpoint complete: wrote 108 buffers (2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=409.007 s, sync=4.672 s, total=414.070 s
2011-04-22 06:55:42 CEST LOG: could not receive data from client: No connection could be made because the target machine actively refused it.

is that something is happening on the disks of the server that keeps the database from being able to write efficiently during this checkpoint. It then slows the checkpoint so much that clients are timing out. The tuning changes I suggested will lower the total amount of I/O the server does between checkpoints, which will mean there is less information in the OS cache to write out when the checkpoint comes. That may help, if the problem is really in the database.

> -considering some tuning on the PG side, should I try increasing
> checkpoint_timeout and raising checkpoint_completion_target to lessen the
> impact of IO on the client or should I shorten the period so there's less
> stuff to write? from the number of buffers written on average I'd assume
> the first option is the one to go for but I might miss some bit of
> reasoning here...

Your problems are likely because the operating system cache is getting filled with something that is slowing checkpoints down. Maybe it's the regular database writes during the five minutes between checkpoints; maybe it's something else running on the server. Whatever is happening, you're unlikely to make it better by adjusting how often they happen. Either get the database to write less between checkpoints (like the changes I suggested), or figure out what else is doing the writes. I suspect they are coming from outside the database, only because if you really had high write activity on this server you'd also be having checkpoints more frequently, too.

> I've read about RAID5 not being a wise setup for disks hosting PG, what
> about RAID1?
The problem with RAID5 is that it lowers the write performance of a larger number of disks, so it's potentially no better than a single drive. RAID1 is essentially a single drive, too. You may discover you're just running over what one drive can do. Something odd does seem to be going on though. Normally in your situation I would try to find some system downtime and test the read/write speed of the drives, looking for issues there. As Robert said already, you shouldn't be running this slowly unless there's something going wrong.

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
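One way to watch how much checkpoint work the server is actually doing, before and after tuning, is the bgwriter statistics view. A sketch (pg_stat_bgwriter exists in 8.3 and later; the exact columns vary a little by version):

```sql
-- Counters accumulated since the stats were last reset: how many
-- checkpoints were triggered by checkpoint_timeout vs. by running out of
-- checkpoint_segments, and how many buffers each write path flushed.
SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_backend
  FROM pg_stat_bgwriter;
```

A high checkpoints_req count relative to checkpoints_timed suggests checkpoint_segments is too low for the write load.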
Re: [PERFORM] tuning on ec2
> Sounds like a reasonable starting point. You could certainly fiddle
> around a bit - especially with shared_buffers - to see if some other
> setting works better, but that should be in the ballpark.

I tend to set it a bit higher on EC2 to discourage the VM from overcommitting memory I need. So, I'd do 2.5GB for that one.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [PERFORM] 'Interesting' prepared statement slowdown on large table join
On Thu, May 12, 2011 at 8:53 AM, Prodan, Andrei wrote:
> @Jeff: thank you for the clear plan interpretation - but I'm afraid I
> don't really understand the second bit:
> 1) I provided the GOOD plan, so we already know what postgres thinks,
> right? (Later edit: guess not. Doesn't work)
> 2) There's no full table scan in any of the plans - it scans indices,
> the problem seems to be that it scans them in the wrong order because it
> thinks there are very few WHERE matches in big_table - which is
> incorrect, as for that particular pair there is a huge amount of rows.

Hi Andrei,

"Explain analyze" only gives you the cost/rows for the plan components it actually executed; it doesn't give you costs for alternative rejected plans. Since the GOOD plan doesn't include the index scan in question, it doesn't give the estimated or actual rows for that scan under the stats/conditions that provoke the GOOD plan to be adopted. So to get that information, you have to design an experimental prepared query that will get executed using that particular scan; that way it will report the results I wanted to see.

My concern is that the experimental query I proposed you use might instead decide to use a full table scan rather than the desired index scan. Although come to think of it, I think the same code will be used to arrive at the predicted number of rows regardless of whether it does a FTS or the desired index scan.

Cheers, Jeff
[PERFORM] setting configuration values inside a stored proc
I've got a stored proc that constructs some aggregation queries as strings and then executes them. I'd like to be able to increase work_mem before running those queries. If I set a new value for work_mem within the stored proc prior to executing my query string, will that actually have an impact on the query, or is work_mem basically a constant once the outer statement that calls the stored proc has begun? I'd just test, but it will take hours for me to grab a copy of production data and import into a new db host for testing. I've already started that process, but I'm betting I'll have an answer by the time it completes. It's just the difference between modifying the application which calls the procs (and doing a full software release in order to do so, or else waiting a month to go in the next release) vs modifying the procs themselves, which requires only a db update. --sam
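For what it's worth, settings changed inside a PL/pgSQL function do take effect for statements the function executes afterwards in the same session, and passing true as the third argument to set_config() (the equivalent of SET LOCAL) reverts the change when the enclosing transaction ends. A minimal sketch; the function name and the executed query are hypothetical stand-ins for the real aggregation strings:

```sql
-- Hypothetical proc: raise work_mem, then run a dynamically built query.
CREATE OR REPLACE FUNCTION run_aggregates() RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    -- Takes effect immediately for subsequent statements in this session;
    -- the third argument (true) makes it transaction-local, like SET LOCAL.
    PERFORM set_config('work_mem', '256MB', true);

    -- Placeholder for the aggregation query string built elsewhere.
    EXECUTE 'SELECT count(*) FROM generate_series(1, 1000) g' INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

Because the setting is transaction-local, the outer statement's original work_mem is restored automatically once the calling transaction commits or rolls back.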
Re: [PERFORM] Poor performance when joining against inherited tables
> It says the sequential scan has a cost that's way too high, and I'm
> presuming that's why it's choosing the extremely slow plan over the much
> faster plan.

Well, not exactly. It's giving you that cost because you disabled seqscan, which actually just bumps the cost really high:

postgres=# create temporary table foo as select generate_series(1,3);
SELECT
postgres=# explain analyze select * from foo;
                        QUERY PLAN
 Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.010..0.012 rows=3 loops=1)
 Total runtime: 2.591 ms
(2 rows)

postgres=# set enable_seqscan to false;
SET
postgres=# explain analyze select * from foo;
                        QUERY PLAN
 Seq Scan on foo  (cost=100.00..134.00 rows=2400 width=4) (actual time=0.004..0.007 rows=3 loops=1)
 Total runtime: 0.037 ms
(2 rows)

As far as I know, there is no way to truly disable any given plan option, since sometimes that may be the only choice. The (estimated) cost of the seq scan chosen here is *not* the same as the cost of the scan when the planner actually considers this plan (in fact, that will be the same as the one in the first plan). However, note the cost of the Index Scan nodes in the second plan: they are *higher* than their corresponding Seq Scan nodes (in the first plan), which is why you get the first plan when seqscan *is* enabled. Also, your plan output looks like plain EXPLAIN and not EXPLAIN ANALYZE (i.e., the "actual time" nodes are missing). Other than that, I think Shaun's comments apply.

--- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 22:03, Willy-Bas Loos napsal(a):
> Then, are the index scans counted in a memory variable and written at
> analyze time?

No, I believe raghu mixed two things - stats used by the planner and stats about access to the data (how many tuples were read using an index, etc.)

Stats for the planner are stored in pg_class/pg_statistic/pg_stats catalogs and are updated by ANALYZE (either manual or automatic). This is what raghu referred to, but these stats are completely useless when looking for unused indexes.

Stats about access to the data (index/seq scans, cache hit ratio etc.) are stored in pg_stat_* and pg_statio_* catalogs, and are updated after running each query. AFAIK it's not a synchronous process, but when a backend finishes a query, it sends the stats to the postmaster (and postmaster updates the catalogs).

Tomas
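The two kinds of statistics can be inspected side by side. A sketch, with my_table as a placeholder name:

```sql
-- Planner statistics, maintained by ANALYZE: per-column row estimates,
-- distinct-value counts, histograms. Useless for finding unused indexes.
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'my_table';

-- Access statistics, maintained as queries run: these are the counters
-- that reveal whether an index is ever actually scanned.
SELECT indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE relname = 'my_table';
```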
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
On Thu, May 12, 2011 at 9:09 PM, Willy-Bas Loos wrote: > Hi, > > We have some indexes that don't seem to be used at all. > I'd like to know since when they have not been used. > That is, the time when postgres started counting to reach the number that > is in pg_stat_user_indexes.idx_scan > > Is there a way to retrieve that from the database ? > "Analyze" activity will update the statistics of each catalog table. pg_postmaster_start_time --> Retrieves the Postmaster [ PostgreSQL Instance] start time postgres=# select pg_postmaster_start_time(); --Raghu Ram
Re: [PERFORM] Poor performance when joining against inherited tables
On 05/11/2011 09:38 AM, Robert Haas wrote:

However, if I disable seqscan (set enable_seqscan=false), I get the following plan:

QUERY PLAN
Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
  Hash Cond: (f.id = objects.id)
  -> Append  (cost=100.00..29536334.43 rows=8643757 width=20)
     -> Seq Scan on item f  (cost=100.00..126.30 rows=1630 width=20)
     -> Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60 rows=90 width=20)
     -> Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24 rows=266 width=20)
     -> Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28 rows=2 width=20)
     ...
  -> Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
     -> Index Scan using objects_pkey on objects  (cost=0.00..999347.17 rows=3941949 width=490)

This seems like a much more sensible query plan.

I don't think so. Scanning the index to extract all the rows in a table is typically going to be a lot slower than a sequential scan. A more interesting question is why you're not getting a plan like this:

Nested Loop
  -> Seq Scan on objects
  -> Append
     -> Index Scan using xxx_pkey on itemXX
     -> Index Scan using yyy_pkey on itemYY
     -> Index Scan using zzz_pkey on itemZZ

Compared to the previous query plan (omitted in this e-mail, in which the planner was scanning all the item tables sequentially), the second query is much more desirable. It takes about 12 seconds to complete, versus the other query which I canceled after six hours. However, what you propose seems to make even more sense. But it seems to think doing a sequential scan on the *empty* item table is excessively expensive in this case. Aside from enable_seqscan=false, is there any way I can make the query planner not balk over doing a seqscan on an empty table?

Why would you care? A sequential scan of an empty table is very fast.
My issue is that it looks like it's avoiding the sequential scan:

Seq Scan on item f  (cost=100.00..126.30 rows=1630 width=20)

It says the sequential scan has a cost that's way too high, and I'm presuming that's why it's choosing the extremely slow plan over the much faster plan. I don't know very much about plans, but I'm assuming the planner chooses the plan with the lowest cost. I'd much prefer it *does* the sequential scan of the empty table and goes with the other parts of the plan.

Thanks, Lucas Madar
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Greg Smith writes:
> You're saying to watch out for (3); I think that's not usually the case,
> but that's a fair thing to warn about. Even in that case, though, it
> may still be worth dropping the index. Year-end processes are not
> usually very sensitive to whether they take a little or a long time to
> execute. But you will be paying to maintain the index every day while
> it is there.

Yeah. Another idea worth considering is to have the year-end processing build the index it wants, use it, drop it. It seems unlikely that it's worth maintaining an index year-round for such infrequent usage.

regards, tom lane
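That build-use-drop pattern, sketched with entirely hypothetical table, column, and index names:

```sql
-- Build the index only when the year-end job needs it...
CREATE INDEX yearend_idx ON ledger (fiscal_year, account_id);

-- ...run the year-end reports that rely on it...

-- ...then drop it, so nothing pays its maintenance cost the rest of the year.
DROP INDEX yearend_idx;
```

The trade-off is the one-time cost of the index build at year end versus the daily cost of maintaining the index on every insert, update, and delete.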
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Tomas Vondra wrote:
> BTW it's really really tricky to remove indexes once they're created.
> What if the index is created for a single batch process that runs once a
> year to close the fiscal year etc?

True in theory. Reports that are executing something big at the end of the year fall into three categories:

1) They touch a whole lot of the data for the year first. In this case, sequential scan is likely regardless.
2) They access data similarly to regular queries, using the same indexes.
3) They have some very specific data only they touch that is retrieved with an index.

You're saying to watch out for (3); I think that's not usually the case, but that's a fair thing to warn about. Even in that case, though, it may still be worth dropping the index. Year-end processes are not usually very sensitive to whether they take a little or a long time to execute. But you will be paying to maintain the index every day while it is there.

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Then, are the index scans counted in a memory variable and written at analyze time? On Thu, May 12, 2011 at 8:22 PM, raghu ram wrote: > > "Analyze" activity will update the statistics of each catalog table. > --Raghu Ram > > -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
Re: [PERFORM] 'Interesting' prepared statement slowdown on large table join
Thank you for all the leads. I've increased stats to 1200 on everything obvious (external_id, attr_name, attr_value, party_id), and ran ANALYZE, but it didn't help at all - any other ideas of what else could be going wrong? We'll disable preparation, but the thing is it works brilliantly 90% of the time and the other 10% should theoretically be fixable - because it's almost certainly a border scenario brought on by lack of maintenance on something somewhere. Is there any point in trying to rebuild the indexes involved in case Postgres decided they're too bloated or something like that?

@Shaun: I just finished trying to max out stats and sadly it doesn't help; thank you very much for trying anyway.

@Tom: The planner doesn't flip between the plans by itself - it will switch to the BAD plan at some point and never go back. The big_table has an extremely uneven distribution indeed. But it still plans right usually - and this apparently regardless of the statistics target.

@Jeff: thank you for the clear plan interpretation - but I'm afraid I don't really understand the second bit:
1) I provided the GOOD plan, so we already know what postgres thinks, right? (Later edit: guess not. Doesn't work)
2) There's no full table scan in any of the plans - it scans indices; the problem seems to be that it scans them in the wrong order because it thinks there are very few WHERE matches in big_table - which is incorrect, as for that particular pair there is a huge amount of rows.

Thank you, Andrei
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 11:07 AM, Tom Lane wrote:
> I find it odd that replacing the subquery with a temp table helps,
> though, because (unless you stuck in an ANALYZE you didn't mention) it
> would have no stats at all about the number of groups in the temp table.

I did have an analyze initially for exactly that reason. But what I found odd is that in my rush to execute this for the end of day reports, I forgot that step, and it still ran fine. I've found that the planner tends to treat un-analyzed tables somewhat pessimistically, which is fine by me.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Re: [PERFORM] Postgres refusing to use >1 core
Shaun Thomas writes:
> On 05/12/2011 09:51 AM, Tom Lane wrote:
>> It does. I was a bit surprised that Shaun apparently got a plan that
>> didn't include a materialize step, because when I test a similar query
>> here, I get:
> Remember when I said "old version" that prevented us from using CTEs?
> We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
> It's basically calculating the group aggregation wrong, but is that
> enough to trigger it to go nuts?

Hmm. As you say, the mistake it's making is a drastic underestimate of the number of groups in the subquery, leading to a bad choice of join method. I find it odd that replacing the subquery with a temp table helps, though, because (unless you stuck in an ANALYZE you didn't mention) it would have no stats at all about the number of groups in the temp table. Maybe the default guess just happens to produce the more desirable plan.

regards, tom lane
Re: [PERFORM] Postgres refusing to use >1 core
> > This is a perfect example of a place where you could push some work out of > the application and into the database. You can consolidate your 1 to 101 > queries into a single query. If you use: > > WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm > - 256 DESC LIMIT 1 > > it will always return the same value as the first matching query from your > list, and will never have to make more than one trip to the database. Your > one trip might be slightly slower than any one of the single trips above, > but it will certainly be much faster in the case where you have to hit any > significant % of your 101 potential queries. > THANKS!! I've been obsessing so much about parallelism that I hadn't spent much time finding better queries. Aren
Re: [PERFORM] Postgres refusing to use >1 core
Everyone, Just wanted to say thanks for your help with my performance question. You have given me plenty of things to investigate. Further, I think the problem is almost certainly with my app, so I need to do more work there! I really like the idea of just loading everything in memory and then dumping it all out later. I have 6 GB RAM, so it should be plenty to handle this. Aren Cambre
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 17:39, Willy-Bas Loos napsal(a):
> Hi,
>
> We have some indexes that don't seem to be used at all.
> I'd like to know since when they have not been used.
> That is, the time when postgres started counting to reach the number
> that is in pg_stat_user_indexes.idx_scan
>
> Is there a way to retrieve that from the database ?

BTW it's really really tricky to remove indexes once they're created. What if the index is created for a single batch process that runs once a year to close the fiscal year etc? So be very careful about this.

Tomas
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 17:39, Willy-Bas Loos napsal(a):
> Hi,
>
> We have some indexes that don't seem to be used at all.
> I'd like to know since when they have not been used.
> That is, the time when postgres started counting to reach the number
> that is in pg_stat_user_indexes.idx_scan
>
> Is there a way to retrieve that from the database ?

Well, not really :-( You could call pg_postmaster_start_time() to get the start time, but that has two major drawbacks:

(1) The stats may actually be collected for much longer, because restart does not reset them.

(2) If someone called pg_stat_reset(), the stats are lost but the start time remains the same.

So there really is no reliable way to detect this. In 9.1 this is not true - there's a timestamp for each database (and global stats) to keep track of the last reset.

regards Tomas
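In 9.1 the reset time can be read directly from the stats catalogs; on earlier versions only the postmaster start time is available, with the caveats above. A sketch:

```sql
-- PostgreSQL 9.1+: when were the statistics for each database last reset?
SELECT datname, stats_reset
  FROM pg_stat_database;

-- Fallback on older versions: only the postmaster start time, which may
-- both overstate (stats survive restarts) and understate (pg_stat_reset()
-- leaves it unchanged) how long counting has been going on.
SELECT pg_postmaster_start_time();
```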
Re: [PERFORM] Postgres refusing to use >1 core
On Wed, May 11, 2011 at 9:17 PM, Aren Cambre wrote:
> *2. Not all TxDPS reference markers correspond to TxDOT reference markers.*
>
> Now, if I've matched a route, I have to find the reference marker.
>
> The TxDOT database is pretty good but not 100% complete, so some TxDPS
> tickets' reference markers may not exist in the TxDOT table. Plus, it's
> possible that some TxDPS tickets have the wrong marker.
>
> To compensate, I am looking for the closest reference marker along the
> route that is not more than 50 marker units away, either direction. I've
> again implemented that with multiple queries, where I don't stop until I
> find a match. Suppose I am searching for reference marker 256 on TX 71. The
> queries will be like this:
>
>    1. rte_nm = 'SH71' AND rm = '256' (base marker)
>    2. rte_nm = 'SH71' AND rm = '257' (+1)
>    3. rte_nm = 'SH71' AND rm = '255' (-1)
>    4. rte_nm = 'SH71' AND rm = '258' (+2)
>    5. rte_nm = 'SH71' AND rm = '254' (-2)
>    6. ...
>    7. rte_nm = 'SH71' AND rm = '306' (+50)
>    8. rte_nm = 'SH71' AND rm = '206' (-50)
>
> Assuming a matching route name was found in the prior step, the app will
> have 1 to 101 of these queries for each ticket.

This is a perfect example of a place where you could push some work out of the application and into the database. You can consolidate your 1 to 101 queries into a single query. If you use:

WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm - 256 DESC LIMIT 1

it will always return the same value as the first matching query from your list, and will never have to make more than one trip to the database. Your one trip might be slightly slower than any one of the single trips above, but it will certainly be much faster in the case where you have to hit any significant % of your 101 potential queries.

-Eric
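Spelled out as a complete statement, assuming the markers live in a table called reference_markers (that name is a guess; rte_nm and rm are the columns from the thread, and rm is assumed to be numeric so the arithmetic works):

```sql
SELECT *
  FROM reference_markers
 WHERE rte_nm = 'SH71'
   AND rm BETWEEN 206 AND 306        -- within 50 marker units of 256
 ORDER BY abs(rm - 256),             -- closest marker first
          rm - 256 DESC              -- on a distance tie, prefer the higher
                                     -- marker, matching the +1-before--1
                                     -- order of the original query list
 LIMIT 1;
```

With an index on (rte_nm, rm), this resolves each ticket in a single index range scan instead of up to 101 round trips.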
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 09:51 AM, Tom Lane wrote:
> It does. I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I said "old version" that prevented us from using CTEs? We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0). It's basically calculating the group aggregation wrong, but is that enough to trigger it to go nuts?

SELECT c.*
  FROM customer c
  JOIN (SELECT session_id, count(1) as counter
          FROM session
         WHERE date_created >= '2011-05-11 05:00'
           AND date_created < '2011-05-11 06:00'
           AND from_interface = 'UNKNOWN'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;

So sayeth the planner:

Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
  -> GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
     -> Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual time=0.044..0.690 rows=230 loops=1)
           Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created < '11-MAY-11 06:00:00'::timestamp without time zone))
           Filter: ((from_interface)::text = 'UNKNOWN'::text)
  -> Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual time=19.820..798.348 rows=1 loops=209)
     Index Cond: (c.customer_id = a.session_id)
     Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
Total runtime: 166859.040 ms

That one hour extract is much, much slower than this:

SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) as counter
          FROM session
         WHERE date_created >= '2011-05-08'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE
   AND c.row_out IS NULL;

Which gives this plan:

Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
  Merge Cond: (a.session_id = c.customer_id)
  -> GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
     -> Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8) (actual time=0.020..160.339 rows=179267 loops=1)
           Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
  -> Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
     Sort Key: c.customer_id
     -> Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual time=0.037..90.337 rows=155067 loops=1)
        Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
        Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...
Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
  -> GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
     -> Sort  (cost=13755.53..14022.28 rows=106700 width=8)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
           Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
  -> Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
     Index Cond: (c.customer_id = a.session_id)
     Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that to finish. ;) But the stats are all pretty darn close, so far as I can tell. The only thing that's off is the group aggregate... by about two orders of magnitude. So I just chalked it up to 8.2 being relatively horrible, and punted to just using a temp table to trick the optimizer into doing it right.

But my greater point was that even doing it all in SQL doesn't always work, which we all know. Use of EXPLAIN abounds, but that doesn't necessarily mean a dev will know how to fix a bad plan.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com
[PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Hi, We have some indexes that don't seem to be used at all. I'd like to know since when they have not been used. That is, the time when postgres started counting to reach the number that is in pg_stat_user_indexes.idx_scan Is there a way to retrieve that from the database ? Cheers, WBL -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
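For reference, the counters in question can be pulled with a query like this; it lists indexes that have never been scanned since counting began (the thread's caveat applies: the view does not record when counting began):

```sql
-- Indexes with no recorded scans. Note: an index with idx_scan = 0 may
-- still be enforcing a unique or primary-key constraint, so check
-- pg_index before dropping anything.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY schemaname, relname;
```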
Re: [PERFORM] Postgres refusing to use >1 core
Josh Berkus writes:
> On 5/11/11 3:04 PM, Shaun Thomas wrote:
>> The original query, with our very large tables, ran for over *two hours*
>> thanks to a nested loop iterating over the subquery. My replacement ran
>> in roughly 30 seconds. If we were using a newer version of PG, we could
>> have used a CTE. But do you get what I mean? Temp tables are a fairly
>> common technique, but how would a coder know about CTEs? They're pretty
>> new, even to *us*.

> For that matter, it would be even better if PostgreSQL realized that a
> materialize of the subquery was a better execution plan, and just did it
> for you.

It does. I was a bit surprised that Shaun apparently got a plan that didn't include a materialize step, because when I test a similar query here, I get:

1. a hash join, until I turn off enable_hashjoin; then
2. a merge join, until I turn off enable_mergejoin; then
3. a nestloop with materialize on the subquery scan.

In 9.0 and up I can get a nestloop without materialize by also turning off enable_material, but pre-9.0 there's no such option ...

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
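Tom's testing sequence can be reproduced in a psql session by disabling one join method at a time. The enable_* settings are standard planner GUCs; the query below is only a stand-in shaped like the one from this thread:

```sql
SET enable_hashjoin = off;   -- step 1: planner falls back to a merge join
SET enable_mergejoin = off;  -- step 2: nestloop with materialize on the subquery
SET enable_material = off;   -- step 3 (9.0+): plain nestloop, no materialize

EXPLAIN
SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) AS counter
          FROM session
         GROUP BY 1) a ON (c.customer_id = a.session_id);

-- Settings are session-local; put them back when done.
RESET enable_hashjoin;
RESET enable_mergejoin;
RESET enable_material;
```

This is a diagnostic technique, not a tuning one: the enable_* switches are blunt instruments for seeing which plans the planner considered, not something to leave set in production.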
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 03:30 AM, Michael Graham wrote:
> Do you happen to produce slides for these lunch n learns or are they
> more informal than that? I guess you can work out where I'm going with
> this ;)

Oh of course. I use rst2s5 for my stuff, so I have the slideshow and also generate a PDF complete with several paragraphs of explanation I distribute after the presentation itself. I have two of them now, but I'll probably have a third in a couple months.

My next topic will probably be geared toward actual DBAs that might be intermediate level. Things like, what happens to an OLAP server that undergoes maintenance and experiences rapid (temporarily exponential) TPS increase. How that can affect the disk subsystem, how to recover, how to possibly bootstrap as a temporary fix, etc. Certainly things I would have liked to know before seeing them. I'm going to call it "Your Database Probably Hates You." ;)

I have a tendency to enjoy "stories from the field," and I've got more than a few where I've saved a database from certain death. Sometimes it's tweaking a few config settings, sometimes it's new hardware based on system monitoring or allocation tests. Little things Senior DBAs might know after experiencing them, or reading lists like this one.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com
Re: [PERFORM] Postgres refusing to use >1 core
On Wed, 2011-05-11 at 17:04 -0500, Shaun Thomas wrote:
> We hold regular Lunch'n'Learns for our developers to teach them the
> good/bad of what they're doing, and that helps significantly. Even
> hours later, I see them using the techniques I showed them. The one
> I'm presenting soon is entitled '10 Ways to Ruin Performance' and
> they're all specific examples taken from day-to-day queries and jobs
> here, all from different categories of mistake. It's just a part of
> being a good DBA.

Do you happen to produce slides for these lunch n learns or are they more informal than that? I guess you can work out where I'm going with this ;)

--
Michael Graham
Re: [PERFORM] Postgres refusing to use >1 core
12.05.11 06:18, Aren Cambre wrote:
>>> Using one thread, the app can do about 111 rows per second, and it's
>>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>>> 111 rows per second ~= 30 hours.
>>
>> I don't know how I missed that. You ARE maxing out one cpu core, so
>> you're quite right that you need more threads unless you can make your
>> single worker more efficient.
>
> And the problem is my app already has between 20 and 30 threads.
> Something about C#'s PLINQ may not be working as intended...

Have you checked that you are really doing fetch and processing in parallel? Dunno about C#, but under Java you have to make specific settings (e.g. setFetchSize) or the driver will fetch all the data on query run. Check the time needed to fetch the first row from the query.

Best regards,
Vitalii Tymchyshyn
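The setFetchSize point boils down to cursor-backed fetching: when a fetch size is set (and, in JDBC, autocommit is off), the driver streams rows in batches instead of buffering the whole result set before returning. The server-side equivalent, as a sketch with a hypothetical table name:

```sql
BEGIN;

-- Stream the result set in batches instead of materializing it all at once.
DECLARE rows_cur CURSOR FOR
  SELECT * FROM some_big_table;

FETCH 1000 FROM rows_cur;  -- first batch is available immediately,
FETCH 1000 FROM rows_cur;  -- before the full result set is computed

CLOSE rows_cur;
COMMIT;
```

If the first row takes as long to arrive as the whole query, the client is buffering everything up front and the "parallel" consumers are just waiting on that single fetch.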
Re: [PERFORM] DBT-5 & Postgres 9.0.3
http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104

- Sethu

On Thu, May 12, 2011 at 5:22 AM, Robert Haas wrote:
> On Tue, Apr 12, 2011 at 3:51 AM, Sethu Prasad wrote:
> > Anyone lucky to have dbt5 run for PostgreSQL 9.0.3?!
> >
> > I am trying on Novell SuSE Linux Enterprise Server 11 SP1 x86_64 with a
> > virtual machine and bit hard with no success run yet. If you can help me
> > with any docs will be more of a support.
>
> What's going wrong for you?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company