Re: [PERFORM] Hardware vs Software RAID
On Wednesday 25 June 2008 11:24:23 Greg Smith wrote:
> What I often do is get a hardware RAID controller, just to accelerate disk writes, but configure it in JBOD mode and use Linux or other software RAID on that platform.

JBOD + RAIDZ2 FTW ;-)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
[PERFORM] poor row estimates with multi-column joins
 (cost=5322783.26..5972103.39 rows=4 width=100) (actual time=415082.543..606999.689 rows=18348993 loops=1)
   Merge Cond: (((pfl1.emal_id)::text = (le1.emal_id)::text) AND ((pfl1.ctm_nbr)::text = (le1.ctm_nbr)::text))
   ->  Index Scan using peii_fast_lookup_pkey on peii_fast_lookup pfl1  (cost=0.00..462635.50 rows=9368569 width=33) (actual time=0.031..7342.227 rows=9368569 loops=1)
   ->  Materialize  (cost=5322446.84..5556162.04 rows=18697216 width=67) (actual time=414700.258..519877.718 rows=18703401 loops=1)
         ->  Sort  (cost=5322446.84..5369189.88 rows=18697216 width=67) (actual time=414700.254..506652.718 rows=18703401 loops=1)
               Sort Key: le1.emal_id, le1.ctm_nbr
               Sort Method: external merge  Disk: 1620632kB
               ->  Seq Scan on lsteml_m le1  (cost=0.00..434871.16 rows=18697216 width=67) (actual time=0.006..6776.725 rows=18703401 loops=1)
 Total runtime: 611728.059 ms
(9 rows)

Still the same issue, so this doesn't seem like something specific to hash joins. I'll note that this is the behavior I recall from 8.2, so I'm not sure if this is a bug or just an outright deficiency, but I thought I would post to see if anyone had any thoughts on it. (If there is some additional info I can provide, please let me know.)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
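For what it's worth, one thing that can be tried here is raising the per-column statistics targets on the join columns and re-analyzing (a sketch only, using the table and column names from the plan above); it sharpens each column's individual histogram, though it does nothing for the cross-column correlation that the rows=4 versus ~18 million misestimate really hinges on:

    ALTER TABLE lsteml_m ALTER COLUMN emal_id SET STATISTICS 1000;
    ALTER TABLE lsteml_m ALTER COLUMN ctm_nbr SET STATISTICS 1000;
    ANALYZE lsteml_m;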
Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search
On Friday 25 April 2008 17:32, Tom Lane wrote:
> Robert Treat [EMAIL PROTECTED] writes:
> > Oddly some dtrace profiling gave me this, which is pretty different, but certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
>
> which seems to pretty much destroy your thesis, no?

How so? Before the patch we bog down for hours, spending 99% of our time in TransactionIdIsCurrentTransactionId; after the patch everything performs well (really better than before) and we spend so little time in TransactionIdIsCurrentTransactionId that it barely shows up on the radar.

Note I'm open to the idea that TransactionIdIsCurrentTransactionId itself is not the problem, but that something else changed between 8.1 and 8.3 that exposes TransactionIdIsCurrentTransactionId as a problem. Changing to a binary search for TransactionIdIsCurrentTransactionId makes that a non-issue though.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search
On Saturday 26 April 2008 13:26, Tom Lane wrote:
> Robert Treat [EMAIL PROTECTED] writes:
> > On Friday 25 April 2008 17:32, Tom Lane wrote:
> > > Robert Treat [EMAIL PROTECTED] writes:
> > > > Oddly some dtrace profiling gave me this, which is pretty different, but certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
> > >
> > > which seems to pretty much destroy your thesis, no?
> >
> > How so? Before the patch we bog down for hours, spending 99% of our time in TransactionIdIsCurrentTransactionId; after the patch everything performs well (really better than before) and we spend so little time in TransactionIdIsCurrentTransactionId it barely shows up on the radar.
>
> Oh, you failed to state that the dtrace output was post-patch. You need to show *pre* patch dtrace output if you want us to think it relevant.

Please read up-thread.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search
On Monday 21 April 2008 12:54, Alvaro Herrera wrote:
> Robert Treat wrote:
> > Unfortunately I don't have the 8.1 system to bang on anymore for this (though anecdotally speaking, I never saw this behavior in 8.1); however I do now have a parallel 8.3 system crunching the data, and it is showing the same symptom (yes, 2 8.3 servers, crunching the same data, both bogged down now), so I do feel this is something specific to 8.3. I am mostly wondering if anyone else has encountered behavior like this on 8.3 (large sets of insert/update exception blocks in plpgsql bogging down), or if anyone has any thoughts on which direction I should poke at it from here. TIA.
>
> Perhaps what you could do is backpatch the change and see if the problem goes away.

So, after some more digging, we ended up backpatching the change. Results as follows:

= hanging job before patch
     elapsed     | status
-----------------+---------------------------------------------------------
 00:00:00.024075 | OK/starting with 2008-04-25 08:20:02
 00:00:00.611411 | OK/processing 624529 hits up until 2008-04-25 10:20:02
 03:48:02.748319 | ??/Processed 65000 aggregated rows so far
(3 rows)

= successful job after patch
     elapsed     | status
-----------------+---------------------------------------------------------
 00:00:00.026809 | OK/starting with 2008-04-25 08:20:02
 00:00:03.921532 | OK/processing 2150115 hits up until 2008-04-25 15:00:02
 00:24:45.439081 | OK/Processed 334139 aggregated rows
 00:00:00.019433 | OK/
(4 rows)

Note the second run had to do all the rows from the first run, plus additional rows that accumulated while the first job was running.

Oddly, some dtrace profiling gave me this, which is pretty different, but certainly doesn't have concerns about TransactionIdIsCurrentTransactionId:

<snip>
postgres`hash_search_with_hash_value      536   2.3%
postgres`SearchCatCache                   538   2.3%
postgres`hash_seq_search                  577   2.4%
postgres`MemoryContextAllocZeroAligned    610   2.6%
postgres`_bt_compare                      671   2.8%
libc.so.1`memcpy                          671   2.8%
postgres`XLogInsert                       755   3.2%
postgres`LockReassignCurrentOwner         757   3.2%
postgres`base_yyparse                    1174   5.0%
postgres`AllocSetAlloc                   1244   5.3%

We still have one of our 8.3 servers running stock 8.3.1, so we'll see how long before this bites us again. Would certainly be nice to get this fixed in the mainline code.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
[PERFORM] Re: [HACKERS] [COMMITTERS] pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search
On Thursday 27 March 2008 17:11, Tom Lane wrote:
> Robert Treat [EMAIL PROTECTED] writes:
> > On Sunday 16 March 2008 22:18, Tom Lane wrote:
> > > Fix TransactionIdIsCurrentTransactionId() to use binary search instead of linear search when checking child-transaction XIDs.
> >
> > Are there any plans to backpatch this into REL8_3_STABLE?
>
> No.
>
> > It looks like I am hitting a pretty serious performance regression on 8.3 with a stored procedure that grabs a pretty big recordset, and loops through doing insert/update on unique failures. The procedure gets progressively slower the more records involved... and dbx shows me stuck in TransactionIdIsCurrentTransactionId().
>
> If you can convince me it's a regression I might reconsider, but I rather doubt that 8.2 was better,

Well, I can't speak for 8.2, but I have a second system crunching the same data using the same function on 8.1 (on lesser hardware in fact), and it doesn't have these types of issues.

> If you can condense it to a test case that is worse on 8.3 than 8.1, I'm willing to listen...

I spent some time trying to come up with a test case, but had no luck. Dtrace showed that the running process was calling this function rather excessively; sample profiling for 30 seconds would look like this:

FUNCTION                                        COUNT   PCNT
<snip>
postgres`LockBuffer                                10   0.0%
postgres`slot_deform_tuple                         11   0.0%
postgres`ExecEvalScalarVar                         11   0.0%
postgres`ExecMakeFunctionResultNoSets              13   0.0%
postgres`IndexNext                                 14   0.0%
postgres`slot_getattr                              15   0.0%
postgres`LWLockRelease                             20   0.0%
postgres`index_getnext                             55   0.1%
postgres`TransactionIdIsCurrentTransactionId    40074  99.4%

But I saw similar percentages on the 8.1 machine, so I am not convinced this is where the problem is.

Unfortunately (in some respects) the problem went away up until this morning, so I haven't been looking at it since the above exchange. I'm still open to the idea that something inside TransactionIdIsCurrentTransactionId could have changed to make things worse. In addition to cpu, the process does consume a significant amount of memory... prstat shows:

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  3844 postgres 1118M 1094M cpu3    50    0   6:25:48  12% postgres/1

I do wonder if the number of rows being worked on is significant in some way... by looking in the job log for the running procedure (we use autonomous logging in this function), I can see that it has a much larger number of rows to be processed, so perhaps there is simply a tipping point that is reached which causes it to stop performing... still, it would be curious that I never saw this behavior on 8.1.

= current job
     elapsed     | status
-----------------+---------------------------------------------------------
 00:00:00.042895 | OK/starting with 2008-04-21 03:20:03
 00:00:00.892663 | OK/processing 487291 hits up until 2008-04-21 05:20:03
 05:19:26.595508 | ??/Processed 7 aggregated rows so far
(3 rows)

= yesterdays run
     elapsed     | status
-----------------+---------------------------------------------------------
 00:00:00.680222 | OK/starting with 2008-04-20 04:20:02
 00:00:00.409331 | OK/processing 242142 hits up until 2008-04-20 05:20:04
 00:25:02.306736 | OK/Processed 35936 aggregated rows
 00:00:00.141179 | OK/
(4 rows)

Unfortunately I don't have the 8.1 system to bang on anymore for this (though anecdotally speaking, I never saw this behavior in 8.1); however I do now have a parallel 8.3 system crunching the data, and it is showing the same symptom (yes, 2 8.3 servers, crunching the same data, both bogged down now), so I do feel this is something specific to 8.3.

I am mostly wondering if anyone else has encountered behavior like this on 8.3 (large sets of insert/update exception blocks in plpgsql bogging down), or if anyone has any thoughts on which direction I should poke at it from here. TIA.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
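For readers who want to picture the workload being described, here is a minimal, hedged sketch (hypothetical table and column names) of an insert/update-on-unique-failure loop. The relevant point is that every pass through the exception block runs in its own subtransaction, so a long batch builds up a large list of child XIDs, which is exactly the list the committed patch now searches with a binary rather than linear search:

    FOR r IN SELECT * FROM staging LOOP
        BEGIN
            INSERT INTO target (id, val) VALUES (r.id, r.val);
        EXCEPTION WHEN unique_violation THEN
            UPDATE target SET val = r.val WHERE id = r.id;
        END;
    END LOOP;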
Re: [PERFORM] TB-sized databases
On Thursday 06 December 2007 04:38, Simon Riggs wrote:
> Robert,
>
> On Wed, 2007-12-05 at 15:07 -0500, Robert Treat wrote:
> > If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level?
>
> That sounds like a request for hints, which is OT here, ISTM.

If you want to eat peas, and someone suggests you use a knife, can I only argue the validity of using a knife? I'd rather just recommend a spoon.

> > I think you're completely overlooking the effect disk latency has on query times. We run queries all the time that can vary from 4 hours to 12 hours in time based solely on the amount of concurrent load on the system, even though they always plan with the same cost.
>
> Not at all. If we had statement_cost_limit then it would be applied after planning and before execution begins. The limit would be based upon the planner's estimate, not the likely actual execution time.

This is nice, but it doesn't prevent slow queries reliably (which seemed to be in the original complaints), since query time cannot be directly traced back to statement cost.

> So yes a query may vary in execution time by a large factor as you suggest, and it would be difficult to set the proposed parameter accurately. However, the same is also true of statement_timeout, which we currently support, so I don't see this point as a blocker.
>
> Which leaves us at the burning question: Would you use such a facility, or would the difficulty in setting it exactly prevent you from using it for real?

I'm not sure. My personal instincts are that the solution is too fuzzy for me to rely on, and if it isn't reliable, it's not a good solution. If you look at all of the things people seem to think this will solve, I think I can raise an alternative option that would be a more definitive solution:

prevent queries from taking longer than x - statement_timeout
prevent planner from switching to a bad plan - hint system
prevent query from consuming too many resources - true resource restrictions at the database level

I'm not so much against the idea of a statement cost limit, but I think we need to realize that it does not really solve as many problems as people think, that in the cases where it will help it often will do so poorly, and that there are probably better solutions available to those problems. Of course, if you back me into a corner I'll agree a poor solution is better than no solution, so...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
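For comparison, the knob that exists today caps wall-clock time rather than planner cost. A hedged sketch of using it at the session and role level (the role name below is made up, and the values are milliseconds on these releases):

    SET statement_timeout = 14400000;                      -- 4 hours for this session; 0 disables the limit
    ALTER ROLE reporting SET statement_timeout = 43200000; -- hypothetical "reporting" role, 12 hours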
Re: [PERFORM] TB-sized databases
On Thursday 29 November 2007 11:14, Simon Riggs wrote:

On Thu, 2007-11-29 at 10:45 -0500, Tom Lane wrote:

Gregory Stark [EMAIL PROTECTED] writes:

Simon Riggs [EMAIL PROTECTED] writes: Tom's previous concerns were along the lines of "How would you know what to set it to?", given that the planner costs are mostly arbitrary numbers.

Hm, that's only kind of true. The units are not the problem.

The problem is that you are staking non-failure of your application on the planner's estimates being pretty well in line with reality. Not merely in line enough that it picks a reasonably cheap plan, but in line enough that if it thinks plan A is 10x more expensive than plan B, then the actual ratio is indeed somewhere near 10. Given that this list spends all day every day discussing cases where the planner is wrong, I'd have to think that that's a bet I wouldn't take.

I think you have a point, but the alternative is often much worse. If an SQL statement fails because of too high cost, we can investigate the problem and re-submit. If a website slows down because somebody allowed a very large query to execute then everybody is affected, not just the person who ran the bad query.

Either way the guy that ran the query loses, but without constraints in place one guy can kill everybody else also.

You could probably avoid this risk by setting the cutoff at something like 100 or 1000 times what you really want to tolerate, but how useful is it then?

Still fairly useful, as long as we understand it's a blunt instrument.

If the whole performance of your system depends upon indexed access then rogue queries can have disastrous, unpredictable consequences. Many sites construct their SQL dynamically, so a mistake in a seldom used code path can allow killer queries through. Even the best DBAs have been known to make mistakes.

If the whole performance of your system depends upon indexed access, then maybe you need a database that gives you a way to force index access at the query level?

e.g. An 80GB table has 8 million blocks in it.
- So putting a statement_cost limit = 1 million would allow some fairly large queries but prevent anything that did a SeqScan (or worse).
- Setting it at 10 million is going to prevent things like sorting the whole table without a LIMIT.
- Setting it at 100 million is going to prevent unconstrained product joins etc.

I think you're completely overlooking the effect disk latency has on query times. We run queries all the time that can vary from 4 hours to 12 hours in time based solely on the amount of concurrent load on the system, even though they always plan with the same cost.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Training Recommendations
On Sunday 02 December 2007 15:26, Usama Munir Dar wrote:
> Robert Treat wrote:
> > On Wednesday 28 November 2007 11:20, Usama Munir Dar wrote:
> > > EnterpriseDB (www.enterprisedb.com), ofcourse
> >
> > lame :-P
>
> Have you or anyone you know tried the training offerings? or you think its lame because i top posted, which of course would be a very poor criteria, not to mention completely unrelated, so i definitely think its not the reason. i would love to hear whats wrong with it so we can work on its improvement

What I thought was lame was that you, being someone who works for EnterpriseDB, suggested EnterpriseDB as a solution, with no mention of the other training options available. Now one guy doing this isn't such a big deal (though it is still poor practice), but if every training company were to do this I think you can see how it doesn't do much for helping the public discourse.

Of course I probably would have let the whole thing slide, but you top posted, so...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Training Recommendations
On Wednesday 28 November 2007 11:20, Usama Munir Dar wrote:
> EnterpriseDB (www.enterprisedb.com), ofcourse

lame :-P

> Campbell, Lance wrote:
> > PostgreSQL: 8.2.4
> > Does anyone have any companies they would recommend using for performance tuning training of PostgreSQL for Linux? Or general DBA training?

Never take advice from a guy who top posts... A friend of mine just went through an OTG course and had good things to say, and I've heard others speak well of it too, so I'd probably recommend them, but there are several options; check out the training section on the website:
http://www.postgresql.org/about/eventarchive

Note also that some of the more popular pg support companies offer personal training, even if it isn't advertised. HTH.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Hardware for PostgreSQL
On Wednesday 31 October 2007 12:45, Ketema wrote:
> I am trying to build a very Robust DB server that will support 1000+ concurrent users (already have seen max of 237, no pooling being used). I have read so many articles now that I am just saturated. I have a general idea but would like feedback from others.

Most of the other answers you've gotten have been pretty good, but I had some questions on the above; specifically, is there a reason you're avoiding pooling? (Something like pgbouncer can work wonders.) Are your 1000+ concurrent users working in something like a web environment, where they won't need a 1:1 user:connection map to service them all, or are these going to be more permanent connections into the system? FWIW I've done 1000 simultaneous connections on pretty basic hardware, but you need to have the right kind of workloads to be able to do it.

> Who has built the biggest baddest Pg server out there and what do you use?

While I'm not sure this will be that much help, I'd feel remiss if I didn't point you to it...
http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
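If it helps quantify the pooling question, one simple (hedged) way to watch how many backends are actually open at peak, assuming the stats collector is enabled, is to sample pg_stat_activity; with a pooler in front, the backend count typically stays far below the client count:

    SELECT count(*) AS open_connections FROM pg_stat_activity;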
Re: [PERFORM] Best OS for Postgres 8.2
On Tuesday 08 May 2007 23:31, Greg Smith wrote:
> On Tue, 8 May 2007, Tom Lane wrote:
> > What Debian has done is set up an arrangement that lets you run two (or more) different PG versions in parallel. Since that's amazingly helpful during a major-PG-version upgrade, most of the other packagers are scheming how to do something similar.
>
> I alluded to that but it is worth going into more detail on for those not familiar with this whole topic. I normally maintain multiple different PG versions in parallel already, mostly using environment variables to switch between them with some shell code. Debian has taken an approach where commands like pg_ctl are wrapped in multi-version/cluster aware scripts, so you can do things like restarting multiple installations more easily than that.
>
> My issue wasn't with the idea, it was with the implementation. When I have my newbie hat on, it adds a layer of complexity that isn't needed for simple installs.

I think I would disagree with this. The confusion comes from the fact that it is different, not that it is more complex. For new users, what seems to be most confusing is getting from install to initdb to logging in... if you tell them to use pg_ctlcluster rather than pg_ctl, it isn't more confusing; they're just following directions at that point anyway.

If the upstream project were to switch to Debian's system, I think you'd end most of the confusion, make it easier to run concurrent servers, simplify the upgrade process for source installs, and give other package maintainers a way to achieve what Debian has. Maybe in PG 9...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
On Wednesday 04 April 2007 07:51, Arnau wrote:

Hi Ansgar,

On 2007-04-04 Arnau wrote:

Josh Berkus wrote: Is there anything similar in PostgreSQL? The idea behind this is how I can do in PostgreSQL to have tables where I can query on them very often, something like every few seconds, and get results very fast without overloading the postmaster.

If you're only querying the tables every few seconds, then you don't really need to worry about performance.

Well, the idea behind this is to have events tables, and a monitoring system polls that table every few seconds. I'd like to have a kind of FIFO stack. From the events producer point of view he'll be pushing rows into that table; when it's filled the oldest one will be removed to leave room for the newest one. From the consumer point of view he'll read all the contents of that table. So I'll not only be querying the tables, I'll need to also modify those tables.

Ummm... this may be a dumb question, but why are you trying to implement something like a FIFO with an RDBMS in the first place? Wouldn't it be much easier to implement something like that as a separate program or script?

Well, the idea is to have a table with a maximum number of rows. As the number of queries over this table will be very high, I'd like to keep it as small as possible and without indexes and so on that could make the update slower. Maybe it's the moment to change my question: is there any trick to get a table that can be modified/queried very fast and with the minimum of overhead? This table will have several queries every second and I'd like to do this as fast as possible.

If you're wedded to the FIFO idea, I'd suggest reading this:
http://people.planetpostgresql.org/greg/index.php?/archives/89-Implementing-a-queue-in-SQL-Postgres-version.html

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
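If it's useful, here is a minimal, hedged sketch (hypothetical names, and it assumes 8.0+ for generate_series) of one way to get a fixed-size events table: pre-create a set of slots once and have the producer overwrite the oldest slot instead of inserting, so the live row count never grows. The dead versions left behind by the updates still need frequent vacuuming, and this is only one approach, not necessarily the one taken in the article linked above:

    CREATE TABLE event_fifo (
        slot     integer PRIMARY KEY,             -- 0 .. 999: fixed number of slots
        payload  text,
        added_at timestamptz NOT NULL DEFAULT now()
    );
    INSERT INTO event_fifo (slot) SELECT generate_series(0, 999);

    -- producer: overwrite the oldest slot rather than inserting a new row
    UPDATE event_fifo
       SET payload = 'new event', added_at = now()
     WHERE slot = (SELECT slot FROM event_fifo ORDER BY added_at, slot LIMIT 1);

    -- consumer: read the current contents
    SELECT payload, added_at FROM event_fifo ORDER BY added_at;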
Re: [PERFORM] Hints proposal
On Thursday 12 October 2006 12:40, Bucky Jordan wrote:
> What is it about hinting that makes it so easily breakable with new versions? I don't have any experience with Oracle, so I'm not sure how they screwed logic like this up.

I don't have a ton of experience with Oracle either, mostly DB2, MSSQL and PG. So I thought I'd do some googling, and maybe others might find this useful info:

http://asktom.oracle.com/pls/ask/f?p=4950:8:2177642270773127589::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:7038986332061

Interesting quotes: "In Oracle Applications development (11i apps - HR, CRM, etc) Hints are strictly forbidden. We find the underlying cause and fix it." and "Hints -- only useful if you are in RBO and you want to make use of an access path."

Maybe because I haven't had access to hints before, I've never been tempted to use them. However, I can't remember having to re-write SQL due to a PG upgrade either. When it happens it tends to look something like this:
http://archives.postgresql.org/pgsql-performance/2006-01/msg00154.php

Funny that, for all the people who claim improving the planner should be the primary goal, no one ever took interest in the above case.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] [PERFORM] Hints proposal
On Friday 13 October 2006 12:46, Gregory Stark wrote:
> Josh Berkus josh@agliodbs.com writes:
> > I actually think the way to attack this issue is to discuss the kinds of errors the planner makes, and what tweaks we could do to correct them. Here's the ones I'm aware of:
> >
> > -- Incorrect selectivity of WHERE clause
> > -- Incorrect selectivity of JOIN
> > -- Wrong estimate of rows returned from SRF
> > -- Incorrect cost estimate for index use
> >
> > Can you think of any others?
>
> -- Incorrect estimate for result of DISTINCT or GROUP BY.

Yeah, that one is bad. I also ran into one the other day where the planner did not seem to understand the distinctness of a column's values across table partitions...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Looking for a tool to * pg tables as ERDs
On Thu, 2006-02-23 at 11:38, Ron Peacetree wrote:
> Where * == {print | save to PDF | save to mumble format | display on screen}
>
> Anyone know of one?

Case Studio can reverse engineer ERDs from an existing schema, and you can print out the schema, create html or rdf reports, or export the ERD as a graphic. The downside is it can't do a direct export to pdf (though you could get around that with OO I imagine), plus it's Windows-only and commercial.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
[PERFORM] how to interpret/improve bad row estimates
postgresql 8.1. I have two tables, both vacuumed and analyzed. On msg307 I have altered the entityid and msgid columns' statistics values to 400.

dev20001=# explain analyze SELECT ewm.entity_id, m.agentname, m.filecreatedate AS versioninfo FROM msg307 m join entity_watch_map ewm on (ewm.entity_id = m.entityid AND ewm.msgid = m.msgid AND ewm.msg_type = 307);

                                  QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=6.62..5227.40 rows=1 width=36) (actual time=0.583..962.346 rows=75322 loops=1)
   ->  Bitmap Heap Scan on entity_watch_map ewm  (cost=6.62..730.47 rows=748 width=8) (actual time=0.552..7.017 rows=1264 loops=1)
         Recheck Cond: (msg_type = 307)
         ->  Bitmap Index Scan on ewm_msg_type  (cost=0.00..6.62 rows=748 width=0) (actual time=0.356..0.356 rows=1264 loops=1)
               Index Cond: (msg_type = 307)
   ->  Index Scan using msg307_entityid_msgid_idx on msg307 m  (cost=0.00..6.00 rows=1 width=40) (actual time=0.011..0.295 rows=60 loops=1264)
         Index Cond: ((outer.entity_id = m.entityid) AND (outer.msgid = m.msgid))
 Total runtime: 1223.469 ms
(8 rows)

I guess that the planner can not tell there is no correlation between the distinctness of those two columns, and so makes a really bad estimate on the indexscan, and pushes that estimate up into the nested loop? (Luckily in this case doing an index scan is generally a good idea, so it works out, but it wouldn't always be a good idea.)

Some pg_statistic information for those two columns:

entityid:
 starelid    | 25580
 staattnum   | 1
 stanullfrac | 0
 stawidth    | 4
 stadistinct | 1266
 stakind1    | 1
 stakind2    | 2
 stakind3    | 3
 stakind4    | 0
 staop1      | 96
 staop2      | 97
 staop3      | 97
 staop4      | 0
 stanumbers1 | {0.00222976,0.00222976,0.00153048,0.00137216,0.00137216}
 stanumbers2 |
 stanumbers3 | {0.100312}
 stanumbers4 |

msgid:
 starelid    | 25580
 staattnum   | 2
 stanullfrac | 0
 stawidth    | 4
 stadistinct | 1272
 stakind1    | 1
 stakind2    | 2
 stakind3    | 3
 stakind4    | 0
 staop1      | 96
 staop2      | 97
 staop3      | 97
 staop4      | 0
 stanumbers1 | {0.00164923,0.00163604,0.00163604,0.00163604,0.00137216}
 stanumbers2 |
 stanumbers3 | {-0.0660856}
 stanumbers4 |

Is my interpretation of why I am seeing such bad estimates correct? I don't really think it is, because looking at a similar scenario on a 7.3 machine:

--------------------------------------------------------------------
 Merge Join  (cost=1531.39..5350.90 rows=1 width=48) (actual time=118.44..899.37 rows=58260 loops=1)
   Merge Cond: ((outer.entityid = inner.entity_id) AND (outer.msgid = inner.msgid))
   ->  Index Scan using msg307_entityid_msgid_idx on msg307 m  (cost=0.00..3669.42 rows=58619 width=40) (actual time=0.31..390.01 rows=58619 loops=1)
   ->  Sort  (cost=1531.39..1533.16 rows=709 width=8) (actual time=118.09..157.45 rows=58218 loops=1)
         Sort Key: ewm.entity_id, ewm.msgid
         ->  Seq Scan on entity_watch_map ewm  (cost=0.00..1497.80 rows=709 width=8) (actual time=0.14..114.74 rows=1157 loops=1)
               Filter: (msg_type = 307)
 Total runtime: 951.23 msec
(8 rows)

It still has the bad estimate at the join stage, but it does seem to have a better understanding of the # of rows it will return in the index scan on msg307. This leads me to wonder if there is something I could do to improve the estimates on the 8.1 machine?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
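As an aside, the same numbers are easier to read through the pg_stats view than through raw pg_statistic; for example (names taken from the table above):

    SELECT attname, null_frac, n_distinct, correlation
      FROM pg_stats
     WHERE tablename = 'msg307'
       AND attname IN ('entityid', 'msgid');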
[PERFORM] sum of left join greater than its parts
 (cost=0.00..111.75 rows=25752 width=0) (actual time=4.271..4.271 rows=25542 loops=1)
 ->  Hash  (cost=55.95..55.95 rows=1695 width=8) (actual time=5.663..5.663 rows=1695 loops=1)
       ->  Seq Scan on myapp_app ia  (cost=0.00..55.95 rows=1695 width=8) (actual time=0.006..2.888 rows=1695 loops=1)
 ->  Sort  (cost=3985.92..4050.30 rows=25752 width=20) (actual time=249.682..286.295 rows=25542 loops=1)
       Sort Key: public.msg306u.rmsbinaryid, public.msg306u.msgid, public.msg306u.entityid
       ->  Seq Scan on msg306u  (cost=0.00..1797.28 rows=25752 width=20) (actual time=0.010..80.572 rows=25542 loops=1)
             Filter: (downloadstatus <> '0'::text)
 Total runtime: 540.284 ms
(31 rows)

I've been banging on this one off and on for a while now with little progress; can someone explain why it is choosing the initial slower plan and/or how to get it to run something closer to the second faster plan?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
[PERFORM] query slower on 8.1 than 7.3
)
   ->  Index Scan using software_download_host_id on software_download  (cost=0.00..615.92 rows=13416 width=96) (actual time=0.019..30.345 rows=13372 loops=1)
   ->  Sort  (cost=616.56..620.45 rows=1555 width=12) (actual time=45.720..53.265 rows=6407 loops=1)
         Sort Key: latest_download.host_id
         ->  Subquery Scan latest_download  (cost=499.13..534.12 rows=1555 width=12) (actual time=42.867..44.763 rows=472 loops=1)
               ->  HashAggregate  (cost=499.13..518.57 rows=1555 width=16) (actual time=42.862..43.628 rows=472 loops=1)
                     ->  Hash Join  (cost=5.64..477.57 rows=2875 width=16) (actual time=0.206..41.503 rows=623 loops=1)
                           Hash Cond: (outer.software_binary_id = inner.software_binary_id)
                           ->  Seq Scan on software_download  (cost=0.00..377.78 rows=13080 width=16) (actual time=0.007..23.494 rows=13167 loops=1)
                                 Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18))
                           ->  Hash  (cost=5.59..5.59 rows=20 width=4) (actual time=0.155..0.155 rows=22 loops=1)
                                 ->  Seq Scan on software_binary  (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.112 rows=22 loops=1)
                                       Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6))
   ->  Hash  (cost=6418.20..6418.20 rows=1 width=20) (actual time=437.111..437.111 rows=238 loops=1)
         ->  Merge Join  (cost=6149.96..6418.20 rows=1 width=20) (actual time=367.555..436.667 rows=238 loops=1)
               Merge Cond: ((outer.rmsbinaryid = inner.rmsbinaryid) AND (outer.msgid = inner.msgid) AND (outer.entityid = inner.entityid))
               ->  Sort  (cost=2119.55..2121.03 rows=593 width=16) (actual time=117.104..117.476 rows=323 loops=1)
                     Sort Key: a1.rmsbinaryid, a1.msgid, a1.entityid
                     ->  Hash Join  (cost=2054.19..2092.23 rows=593 width=16) (actual time=114.671..116.280 rows=323 loops=1)
                           Hash Cond: (outer.entityid = inner.myapp_app_id)
                           ->  HashAggregate  (cost=1994.00..2001.41 rows=593 width=12) (actual time=108.909..109.486 rows=323 loops=1)
                                 ->  Seq Scan on msg306u  (cost=0.00..1797.28 rows=26230 width=12) (actual time=0.009..68.861 rows=25542 loops=1)
                                       Filter: (downloadstatus = '1'::text)
                           ->  Hash  (cost=55.95..55.95 rows=1695 width=8) (actual time=5.736..5.736 rows=1695 loops=1)
                                 ->  Seq Scan on myapp_app ia  (cost=0.00..55.95 rows=1695 width=8) (actual time=0.005..2.850 rows=1695 loops=1)
               ->  Sort  (cost=4030.42..4095.99 rows=26230 width=20) (actual time=250.434..286.311 rows=25542 loops=1)
                     Sort Key: public.msg306u.rmsbinaryid, public.msg306u.msgid, public.msg306u.entityid
                     ->  Seq Scan on msg306u  (cost=0.00..1797.28 rows=26230 width=20) (actual time=0.009..80.478 rows=25542 loops=1)
                           Filter: (downloadstatus <> '0'::text)
 Total runtime: 553.409 ms

Ah, a beautiful scheme! So given that I can't run with enable_nestloop off, does anyone have a suggestion on how to get this thing moving in the right direction? I tried raising statistics estimates on some of the columns but that didn't help, though maybe I wasn't raising it on the right columns... any suggestions there? Or perhaps a better way to write the query... I'm open to suggestions.

TIA,

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
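One hedged half-measure, if disabling nested loops cluster-wide really isn't an option: the setting can be scoped to a single transaction, so only the problem query pays for it. Whether that is acceptable depends on what else runs in the same session, but as a sketch (substitute the actual query):

    BEGIN;
    SET LOCAL enable_nestloop = off;   -- reverts automatically at COMMIT/ROLLBACK
    EXPLAIN ANALYZE SELECT ... ;       -- the query in question
    COMMIT;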
Re: [PERFORM] Help tuning postgres
Reindex should be faster, since you're not dumping/reloading the table contents on top of rebuilding the index; you're just rebuilding the index.

Robert Treat
emdeon Practice Services
Alachua, Florida

On Wed, 2005-10-12 at 13:32, Steve Poe wrote:
> Would it not be faster to do a dump/reload of the table than reindex, or is it about the same?
>
> Steve Poe
>
> On Wed, 2005-10-12 at 13:21 -0400, Tom Lane wrote:
> > Emil Briggs [EMAIL PROTECTED] writes:
> > > Not yet, the db is in production use and I have to plan for a down-time for that... or is it not impacting the activity on the table?
> >
> > It will cause some performance hit while you are doing it. It'll also lock out writes on the table until the index is rebuilt, so he does need to schedule downtime.
> >
> > regards, tom lane

--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
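For completeness, the commands being discussed look like this (hypothetical index and table names); note that on these releases REINDEX takes an exclusive lock, so as Tom says it still wants a maintenance window:

    REINDEX INDEX my_big_table_pkey;   -- rebuild a single index
    REINDEX TABLE my_big_table;        -- rebuild every index on the table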
Re: [PERFORM] One tuple per transaction
On Tuesday 15 March 2005 04:37, Richard Huxton wrote:
> Tambet Matiisen wrote:
> > Now, if typical inserts into your most active table occur in batches of 3 rows, in one transaction, then the row count for this table is updated 3 times during the transaction. 3 updates generate 3 tuples, while 2 of them are dead from the very start. You effectively commit 2 useless tuples. After millions of inserts you end up with the rowcounts table having 2/3 dead tuples and queries start to slow down.
> >
> > Current solution is to vacuum often. My proposal was to create a new tuple only with the first update. The next updates in the same transaction would update the existing tuple, not create a new one.
>
> How do you roll back to a savepoint with this model?

You can't, but you could add the caveat to just do this auto-reuse within any given nested transaction. Then as long as you aren't using savepoints you get to reclaim all the space.

On a similar note, I was just wondering if it would be possible to mark any of these dead tuples as ready to be reused at transaction commit time, since we know that they are dead to any and all other transactions currently going on. This would save you from having to vacuum to get the tuples marked ready for reuse. In the above scenario this could be a win; whether it would be overall is hard to say.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
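To make Richard's objection concrete, a small hedged illustration (reusing the rowcounts table from the example above, with made-up columns): if the second UPDATE overwrote the first one's tuple in place, there would be nothing left to restore when the savepoint is rolled back:

    BEGIN;
    UPDATE rowcounts SET n = n + 1 WHERE tbl = 'foo';
    SAVEPOINT s1;
    UPDATE rowcounts SET n = n + 1 WHERE tbl = 'foo';
    ROLLBACK TO SAVEPOINT s1;   -- must see the row as it was before the savepoint
    COMMIT;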
Re: [PERFORM] column name is LIMIT
Yeah... how come no one told him "don't do that"? LIMIT is an SQL reserved word, so it's likely to cause trouble in any database you try to use it on... I'd strongly recommend renaming that column asap. You can see other reserved words at
http://www.postgresql.org/docs/8.0/interactive/sql-keywords-appendix.html

Robert Treat

On Mon, 2005-03-14 at 03:55, Christopher Kings-Lynne wrote:
> You will still need to use double quotes in 8.0.1...
>
> Chris
>
> Gourish Singbal wrote:
> > Thanks a lot, we might be upgrading to 8.0.1 soon.. till then using double quotes should be fine.
> >
> > regards
> > gourish
> >
> > On Mon, 14 Mar 2005 18:25:22 +1100, Russell Smith [EMAIL PROTECTED] wrote:
> > > On Mon, 14 Mar 2005 06:14 pm, Gourish Singbal wrote:
> > > > Guys, I am having a problem firing queries on one of the tables which is having "limit" as the column name. If I run an insert/select/update command on that table I get the below error.
> > > >
> > > > ERROR: syntax error at or near "limit" at character 71
> > > >
> > > > select limit from limit_table WHERE limit > 50 LIMIT 2;
> > >
> > > You need to quote the field name, and make sure the case is correct.
> > >
> > > > Any help would be really great to solve the problem. postgresql 7.4.5 and linux OS
> > >
> > > You should probably upgrade to 7.4.7
> > >
> > > Regards
> > > Russell Smith.

--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
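For the archives, the two workarounds spelled out (the new column name is just an example):

    SELECT "limit" FROM limit_table WHERE "limit" > 50 LIMIT 2;     -- quote the reserved word
    ALTER TABLE limit_table RENAME COLUMN "limit" TO row_limit;     -- or better, rename the column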
Re: [PERFORM] Low Performance for big hospital server ..
On Monday 03 January 2005 10:40, [EMAIL PROTECTED] wrote:
> > I realize you may be stuck with 7.3.x but you should be aware that 7.4 is considerably faster, and 8.0 appears to be even faster yet.
>
> There are a little bit incompatibility between 7.3 -> 8, so rather difficult to change.

Sure, but even moving to 7.4 would be a bonus, especially if you use a lot of "select * from tab where id in (select ... )" type queries, and the incompatibility is less as well. I would seriously consider upgrading, if at all possible.

A few more hints. One thing I didn't see mentioned that should have been was to watch for index bloat, which was a real problem on 7.3 machines. You can determine which indexes are bloated by studying vacuum output or by comparing index size on disk to table size on disk.

Another thing I didn't see mentioned was your free space map settings. Make sure these are large enough to hold your data... max_fsm_relations should be larger than the total # of tables you have in your system (check the archives for the exact query needed) and max_fsm_pages needs to be big enough to hold all of the pages you use in a day... this is hard to calculate in 7.3, but if you look at your vacuum output and add up the number of pages cleaned up for all tables, this could give you a good number to work with. It would certainly tell you if your setting is too small.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
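The archives query alluded to above is roughly this (a hedged sketch; it counts tables, indexes, and toast tables, and you want max_fsm_relations comfortably above the result):

    SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');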
Re: [PERFORM] slony replication
I didn't see any responses to this, but given it is off topic for both groups that wouldn't surprise me. In the future please direct these questions to the slony project mailing lists.

On Monday 20 December 2004 17:25, sarlav kumar wrote:
> Hi All,
>
> I installed slony1.0.5 and tried the example replication of the pgbench database. That seemed to work. Now I need to replicate a DB running on a different server. slony1.0.5 is installed on the Fedora core 3 machine where Postgres 7.4.6 is installed. I have to replicate the 'test' database installed on a different machine using Postgres 7.3.2. In the instructions to replicate the pgbench example, there is a script file to create the initial configuration for the master-slave setup of the pgbench database. Is this the script file that has to be modified accordingly, to replicate my 'test' DB? And of course, the shell variables have to be changed to indicate the correct location of the master and slave DBs. Am I right?

More or less. The scripts provided are just examples, but you can modify them to suit your environment rather than write your own.

> Also, in the script, the following lines are used to create sets of tables:
>
> # Slony-I organizes tables into sets. The smallest unit a node can
> # subscribe is a set. The following commands create one set containing
> # all 4 pgbench tables. The master or origin of the set is node 1.
> #--
> create set (id=1, origin=1, comment='All pgbench tables');
> set add table (set id=1, origin=1, id=1, fully qualified name = 'public.accounts', comment='accounts table');
> set add table (set id=1, origin=1, id=2, fully qualified name = 'public.branches', comment='branches table');
> set add table (set id=1, origin=1, id=3, fully qualified name = 'public.tellers', comment='tellers table');
> set add table (set id=1, origin=1, id=4, fully qualified name = 'public.history', comment='history table', key = serial);
> #--
>
> Can this be skipped? I have over 200 tables, and I am not sure if I have to list each of them in the set add table part of the script file.

Nope, you have to do them all, and don't forget the sequences. The easiest way I found was to generate the list programmatically around a "select * from pg_class" with an appropriate where clause to get just the desired tables.

> Do I need to change any of the other script files in the example?

Chances are yes, since those scripts were written for the example scenario provided, and your environment is sure to be different. Again, post to the slony mailing lists if you need more help.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
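A hedged sketch of the kind of catalog query mentioned above for generating the repetitive slonik lines (you still have to fill in a unique id= for each table, and sequences need their own "set add sequence" commands):

    SELECT 'set add table (set id=1, origin=1, fully qualified name = ''public.'
           || c.relname || ''', comment=''' || c.relname || ''');'
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE n.nspname = 'public' AND c.relkind = 'r'
     ORDER BY c.relname;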
Re: [PERFORM] vacuum full max_fsm_pages question
On Tuesday 21 September 2004 00:01, Patrick Hatcher wrote:
> Hello. Couple of questions:
>
> Q1: Today I decided to do a vacuum full verbose analyze on a large table that has been giving me slow performance. And then I did it again. I noticed that after each run the values in my indexes and estimated row versions changed. What really got me wondering is the fact my indexes report more rows than are in the table and then the estimated rows is less than the actual amount. The table is a read-only table that is updated 1/wk. After updating it is vacuumed full. I've also tried reindexing but the numbers still change. Is this normal? Below is a partial output for 4 consecutive vacuum full analyzes. No data was added nor was there anyone in the table.

This looks normal to me for a pre-7.4 database; if I am right you're running on 7.2? Basically your indexes are overgrown, so each time you run vacuum you are shrinking the number of pages involved, which will change the row counts, and correspondingly change the count on the table as the sampled pages change.

> Q2: I have about a dozen 5M plus row tables. I currently have my max_fsm_pages set to 300,000. As you can see in the vacuum full output I supplied, one table is already over this amount. Is there a limit on the size of max_fsm_pages?

The limit is based on your memory... each page = 6 bytes. But according to the output below you are not over 300,000 pages yet on that table (though you might be on some other tables).

> CONF settings:
>
> # - Memory -
> shared_buffers = 2000          # min 16, at least max_connections*2, 8KB each
> sort_mem = 12288               # min 64, size in KB
> #vacuum_mem = 8192             # min 1024, size in KB
>
> # - Free Space Map -
> max_fsm_pages = 300000         # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 500        # min 100, ~50 bytes each
>
> Vacuum full information
>
> #after second vacuum full
> INFO: index emaildat_fkey now contains 8053743 row versions in 25764 pages
> DETAIL: 1895 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 2.38s/0.42u sec elapsed 11.11 sec.
> INFO: analyzing cdm.cdm_email_data
> INFO: cdm_email_data: 65882 pages, 3000 rows sampled, 392410 estimated total rows
>
> #after third vacuum full
> INFO: index emaildat_fkey now contains 8052738 row versions in 25769 pages
> DETAIL: 890 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 2.08s/0.32u sec elapsed 4.36 sec.
> INFO: analyzing cdm.cdm_email_data
> INFO: cdm_email_data: 65874 pages, 3000 rows sampled, 392363 estimated total rows
>
> #after REINDEX and vacuum full
> INFO: index emaildat_fkey now contains 8052369 row versions in 25771 pages
> DETAIL: 521 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 1.37s/0.35u sec elapsed 4.79 sec.
> INFO: analyzing cdm.cdm_email_data
> INFO: cdm_email_data: 65869 pages, 3000 rows sampled, 392333 estimated total rows
>
> #After vacuum full(s)
> mdc_oz=# select count(*) from cdm.cdm_email_data;
>   count
> ---------
>  5433358
> (1 row)

I do think the count(*) seems a bit off based on the vacuum output above. I'm guessing you either have blocking transactions in the way or you're not giving us a complete copy/paste of the session involved.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
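To put a number on "the limit is based on your memory": at 6 bytes per page, 300,000 FSM pages costs only about 1.8 MB of shared memory, so there is plenty of room to raise it. A rough upper bound to compare against is the total number of heap pages in the cluster, e.g.:

    SELECT sum(relpages) AS total_heap_pages FROM pg_class WHERE relkind IN ('r', 't');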
Re: [PERFORM] [HACKERS] [SQL] Materialized View Summary
On Tue, 2004-02-24 at 12:11, Richard Huxton wrote:
> On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote:
> > I've written a summary of my findings on implementing and using materialized views in PostgreSQL. I've already deployed eagerly updating materialized views on several views in a production environment for a company called RedWeek: http://redweek.com/. As a result, some queries that were taking longer than 30 seconds to run now run in a fraction of a millisecond.
> >
> > You can view my summary at http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Have you done much concurrency testing on your snapshot views? I implemented a similar scheme in one of my databases but found problems when I had concurrent refresh attempts. I ended up serializing the calls via LOCKing, which was ok for my needs, but I thought potentially problematic in other cases.

> Interesting (and well written) summary. Even if not a built in feature, I'm sure that plenty of people will find this useful. Make sure it gets linked to from techdocs.

Done. :-)

> If you could identify candidate keys on a view, you could conceivably automate the process even more. That's got to be possible in some cases, but I'm not sure how difficult it is to do in all cases.

It seems that somewhere between Joe Conway's work with arrays and polymorphic functions in 7.4 this should be feasible.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
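For anyone wanting to copy the serialized-refresh approach mentioned above, a minimal sketch (hypothetical object names, 7.4-era quoting). The EXCLUSIVE lock makes concurrent refresh attempts queue behind one another while plain readers of the snapshot table are unaffected:

    CREATE OR REPLACE FUNCTION refresh_matview_foo() RETURNS void AS '
    BEGIN
        LOCK TABLE matview_foo IN EXCLUSIVE MODE;   -- serialize refreshers
        DELETE FROM matview_foo;
        INSERT INTO matview_foo SELECT * FROM view_foo;
        RETURN;
    END;
    ' LANGUAGE plpgsql;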
Re: [PERFORM] Pl/Pgsql Functions running simultaneously
Uh... I don't think this is necessarily the wrong list, sometimes people don't have much to chime in. You could try reposting to -sql or -general I suppose. As for my take on your questions, I wasn't exactly clear on what the problem is. If its just that things seem slow, make sure you have done the appropriate vacuum/analyze/reindex tech and then try adding some debug info to the function to determine where in the function it is slowing down. queries inside plpgsql functions will take locks as needed, but they are no different than regular statements, just keep in mind that the queries inside the function will work like an implicit transaction. Robert Treat On Thu, 2004-06-03 at 17:38, Marcus Whitney wrote: Am I on the wrong list to ask this question, or does this list usually have low activity? Just asking because I am new and I need to know where to ask this question. Thanks. On Wednesday 02 June 2004 16:08, Marcus Whitney wrote: Hello all, I have an import function that I have been working on for some time now, and it performed well up until recently. It is doing a lot, and because the queries are not cached, I am not sure if that is what the problem is. If a function takes a while, does it lock any of the tables it is accessing, even for SELECT? Below is the bulk of the function: -- set sql statement variables create_import_file_sql := ''COPY '' || container_table || '' ('' || filtered_container_columns || '') TO '' || quote_literal(formatted_import_file) || '' WITH NULL AS '' || null_single_quotes; upload_to_import_table_sql := ''COPY '' || import_table || '' ('' || field_names || '') FROM '' || quote_literal(formatted_import_file) || '' WITH NULL AS '' || null_single_quotes; clean_personalization_fields_sql := ''UPDATE '' || import_table || '' SET emma_member_email = btrim(emma_member_email, '' || quote_literal(quoted_single_quote) || '') , emma_member_name_first = btrim(emma_member_name_first, '' || quote_literal(quoted_single_quote) || '') , emma_member_name_last = btrim(emma_member_name_last, '' || quote_literal(quoted_single_quote) || '') ;''; clean_personalization_fields_sql2 := ''UPDATE '' || import_table || '' SET emma_member_email = btrim(emma_member_email) , emma_member_name_first = btrim(emma_member_name_first) , emma_member_name_last = btrim(emma_member_name_last) ;''; set_account_id_sql := ''UPDATE '' || import_table || '' SET emma_account_id = '' || account_id; set_default_active_status_sql := ''UPDATE '' || import_table || '' SET emma_member_status_id = 1''; set_errors_for_null_email_sql := ''UPDATE '' || import_table || '' SET emma_member_status_id = 2 WHERE emma_member_email IS NULL''; record_null_email_count_sql := ''UPDATE '' || import_history_table || '' SET emma_import_null_email_count = (SELECT COUNT(*) FROM '' || import_table || '' WHERE emma_member_email IS NULL) WHERE emma_import_history_id ='' || import_history_id; set_errors_for_invalid_email_sql := ''UPDATE '' || import_table || '' SET emma_member_status_id = 2 WHERE emma_member_email !~* '' || email_regex; record_invalid_email_count_sql := ''UPDATE '' || import_history_table || '' SET emma_import_invalid_email_count = ( SELECT COUNT(*) FROM '' || import_table || '' WHERE emma_member_email !~* '' || email_regex || '' ) WHERE emma_import_history_id ='' || import_history_id; get_dupes_in_import_sql := ''SELECT emma_member_email, emma_member_status_id FROM '' || import_table || '' GROUP BY emma_member_email, emma_member_status_id having count(*) 1''; insert_dupes_sql := ''INSERT INTO '' || dupe_table || '' 
SELECT * FROM '' || import_table || '' WHERE LOWER(emma_member_email) = LOWER('' || member_table || ''.emma_member_email)''; record_table_dupe_count_sql := ''UPDATE '' || import_history_table || '' SET emma_import_table_dupe_email_count = (SELECT COUNT(*) FROM '' || import_table || '' WHERE emma_member_email = LOWER('' || member_table || ''.emma_member_email)) WHERE emma_import_history_id ='' || import_history_id; remove_dupes_from_import_table_sql := ''DELETE FROM '' || import_table || '' WHERE LOWER(emma_member_email) = LOWER('' || member_table || ''.emma_member_email)''; create_clean_import_file_sql := ''COPY '' || import_table || '' TO '' || quote_literal(clean_import_file) || '' WITH NULL AS '' || null_single_quotes; create_members_groups_ids_file_sql := ''COPY '' || import_table || '' (emma_member_id) TO '' || quote_literal(members_groups_ids_file) || '' WITH NULL AS '' || null_single_quotes; empty_import_table_sql := ''TRUNCATE '' || import_table; upload_clean_import_sql := ''COPY '' || member_table || '' FROM '' || quote_literal(clean_import_file) || '' WITH NULL AS '' || null_single_quotes
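Regarding the earlier suggestion to add debug info to the function to find where it slows down: one hedged way (assuming the assembled strings above are eventually run with EXECUTE, which isn't shown in the quoted excerpt) is to bracket each step with notices; timeofday(), unlike now(), keeps advancing inside a transaction, so the deltas between notices are meaningful:

    -- 7.x-style quoting inside the function body, hence the doubled quotes
    RAISE NOTICE ''upload_to_import_table start %'', timeofday();
    EXECUTE upload_to_import_table_sql;
    RAISE NOTICE ''upload_to_import_table done  %'', timeofday();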
Re: [PERFORM] PostgreSQL caching
On Tue, 2004-05-25 at 15:53, Vitaly Belman wrote:

>                                  QUERY PLAN
> ------------------------------------------------------------------
>  Limit  (cost=2337.41..2337.43 rows=10 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
>    ->  Sort  (cost=2337.41..2337.94 rows=214 width=76) (actual time=7875.000..7875.000 rows=10 loops=1)
>          Sort Key: bv_books.vote_avg
>          ->  Nested Loop  (cost=0.00..2329.13 rows=214 width=76) (actual time=16.000..7844.000 rows=1993 loops=1)
>                ->  Index Scan using i_bookgenres_genre_id on bv_bookgenres  (cost=0.00..1681.54 rows=214 width=4) (actual time=16.000..3585.000 rows=1993 loops=1)
>                      Index Cond: (genre_id = 5830)
>                ->  Index Scan using bv_books_pkey on bv_books  (cost=0.00..3.01 rows=1 width=76) (actual time=2.137..2.137 rows=1 loops=1993)
>                      Index Cond: (bv_books.book_id = "outer".book_id)
>  Total runtime: 7875.000 ms

A question and two experiments... what version of postgresql is this?

Try reindexing i_bookgenres_genre_id and capture the explain analyze for that. If it doesn't help, try doing set enable_indexscan = false and capture the explain analyze for that.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] Interpreting vmstat
different. Troll vmstat while you do this to see if there is bi occurring. I probably should mention that just because you see activity on bi doesn't mean that you'll notice any difference in performance against running the query with no bi; it's dependent on a number of factors really.

Oh, and as the other poster alluded to, knock down your shared buffers by about 50% and see where that gets you. I might also knock *up* your effective cache size... try doubling that and see how things go.

Hope this helps... and others jump in with corrections if needed.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
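On the effective cache size suggestion: it can be tried per-session before touching postgresql.conf (on these releases the value is in 8kB pages, so the number below is an assumption for a box with roughly 1 GB available for caching); shared_buffers, by contrast, only changes with a restart:

    SET effective_cache_size = 131072;   -- 131072 * 8kB = ~1 GB, session-local trial
    SHOW effective_cache_size;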
Re: [PERFORM] query slows down with more accurate stats
On Tue, 2004-04-13 at 15:18, Tom Lane wrote:
> Robert Treat [EMAIL PROTECTED] writes:
>
> Well, the first problem is why is ANALYZE's estimate of the total row count so bad :-( ? I suspect you are running into the situation where the initial pages of the table are thinly populated and ANALYZE mistakenly assumes the rest are too.

That was my thinking, which is somewhat confirmed after a vacuum full on the table; now analyze gives pretty accurate stats. Of course the downside is that now the query is consistently slower.

> > so i guess i am wondering if there is something I should be doing to help get the better plan at the more accurate stats levels and/or why it doesn't stick with the original plan (I noticed disabling merge joins does seem to push it back to the original plan).
>
> With the larger number of estimated rows it's figuring the nestloop will be too expensive. The row estimate for the cl scan went up from 1248 to 10546, so the estimated cost for the nestloop plan would go to about 24 units vs 8 for the mergejoin plan. This is obviously off rather badly when the true runtimes are 1.7 vs 8.1 seconds :-(.
>
> I think this is an example of a case where we really need better estimation of nestloop costs --- it's drastically overestimating the relative cost of the nestloop because it's not accounting for the cache benefits of the repeated index searches. You could probably force the nestloop to be chosen by lowering random_page_cost, but that's just a kluge solution ... the real problem is the model is wrong.

Unfortunately playing with random_page_cost doesn't seem to be enough to get it to favor the nested loop... though setting it down to 2 does help overall. I played with cpu_index_tuple_cost a bit, but that seemed even less useful. It's aggravating when you know there is a better plan it could pick but there's no (clean) way to get it to do so...

> I have a to-do item to work on this, and will try to bump up its priority a bit.

I'll keep an eye out, thanks Tom.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
[PERFORM] query slows down with more accurate stats
In the process of optimizing some queries, I have found the following query seems to degrade in performance the more accurate I make the statistics on the table... whether by using increased alter table ... set statistics or by using vacuum.. SELECT count( cl.caller_id ), npanxx.city, npanxx.state FROM cl LEFT OUTER JOIN npanxx on substr( cl.caller_id, 1, 3 ) = npanxx.npa and substr( cl.caller_id, 4, 3 ) = npanxx.nxx LEFT OUTER JOIN cp ON cl.caller_id = cp.caller_id WHERE cl.ivr_system_id = 130 AND cl.call_time > '2004-03-01 06:00:00.0 CST' AND cl.call_time < '2004-04-01 06:00:00.0 CST' AND cp.age >= 18 AND cp.age <= 24 AND cp.gender = 'm' GROUP BY npanxx.city, npanxx.state live=# analyze cl; ANALYZE live=# select reltuples from pg_class where relname = 'cl'; reltuples --- 53580 (1 row) live=# select count(*) from cl; count - 1140166 (1 row) The plan I get under these conditions is actually pretty good... HashAggregate (cost=28367.22..28367.66 rows=174 width=32) (actual time=1722.060..1722.176 rows=29 loops=1) - Nested Loop (cost=0.00..28365.92 rows=174 width=32) (actual time=518.592..1716.254 rows=558 loops=1) - Nested Loop Left Join (cost=0.00..20837.33 rows=1248 width=32) (actual time=509.991..1286.755 rows=13739 loops=1) - Index Scan using cl_ivr_system_id on cl (cost=0.00..13301.15 rows=1248 width=14) (actual time=509.644..767.421 rows=13739 loops=1) Index Cond: (ivr_system_id = 130) Filter: ((call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone)) - Index Scan using npanxx_pkey on npanxx (cost=0.00..6.02 rows=1 width=32) (actual time=0.025..0.027 rows=1 loops=13739) Index Cond: ((substr((outer.caller_id)::text, 1, 3) = (npanxx.npa)::text) AND (substr((outer.caller_id)::text, 4, 3) = (npanxx.nxx)::text)) - Index Scan using cp_pkey on cp (cost=0.00..6.02 rows=1 width=14) (actual time=0.027..0.027 rows=0 loops=13739) Index Cond: ((outer.caller_id)::text = (cp.caller_id)::text) Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar)) Total runtime: 1722.489 ms (12 rows) but when I do live=# vacuum cl; VACUUM live=# select reltuples from pg_class where relname = 'cl'; reltuples - 1.14017e+06 (1 row) (or alternatively increase the stats target on the table) I now get the following plan: HashAggregate (cost=80478.74..80482.41 rows=1471 width=32) (actual time=8132.261..8132.422 rows=29 loops=1) - Merge Join (cost=79951.95..80467.70 rows=1471 width=32) (actual time=7794.338..8130.041 rows=558 loops=1) Merge Cond: (outer.?column4? = inner.?column2?) - Sort (cost=55719.06..55745.42 rows=10546 width=32) (actual time=4031.827..4052.526 rows=13739 loops=1) Sort Key: (cl.caller_id)::text - Merge Right Join (cost=45458.30..55014.35 rows=10546 width=32) (actual time=2944.441..3796.787 rows=13739 loops=1) Merge Cond: (((outer.npa)::text = inner.?column2?) 
AND ((outer.nxx)::text = inner.?column3?)) - Index Scan using npanxx_pkey on npanxx (cost=0.00..8032.99 rows=132866 width=32) (actual time=0.200..461.767 rows=130262 loops=1) - Sort (cost=45458.30..45484.67 rows=10546 width=14) (actual time=2942.994..2967.935 rows=13739 loops=1) Sort Key: substr((cl.caller_id)::text, 1, 3), substr((cl.caller_id)::text, 4, 3) - Seq Scan on cl (cost=0.00..44753.60 rows=10546 width=14) (actual time=1162.423..2619.662 rows=13739 loops=1) Filter: ((ivr_system_id = 130) AND (call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone)) - Sort (cost=24232.89..24457.06 rows=89667 width=14) (actual time=3761.703..3900.340 rows=98010 loops=1) Sort Key: (cp.caller_id)::text - Seq Scan on cp (cost=0.00..15979.91 rows=89667 width=14) (actual time=0.128..1772.215 rows=100302 loops=1) Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar)) Total runtime: 8138.607 ms (17 rows) So I guess I am wondering if there is something I should be doing to help get the better plan at the more accurate stats levels and/or why it doesn't stick with the original plan (I noticed disabling merge joins does seem to push it back to the original plan). Alternatively, if anyone has any general suggestions on speeding up the query I'd be open to that too :-) Robert Treat -- Build A Brighter Lamp
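For reference, the two ways of making the stats more accurate that the post mentions look roughly like this; the column and the target of 100 are only illustrative (the default statistics target on 7.4 is 10):

  ALTER TABLE cl ALTER COLUMN ivr_system_id SET STATISTICS 100;
  ANALYZE cl;                        -- or VACUUM ANALYZE cl; to also refresh reltuples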
Re: [PERFORM] column size too large, is this a bug?
On Sunday 28 March 2004 14:25, Josh Berkus wrote: Andrew, I used to use the connect-by patch, but have since rewritten everything to use a nested set model. Cool! You're probably the only person I know other than me using nested sets in a production environment. You cut me deep there Josh, real deep. :-) If you search the pgsql-sql archives you'll find some helpful threads on using nested sets in PostgreSQL, one in particular I was involved with was a generic move_tree function that enabled moving a node from one branch to another. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Nested Sets WAS: column size too large, is this a bug?
On Tuesday 30 March 2004 11:38, Josh Berkus wrote: Robert, If you search the pgsql-sql archives you'll find some helpful threads on using nested sets in PostgreSQL, one in particular I was involved with was a generic move_tree function that enabled moving a node from one branch to another. I have to admit to failing to follow -SQL over the last few months.This list and Hackers are pretty much the only ones I read all of. Maybe I should get back on -SQL and we can compare move_tree functions :-) Did yours use a temp table, or some other means? Nope, Greg Mullane and I worked out the math and came up with an algorithm of sorts that we could apply to the tree when moving elements. digs a little http://archives.postgresql.org/pgsql-sql/2002-11/msg00355.php Seemed to work though someone else had posted yet another version after ours... and in fact the one posted is not exactly what I use now either :-) Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Scaling further up
On Mon, 2004-03-08 at 11:40, William Yu wrote: Anjan Dave wrote: Great response, Thanks. Regarding 12GB memory and 13G db, and almost no I/O, one thing I don't understand is that even though the OS caches most of the memory and PG can use it if it needs it, why would the system swap (not much, only during peak times)? The SHMMAX is set to 512MB, shared_buffers is 150MB, effective cache size is 2GB, sort mem is 2MB, rest is default values. It also happens that a large query (reporting type) can hold up the other queries, and the load averages shoot up during peak times. In regards to your system going to swap, the only item I see is sort_mem at 2MB. How many simultaneous transactions do you get? If you get hundreds or thousands like your first message stated, every select sort would take up 2MB of memory regardless of whether it needed it or not. That could cause your swap activity during peak traffic. The only other item to bump up is the effective cache size -- I'd set it to 12GB. I was surprised that no one corrected this bit of erroneous info (or at least I didn't see it) so I thought I would for completeness. A basic explanation is that sort_mem controls how much memory a given sort is allowed to use before spilling to disk, but it will not grab that much memory if it doesn't need it. See the docs for a more detailed explanation: http://www.postgresql.org/docs/7.4/interactive/runtime-config.html#RUNTIME-CONFIG-RESOURCE Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
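A quick way to see this on a 7.x system; sort_mem is measured in kilobytes and can be changed per session, so nothing here is permanent:

  SHOW sort_mem;               -- the shipped default is 1024, i.e. 1 MB per sort
  SET sort_mem = 8192;         -- allow up to 8 MB per sort for this session only
  -- memory is only actually consumed if a sort needs it; smaller sorts stay small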
Re: [PERFORM] [HACKERS] [SQL] Materialized View Summary
On Wed, 2004-02-25 at 03:19, Jonathan M. Gardner wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I'm not sure if my original reply made it through. Ignore the last one if it did. But I liked the last one :-) On Tuesday 24 February 2004 1:48 pm, Robert Treat wrote: On Tue, 2004-02-24 at 12:11, Richard Huxton wrote: On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: I've written a summary of my findings on implementing and using materialized views in PostgreSQL. I've already deployed eagerly updating materialized views on several views in a production environment for a company called RedWeek: http://redweek.com/. As a result, some queries that were taking longer than 30 seconds to run now run in a fraction of a millisecond. You can view my summary at http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html have you done much concurrency testing on your snapshot views? I implemented a similar scheme in one of my databases but found problems when I had concurrent refresh attempts. I ended up serializing the calls via LOCKing, which was ok for my needs, but I thought potentially problematic in other cases. We are running into some small problems with deadlocks and multiple inserts. It's not a problem unless we do a mass update to the data or something like that. I'm interested in how you solved your problem. Well, I have two different cases actually. In one case I have a master table with what are essentially 4 or 5 matviews based off of that. I don't allow updates to the matviews, only to the master table, and only via stored procedures. This would work better if locking semantics inside of pl functions worked properly, but currently we have the application lock the table in exclusive access mode and then call the function to make the data changes which then fires off a function to update the matviews. Since it's all within a transaction, readers of the matviews are oblivious to the change. IMO this whole method is a wizardry in database hack jobs that I would love to replace. The second case, and this one being much simpler, started out as a view that does aggregation across several other views and tables, which is pretty resource intensive but only returns 4 rows. I refresh the matview via a cron job which basically does a SELECT * FOR UPDATE on the matview, deletes the entire contents, then does an INSERT INTO matview SELECT * FROM view. Again since it's in a transaction, readers of the matview are happy (and apps are only granted select on the matview). Concurrency is kept because the cron job must wait to get a LOCK on the table before it can proceed with the delete/update. I have a feeling that this method could fall over given a high enough number of concurrent updaters, but works pretty well for our needs. I am playing with an exclusive lock scheme that will lock all the materialized views with an exclusive lock (see Section 12.3 for a reminder on what exactly this means). The locks have to occur in order, so I use a recursive function to traverse a dependency tree to the root and then lock from there. Right now, we only have one materialized view tree, but I can see some schemas having multiple separate trees with multiple roots. So I put in an ordering to lock the tables in a pre-defined order. But if the two dependency trees are totally separate, it is possible for one transaction to lock tree A and then tree B, and for another to lock tree B and then tree A, causing deadlock. 
Unfortunately, I can't force any update to the underlying tables to force this locking function to be called. So we will probably call this manually before we touch any of those tables. Yeah, I ran into similar problems as this, but ISTM you could do a before update trigger on the matview to do the locking (though I'd guess this would end in trouble due to plpgsql lock semantics, so maybe I shouldn't send you down a troubled road...) In the future, it would be nice to have a hook into the locking mechanism so any kind of lock on the underlying tables can trigger this. Also, building the dependency trees is completely manual. Until I can get some functions to actually assemble the triggers and such, automatic building of the trees will be difficult. I just noticed that your summary doesn't make use of postgresql RULES in any way, how much have you traveled down that path? We had cooked up a scheme for our second case where we would have a table that held an entry for the matview and then a timestamp of the last update/insert into any of the base tables the matview depended on. We then would create rules on all the base tables to do an update to the refresh table any time they were updated/inserted/deleted. We would then put a corresponding rule on the matview so that on each select from the matview, it would check to see if any of its base tables had
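The cron-driven refresh described above (the simpler of the two cases) boils down to a single transaction along these lines; matview and matview_source are placeholder names standing in for the real matview and its defining view:

  BEGIN;
  SELECT * FROM matview FOR UPDATE;                  -- serialize against any other refresher
  DELETE FROM matview;                               -- throw away the old contents
  INSERT INTO matview SELECT * FROM matview_source;  -- repopulate from the expensive view
  COMMIT;                                            -- readers see either the old rows or the new, never an empty table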
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
On Mon, 2004-02-02 at 12:21, Josh Berkus wrote: Folks, I've had requests from a couple of businesses to see results of informal MySQL+InnoDB vs. PostgreSQL tests. I know that we don't have the setup to do full formal benchmarking, but surely someone in our community has gone head-to-head on your own application? We have the setup to do informal benchmarking via OSDL, but afaik MySQL doesn't conform to any of the dbt benchmarks... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] COUNT Pagination
On Mon, 2004-01-12 at 10:37, David Shadovitz wrote: If you only need the count when you've got the results, most PG client interfaces will tell you how many rows you've got. What language is your app in? PHP. But I have only a subset of the results, retrieved via a query with a LIMIT m clause, so $pg_numrows is m. And retrieving all results (i.e. no LIMIT) is at least as expensive as COUNT(*). Depending on frequency of updates and need for real time info, you could cache the count in session as long as the user stays within the given piece of your app. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select max(foo) and select count(*) optimization
On Tue, 2004-01-06 at 07:20, Shridhar Daithankar wrote: On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote: On January 6, 2004 01:42 am, Shridhar Daithankar wrote: cert=# select relpages,reltuples::bigint from pg_class where relname= 'certificate'; relpages | reltuples --+--- 399070 | 24858736 (1 row) But: cert=# select count(*) from certificate; [*Crunch* *Crunch* *Crunch*] count -- 19684668 (1 row) Am I missing something? Max certificate_id is 20569544 btw. Do 'vacuum analyze certificate' and try..:-) The numbers from pg_class are estimates updated by vacuum/analyze. Of course you need to run vacuum frequently enough for those statistics to be updated all the time, or run the autovacuum daemon.. Ran into same problem on my machine till I remembered about vacuum..:-) Actually you only need to run analyze to update the statistics. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
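In other words, a plain ANALYZE is enough to bring the estimate back in line; a quick check against the table from this thread:

  ANALYZE certificate;
  SELECT relpages, reltuples::bigint FROM pg_class WHERE relname = 'certificate';
  -- reltuples should now track the real count(*) much more closely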
[PERFORM] sequence overhead
Just wondering if anyone has done any testing on the amount of insert overhead you might incur by adding a serial column to a table. I'm thinking of adding a few to some tables that get an average of 30 - 40 inserts per second, sometimes bursting over 100 inserts per second, and wondering if there will be any noticeable impact. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] A question on the query planner
On Mon, 2003-12-01 at 16:44, Jared Carr wrote: I am currently working on optimizing some fairly time consuming queries on a decently large dataset. The following is the query in question. SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year FROM quotes AS q, zips AS z, cars AS c WHERE z.zip = q.zip AND c.car_id = q.car_id AND z.state != 'AA' AND z.state != 'AE' AND z.state != 'AP' AND z.state = 'WA' ORDER BY date_time; This won't completely solve your problem, but z.state = 'WA' would seem to be mutually exclusive of the != AA|AE|AP. While it's not much, it is extra overhead there doesn't seem to be any need for... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
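Dropping the redundant tests leaves the semantically identical query below (same tables and columns as the original post); whether it helps measurably would need an EXPLAIN ANALYZE comparison:

  SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
    FROM quotes AS q, zips AS z, cars AS c
   WHERE z.zip = q.zip
     AND c.car_id = q.car_id
     AND z.state = 'WA'
   ORDER BY date_time;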
Re: [HACKERS] [PERFORM] More detail on settings for pgavd?
On Thu, 2003-11-20 at 19:40, Matthew T. O'Connor wrote: I'm open to discussion on changing the defaults. Perhaps it would be better to use some non-linear (perhaps logarithmic) scaling factor, so that you wound up with something roughly like this:

  #tuples   activity% for vacuum
  1k        100%
  10k       70%
  100k      45%
  1M        20%
  10M       10%
  100M      8%

Just thinking out loud here, so disregard if you think it's chaff, but... if we had a system table pg_avd_defaults that held what we generally consider the best default percentages based on reltuples/pages, and added a column to pg_class (could be some place better but..) which could hold an overriding percentage, you could then have a column added to pg_stat_all_tables called vacuum_percentage, which would be a coalesce of the override percentage or the default percentages based on reltuples (or relpages). This would give autovacuum a place to look for each table as to when it should vacuum, and gives administrators the option to tweak it on a per table basis if they find they need a specific table to vacuum at a different rate than the standard. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 8: explain analyze is your friend
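Purely to illustrate the proposal, a sketch of what that lookup could look like; none of these objects exist today, and pg_avd_defaults, its columns, and the avd_override_pct column on pg_class are all hypothetical:

  -- proposed bracket table holding the default percentages
  CREATE TABLE pg_avd_defaults (
      max_tuples  bigint,    -- upper bound of the reltuples bracket
      vacuum_pct  real       -- activity % that should trigger a vacuum
  );

  -- what the proposed pg_stat_all_tables.vacuum_percentage column might expose
  SELECT c.relname,
         COALESCE(c.avd_override_pct,               -- hypothetical per-table override
                  (SELECT d.vacuum_pct
                     FROM pg_avd_defaults d
                    WHERE d.max_tuples >= c.reltuples
                    ORDER BY d.max_tuples
                    LIMIT 1)) AS vacuum_percentage
    FROM pg_class c
   WHERE c.relkind = 'r';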
Re: [PERFORM] Value of Quad vs. Dual Processor machine
On Wed, 2003-11-12 at 09:28, Jeff wrote: On Tue, 11 Nov 2003 21:13:19 -0500 Chris Field [EMAIL PROTECTED] wrote: we are looking at Xeon, We are currently running it on a quad sun v880 compiled to be 64bit and have been getting dreadful performance. I don't think we really have much to gain from going 64bit. By chance, are you running 7.3.4 on that sun? If so, try this: export CFLAGS=-O2 ./configure and rebuild PG. Before 7.4 PG was built with _no_ optimization on Solaris. Recompiling gives __HUGE__ (notice the underscores) performance gains. And onto the dual vs quad. PG will only use 1 cpu / connection / query. So if your machine will have 1-2 queries running at a time those other 2 proc's will sit around idling. However if you are going to have a bunch going, 4 cpus will be most useful. One of the nicest things to do for PG is more ram and fast IO. It really loves those things. We've just started kicking around the idea of moving one of our boxes to a quad-proc machine from a dual. Under normal circumstances the 2 processors handle maybe 200 transactions per second with 90% system idle. However we have people who occasionally run historical reports on our data, and those reports are fairly CPU intensive. Usually it is not a problem for the main web system, but when pg_dump is running, that is also cpu intensive, so we end up with two highly cpu intensive items running on our machine, and we start to notice issues on the main web system. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??
It is believed that the IN optimization can lead to faster IN times than EXISTS times on some queries, the extent of which is still a bit of an unknown. (Incidentally is there an FAQ item on this that needs updating?) Does the NOT EXISTS query produce worse results in 7.4 than it did in 7.3? Robert Treat On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote: Hi, NOT EXISTS is taking almost double the time of NOT IN. I know IN has been optimised in 7.4 but is anything wrong with the NOT EXISTS? I have vacuumed, analyzed and run the query many times; still NOT IN is faster than NOT EXISTS: Regds Mallah. NOT IN PLAN tradein_clients=# explain analyze SELECT count(*) from general.profile_master where profile_id not in (select profile_id from general.account_profiles ) ; QUERY PLAN - Aggregate (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1) - Seq Scan on profile_master (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386 loops=1) Filter: (NOT (hashed subplan)) SubPlan - Seq Scan on account_profiles (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811 rows=256180 loops=1) Total runtime: 5337.591 ms (6 rows) tradein_clients=# explain analyze SELECT count(*) from general.profile_master where not exists (select profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ; QUERY PLAN --- Aggregate (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1) - Seq Scan on profile_master (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386 loops=1) Filter: (NOT (subplan)) SubPlan - Index Scan using account_profiles_profile_id on account_profiles (cost=0.00..4.59 rows=2 width=4) (actual time=0.013..0.013 rows=0 loops=718250) Index Cond: (profile_id = $0) Total runtime: 14600.531 ms ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] IN surpasses NOT EXISTS in 7.4RC2 ??
On Thu, 2003-11-13 at 12:00, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: Does the NOT EXISTS query produce worse results in 7.4 than it did in 7.3? EXISTS should work the same as before. Right. The original poster is asking if there is something wrong with EXISTS based on the comparison to IN; he needs to compare it vs. 7.3 EXISTS to determine if something is wrong. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] redundant index?
I just noticed on one of my tables I have the following two indexes: Indexes: entity_watch_map_pkey primary key btree (entity_id, watch_id), ewm_entity_id btree (entity_id), I can't think of why the second index is there, as ISTM there is no instance where the first index wouldn't be used in place of the second one if I were to delete the second one. It's a heavily updated table, so axing the second one would be a bonus for performance; am I missing something? Thanks in advance, Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
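One way to double-check before axing it, assuming the statistics collector is enabled so the idx_scan counters are being maintained:

  SELECT indexrelname, idx_scan
    FROM pg_stat_user_indexes
   WHERE relname = 'entity_watch_map';

  DROP INDEX ewm_entity_id;    -- only once you're satisfied the pkey index covers all its uses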
Re: [PERFORM] SRFs ... no performance penalty?
On Mon, 2003-10-20 at 20:55, Josh Berkus wrote: Folks, I'm working on the demo session for our upcoming presentation at PHPCon. As a side issue, we ended up comparing 3 versions of the same search screen: 1) All in PHP with views; 2) Using a function to build a query and count results but executing that query directly and sorting, paging in PHP; 3) Using a Set Returning function to handle row-returning, sorting, and paging. All three methods were executing a series of moderately complex queries against a medium-sized data set (only about 20,000 rows but it's on a laptop). The postgresql.conf was tuned like a webserver; e.g. low sort_mem, high max_connections. So far, on the average of several searches, we have: 1) 0.19687 seconds 2) 0.20667 seconds 3) 0.20594 seconds Is this measuring time in the back-end or total time of script execution? In our tests, using any kind of PL/pgSQL function seems to carry a 0.01 second penalty over using PHP to build the search query. I'm not sure if this is comparative time for string-parsing or something else; the 0.01 seems to be consistent regardless of scale. The difference between using a PL/pgSQL function as a query-builder only (the 7.2.x method) and using SRFs was small enough not to be significant. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] free space map usage
On Tue, 2003-10-14 at 15:43, Jeremy M. Guthrie wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tuesday 14 October 2003 02:16 pm, Tom Lane wrote: Jeremy M. Guthrie [EMAIL PROTECTED] writes: Is there any way to determine how much of the free space map is currently in use? (ie. where and what it is tracking?) I vacuum on a regular basis but I never hold in terms of disk space usage. Not in 7.3 AFAIR. In 7.4 a full-database VACUUM VERBOSE will end with the info you want: regression=# vacuum verbose; ... much cruft ... INFO: free space map: 11 relations, 144 pages stored; 272 total pages needed DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory. VACUUM regression=# This tells me I'm only using about 1% of the FSM space (272 out of 20000 page slots). I jacked up the free space map pages but this doesn't appear to be working. You know you have to restart the postmaster to make those changes take effect, right? Yup. I still see no effect after restart. Given that you knew of no way to determine how much free space map you were using, what is your criteria for it to appear to be working? If it's that space keeps growing, then you're probably not vacuuming frequently enough. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
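When the VACUUM VERBOSE totals come out higher than the allocated FSM, the usual response is to raise the limits in postgresql.conf and restart the postmaster; the values below are only illustrative, not figures from this thread:

  max_fsm_pages = 50000        # keep this comfortably above the 'total pages needed' figure
  max_fsm_relations = 1000     # and this above the number of relations reported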
Re: [PERFORM] upping checkpoints on production server
On Wed, 2003-09-24 at 17:57, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: In the .conf file I have checkpoint_segments at the default of 3, but I noticed that in my pg_xlog directory I always seem to have at least 8 log files. Since this is more than the suggested 7, I'm wondering if this means I ought to bump my checkpoint_segments up to 4? Hm. What is the typical delta in the mod times of the log files? It sounds like you are in a regime where checkpoints are always triggered by checkpoint_segments and never by checkpoint_timeout, in which case increasing the former might be a good idea. Or decrease the latter, but that could put a drag on performance. # ls -lht /var/lib/pgsql/data/pg_xlog/ total 129M -rw------- 1 postgres postgres 16M Sep 25 11:12 006E0059 -rw------- 1 postgres postgres 16M Sep 25 11:12 006E005A -rw------- 1 postgres postgres 16M Sep 25 11:08 006E0058 -rw------- 1 postgres postgres 16M Sep 25 11:05 006E005F -rw------- 1 postgres postgres 16M Sep 25 11:02 006E005E -rw------- 1 postgres postgres 16M Sep 25 10:59 006E005D -rw------- 1 postgres postgres 16M Sep 25 10:55 006E005B -rw------- 1 postgres postgres 16M Sep 25 10:51 006E005C # ls -lht /var/lib/pgsql/data/pg_xlog/ total 129M -rw------- 1 postgres postgres 16M Sep 25 10:52 006E0054 -rw------- 1 postgres postgres 16M Sep 25 10:51 006E0053 -rw------- 1 postgres postgres 16M Sep 25 10:49 006E0052 -rw------- 1 postgres postgres 16M Sep 25 10:45 006E0059 -rw------- 1 postgres postgres 16M Sep 25 10:40 006E0057 -rw------- 1 postgres postgres 16M Sep 25 10:37 006E0058 -rw------- 1 postgres postgres 16M Sep 25 10:33 006E0056 -rw------- 1 postgres postgres 16M Sep 25 10:29 006E0055 from the 7.4 docs: Checkpoints are fairly expensive because they force all dirty kernel buffers to disk using the operating system sync() call. Busy servers may fill checkpoint segment files too quickly, causing excessive checkpointing. It goes on to mention checkpoint_warning, which I don't have in 7.3, but I think this is a case where I'd likely see those warnings. The server in question has a fairly high write/read ratio and is fairly busy (over 100 tps iirc). Since more often than not I don't make it to 5 minutes, it seems like upping checkpoint_segments is the way to go, right? Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
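Given that the listings above show a segment being recycled every 3-4 minutes, the change under discussion is just a postgresql.conf edit; the value 6 is only an example, not something settled on in the thread:

  checkpoint_segments = 6      # roughly doubles the WAL written between forced checkpoints
  checkpoint_timeout = 300     # leave the 5 minute default alone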
[PERFORM] upping checkpoints on production server
All this talk of checkpoints got me wondering if I have them set at an optimum level on my production servers. I noticed the following in the docs: There will be at least one 16 MB segment file, and will normally not be more than 2 * checkpoint_segments + 1 files. You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 2 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit. In the .conf file I have checkpoint_segments at the default of 3, but I noticed that in my pg_xlog directory I always seem to have at least 8 log files. Since this is more than the suggested 7, I'm wondering if this means I ought to bump my checkpoint_segments up to 4? I don't really want to bump it up unnecessarily as quick recovery time is important on this box, however if I would get an overall performance boost it seems like it would be worth it, and given that I seem to be using more than the default number anyways... I've always treated WAL logs as self-maintaining; am I over-analyzing this? Another thought popped into my head, is it just coincidence that I always seem to have 8 files and that wal_buffers defaults to 8? Seems like it's not but I love a good conspiracy theory. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
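As a worked example of that sizing formula, using the 16 MB segment size from the quoted docs:

  checkpoint_segments = 3  ->  at most 2 * 3 + 1 = 7 segment files  ->  about 7 * 16 MB = 112 MB of WAL
  checkpoint_segments = 4  ->  at most 2 * 4 + 1 = 9 segment files  ->  about 9 * 16 MB = 144 MB of WAL

The 8 files actually observed (~128 MB) sit just above the first bound, which is what prompts the question.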
Re: [PERFORM] restore time: sort_mem vs. checkpoing_segments
On Mon, 2003-09-15 at 15:15, Vivek Khera wrote: And the winner is... checkpoint_segments. Restore of a significantly big database (~19.8GB restored) shows nearly no time difference depending on sort_mem when checkpoint_segments is large. There are quite a number of tables and indexes. The restore was done from a pg_dump -Fc dump of one database. All tests with 16KB page size, 30k shared buffers, sort_mem=8192, PG 7.4b2 on FreeBSD 4.8. Hmm... I wonder what would happen if you pushed your sort_mem higher... on some of our development boxes and upgrade scripts, I push the sort_mem to 102400 and sometimes even higher depending on the box. This really speeds up my restores quite a bit (and is generally safe as I make sure there isn't any other activity going on at the time). Another thing I like to do is turn off fsync, as if the system crashes in the middle of a reload I'm pretty sure I'd be starting all over anyway... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 8: explain analyze is your friend
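For a dedicated restore window, that advice amounts to something like the following postgresql.conf settings (illustrative values, all of which should be reverted once the restore finishes; fsync = false is only sane when you can afford to redo the whole restore after a crash):

  sort_mem = 102400            # ~100 MB per sort, which mainly speeds up index builds
  fsync = false                # skip the per-commit flush for the duration of the restore
  checkpoint_segments = 10     # fewer forced checkpoints while bulk loading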
Re: [PERFORM] Tests
On Thu, 2003-08-21 at 14:16, Bill Moran wrote: What tests are interesting? Please give us tips and ideas. The guy has time for other tests. It's a shame you didn't test ufs+softupdates. I'd be curious to see how it stacked up against the others. Shame? I smell here a hardcore BSD fighter :) Well, I suppose ... Well I'm not a hardcore BSD fighter and I'd like to see how it stacks up as well. UFS+softupdates is supposed to be a very safe combination; if it performs well enough I could see a recommendation for it for those who are willing to look beyond Linux. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Tests
On Fri, 2003-08-22 at 16:54, Tomka Gergely wrote: On 2003-08-22, Robert Treat wrote: On Thu, 2003-08-21 at 14:16, Bill Moran wrote: What tests are interesting? Please give us tips and ideas. The guy has time for other tests. It's a shame you didn't test ufs+softupdates. I'd be curious to see how it stacked up against the others. Shame? I smell here a hardcore BSD fighter :) Well, I suppose ... Well I'm not a hardcore BSD fighter and I'd like to see how it stacks up as well. UFS+softupdates is supposed to be a very safe combination; if it performs well enough I could see a recommendation for it for those who are willing to look beyond Linux. The guy who did the tests has only a few weeks of *BSD experience, and I don't have BSD experience at all. He is now planning tests on BSD, but he needs some time to build up a good relationship with *BSD. Another thought would be Linux w/ XFS. Also, can you post more complete hardware/OS info? Oh, and vs. 7.4beta1 would be great too. :-) Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Version 7 question
On Tue, 2003-07-01 at 08:10, Hilary Forbes wrote: I'm just trying to improve performance on version 7 before doing some tests and hopefully upgrading to 7.3. At the moment we have B=64 (no of shared buffers) N=32 (no of connections) in postmaster.opt which I take it is the equivalent of the new postgresql.conf file. From all that is being written about later versions I suspect that this is far too low. Would I be fairly safe in making the no of shared buffers larger? Yes, I'd say start with about 25% of RAM, then adjust from there. If 25% takes you over your SHMMAX then start at your SHMMAX. Also is there an equivalent of effective_cache_size that I can set for version 7? If by 7 you mean 7.0.x then I don't believe so; it's been a while though, so I could be wrong. IMHO no amount of tuning you can do in 7.0 would be as effective as an upgrade, so after setting your shared buffers up, I'd put your efforts into upgrading. (Note: beta testing for 7.4 starts in 2 weeks.) Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 8: explain analyze is your friend
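To make the 25% rule concrete: shared buffers on 7.x are 8 kB apiece, so on a hypothetical machine with 1 GB of RAM that works out to roughly 256 MB / 8 kB = 32768 buffers, e.g.

  postmaster -B 32768 -N 32

(or the equivalent shared_buffers line in postgresql.conf on releases that have it), assuming the kernel's SHMMAX has been raised enough to allow a shared memory segment that large.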