[PERFORM] Function with table%ROWTYPE globbing
Postgresql 8.0.4 using plpgsql. The basic function is set up as:

CREATE FUNCTION add_data(t_row mytable) RETURNS VOID AS $func$
DECLARE
    newtable text;
    thesql text;
BEGIN
    SELECT INTO newtable thename FROM lookup WHERE lookup.id = t_row.id;
    thesql := 'INSERT INTO ' || newtable || ' VALUES (' || t_row.* || ')';
    EXECUTE thesql;
    RETURN;
END;
$func$ LANGUAGE plpgsql VOLATILE;

SELECT add_data(t.*) FROM mytable t;
ERROR:  column "*" not found in data type mytable

Now I have tried to drop the * but then there is no concatenation function to join text to a table%ROWTYPE. So my question is: how can I make this dynamic insert statement without listing out every t_row.colname? Or, alternatively, is there a better way to parse out each row of a table into subtables based on a column value?

Sven

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
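[Editorial sketch, assuming mytable has a primary key "id" and the target table name comes from the lookup table above.] On 8.0 a ROWTYPE cannot be expanded into a string, but the row can be re-selected by its key inside the dynamic SQL itself, which sidesteps the concatenation problem entirely:

```sql
-- Sketch only: instead of expanding t_row.*, let the EXECUTE'd statement
-- fetch the row again by primary key. quote_ident() guards the table name.
CREATE FUNCTION add_data(t_row mytable) RETURNS VOID AS $func$
DECLARE
    newtable text;
BEGIN
    SELECT INTO newtable thename FROM lookup WHERE lookup.id = t_row.id;
    EXECUTE 'INSERT INTO ' || quote_ident(newtable)
         || ' SELECT * FROM mytable WHERE id = ' || t_row.id;
    RETURN;
END;
$func$ LANGUAGE plpgsql VOLATILE;

-- Usage is unchanged:
SELECT add_data(t.*) FROM mytable t;
```

This relies on the child tables having the same column layout as mytable; if they differ, the column list must still be spelled out once in the SELECT.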
Re: [PERFORM] effective cache size on FreeBSD (WAS: Performance on SUSE w/
On Tue, 2005-10-11 at 16:54 +0200, Claus Guttesen wrote:
> > > I have a postgresql 7.4.8-server with 4 GB ram.
> > > #effective_cache_size = 1000    # typically 8KB each
> > >
> > > This is computed by sysctl -n vfs.hibufspace / 8192 (on FreeBSD). So I
> > > changed it to:
> > >
> > > effective_cache_size = 27462    # typically 8KB each
> >
> > Apparently this formula is no longer relevant on the FreeBSD systems as
> > they can cache up to almost all the available RAM. With 4GB of RAM, one
> > could specify most of the RAM as being available for caching, assuming
> > that nothing but PostgreSQL runs on the server -- certainly 1/2 the RAM
> > would be a reasonable value to tell the planner.
> >
> > (This was verified by using dd:
> > dd if=/dev/zero of=/usr/local/pgsql/iotest bs=128k count=16384 to create
> > a 2G file then
> > dd if=/usr/local/pgsql/iotest of=/dev/null
> >
> > If you run systat -vmstat 2 you will see 0% disk access during the read
> > of the 2G file indicating that it has, in fact, been cached)
>
> Thank you for your reply. Does this apply to FreeBSD 5.4 or 6.0 on
> amd64 (or both)?

Not sure about 6.0 (but I don't know why it would change), but definitely on 5.4 amd64 (and I would imagine i386 as well).

Sven

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance on SUSE w/ reiserfs
On Tue, 2005-10-11 at 09:41 +0200, Claus Guttesen wrote:
> I have a postgresql 7.4.8-server with 4 GB ram.
>
> > #effective_cache_size = 1000    # typically 8KB each
>
> This is computed by sysctl -n vfs.hibufspace / 8192 (on FreeBSD). So I
> changed it to:
>
> effective_cache_size = 27462    # typically 8KB each

Apparently this formula is no longer relevant on FreeBSD systems as they can cache up to almost all the available RAM. With 4 GB of RAM, one could specify most of the RAM as being available for caching, assuming that nothing but PostgreSQL runs on the server -- certainly 1/2 the RAM would be a reasonable value to tell the planner.

(This was verified by using dd:

dd if=/dev/zero of=/usr/local/pgsql/iotest bs=128k count=16384

to create a 2G file, then

dd if=/usr/local/pgsql/iotest of=/dev/null

If you run systat -vmstat 2 you will see 0% disk access during the read of the 2G file, indicating that it has, in fact, been cached.)

Sven
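[Editorial note.] Following the half-of-RAM rule of thumb above on a 4 GB machine dedicated to PostgreSQL, the arithmetic is 2 GB / 8 KB pages = 262144, so the postgresql.conf entry would be (a sketch, assuming no other major consumers of the OS cache):

```
# 4 GB RAM, PostgreSQL the only major service; tell the planner roughly
# half the RAM is available as OS cache:
# 2 * 1024 * 1024 * 1024 / 8192 = 262144
effective_cache_size = 262144    # 8KB pages
```

effective_cache_size only informs the planner's costing; it allocates nothing, so overshooting is far less dangerous than with shared_buffers.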
Re: [PERFORM] faster INSERT with possible pre-existing row?
On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote:
> I am working on a process that will be inserting tens of millions of rows
> and need this to be as quick as possible.
>
> The catch is that for each row I could potentially insert, I need to
> look and see if the relationship is already there to prevent
> multiple entries. Currently I am doing a SELECT before doing the
> INSERT, but I recognize the speed penalty in doing two operations. I
> wonder if there is some way I can say "insert this record, only if it
> doesn't exist already". To see if it exists, I would need to compare
> 3 fields instead of just enforcing a primary key.
>
> Even if this could be a small increase per record, even a few percent
> faster compounded over the whole load could be a significant reduction.
>
> Thanks for any ideas you might have.

Perhaps a trigger:

CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$
BEGIN
    PERFORM a, b, c FROM table1
     WHERE a = NEW.a AND b = NEW.b AND c = NEW.c;
    IF FOUND THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$func$ LANGUAGE plpgsql STABLE;

CREATE TRIGGER verify_unique BEFORE INSERT ON table1
    FOR EACH ROW EXECUTE PROCEDURE verify_unique();

Triggers are fired on COPY commands, and if table1 is able to be cached and you have an index on table1(a,b,c), the results should be fairly decent. I would be interested in seeing the difference in timing between this approach and the temp table approach.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
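[Editorial sketch, worth benchmarking against the trigger above; the "staging" table is a hypothetical name for wherever the incoming rows land first.] If the load can go through a staging table, the duplicate check can be folded into a single set-oriented INSERT instead of firing a trigger per row:

```sql
-- A unique index enforces the three-column rule going forward:
CREATE UNIQUE INDEX table1_abc_idx ON table1 (a, b, c);

-- Bulk-load staging with COPY, then move over only the new rows:
INSERT INTO table1 (a, b, c)
SELECT s.a, s.b, s.c
FROM staging s
WHERE NOT EXISTS (
    SELECT 1 FROM table1 t
    WHERE t.a = s.a AND t.b = s.b AND t.c = s.c
);
```

One set-based pass usually beats tens of millions of per-row trigger invocations, at the cost of the extra staging step.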
Re: [PERFORM] Looking for tips
On Tue, 2005-07-19 at 17:04 -0400, Oliver Crosby wrote:
> > since triggers work with COPY, you could probably write a trigger that
> > looks for this condition and does the ID processing you need; you could
> > thereby enjoy the enormous speed gain resulting from COPY and maintain
> > your data continuity.
>
> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.

Well, without knowing the specifics of what you are actually trying to accomplish, I cannot say yes or no to your question. I am not sure from where this data is coming that you are inserting into the db. However, if the scenario is this: a) attempt to insert a row; b) if the row exists already, grab the ID and do other db selects/inserts/deletes based on that ID -- then there is no need to feed this information back to the perl script.

Is your perl script parsing a file and then using the parsed information to insert rows? If so, how is the ID that is returned used? Can you have the trigger use the ID that may be returned to perform whatever it is that your perl script is trying to accomplish with that ID? It's all kind of vague, so my answers may or may not help, but based on the [lack of] specifics you have provided, I fear that is the best suggestion I can offer at this point.

Sven

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] Looking for tips
On Tue, 2005-07-19 at 16:28 -0400, Oliver Crosby wrote:
> > If it is possible try:
> > 1) wrapping many inserts into one transaction
> > (BEGIN;INSERT;INSERT;...INSERT;COMMIT;). As PostgreSQL will need to
> > handle fewer transactions per second (each of your inserts is a
> > transaction), it may work faster.
>
> Aye, that's what I have it doing right now. The transactions do save a
> HUGE chunk of time. (Cuts it down by about 40%).
>
> > 2) If you can do 1, you could go further and use a COPY command, which is
> > the fastest way to bulk-load a database.
>
> I don't think I can use COPY in my case because I need to do
> processing on a per-line basis, and I need to check if the item I want
> to insert is already there, and if it is, I need to get its ID so I
> can use that for further processing.

Since triggers work with COPY, you could probably write a trigger that looks for this condition and does the ID processing you need; you could thereby enjoy the enormous speed gain resulting from COPY and maintain your data continuity.

Sven
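[Editorial sketch of the trigger idea above; the table, column, and follow-up-work names are all hypothetical.] The shape would be: check for an existing row, do the per-ID work inside the trigger, and return NULL to suppress the duplicate insert so COPY can keep streaming:

```sql
CREATE FUNCTION handle_copy_row() RETURNS TRIGGER AS $func$
DECLARE
    existing_id integer;
BEGIN
    -- The "is it already there?" check the perl script was doing:
    SELECT id INTO existing_id FROM items WHERE name = NEW.name;
    IF FOUND THEN
        -- Do the ID-based processing here instead of round-tripping to perl
        -- (whatever selects/inserts/deletes the application needs):
        UPDATE item_stats SET hits = hits + 1 WHERE item_id = existing_id;
        RETURN NULL;   -- swallow the duplicate row
    END IF;
    RETURN NEW;        -- new item: let COPY insert it
END;
$func$ LANGUAGE plpgsql;

CREATE TRIGGER handle_copy_row BEFORE INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE handle_copy_row();
```

The perl side then shrinks to parsing the log into COPY format and issuing a single COPY per batch.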
Re: [PERFORM] Adaptec/LSI/?? RAID
Stacy White presumably uttered the following on 06/01/05 23:42:

> We're in the process of buying another Opteron server to run Postgres, and
> based on the suggestions in this list I've asked our IT director to get an
> LSI MegaRaid controller rather than one of the Adaptecs. But when we tried
> to place our order, our vendor (Penguin Computing) advised us:
>
> "we find LSI does not work well with 4GB of RAM. Our engineering find that
> LSI card could cause system crashes. One of our customer ... has found that
> Adaptec cards works well on PostGres SQL -- they're using it as a preforce
> server with xfs and post-gress."
>
> Any comments? Suggestions for other RAID controllers?

We use the LSI MegaRaid 320-2x with the battery-backed cache on a dual Opteron system that uses 8G of RAM. The OS is FreeBSD amd64 (5.4) and it runs without hesitation. The database is currently over 100GB and it performs admirably. So chalk one anecdotal item towards the LSI column. To be fair, I have not tried an Adaptec card with this setup so I cannot comment positively or negatively on that card.

As a side note, we did have issues with this setup with Linux (2.6 kernel - 64bit) and the XFS file system (we generally use FreeBSD but I wanted to try other 64bit OSes before committing). Whether the Linux issues were due to the LSI, memory, Tyan mobo, or something else was never determined -- FreeBSD ran it and did so without flinching, so our choice was easy.

HTH

Sven

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL on Solaris 8 and ufs
On Tue, 2005-03-22 at 14:44 -0600, Brandon Metcalf wrote:
> We've recently moved our pgsql installation and DBs to a Solaris 8
> machine with a striped and mirrored ufs filesystem that houses the DB
> data. We are now seeing terrible performance and the bottleneck is no
> doubt disk I/O.
>
> We've tried modifying tunables related to ufs, but it doesn't seem
> to be helping.
>
> Is there anything we should be looking at that is specifically related
> to ufs filesystems on Solaris 8, or possibly something in general that
> would improve performance?
>
> Thanks.

What are you using to create your raid? You say it is "no doubt disk I/O" -- does iostat confirm this? A lot of performance issues are related to the size of the stripe you chose for the striped portion of the array, the actual array configuration, etc. I am assuming you have looked at system variables such as autoup and the like? What tweaks have you done?

Also, are your pg_xlog and data directories separated onto separate volumes? Doing so will help immensely. What are you using to measure performance?

Sven
Re: [PERFORM] Inheritence versus delete from
On Tue, 2005-03-01 at 09:48 -0600, John Arbash Meinel wrote:
> Sven Willenberger wrote:
>
> > Trying to determine the best overall approach for the following
> > scenario:
> >
> > Each month our primary table accumulates some 30 million rows (which
> > could very well hit 60+ million rows per month by year's end). Basically
> > there will end up being a lot of historical data with little value
> > beyond archival.
>
> I just thought of another possibility. You could create each table
> month-by-month, and then use a view to combine them, and possibly a rule
> to keep things clean.
>
> So you would do something like:
>
> I will assume you already have the data in one big table to show the
> easiest way to create the small tables.
>
> create table tblname-2005-01 as select * from orig_tbl where day >=
>     '2005-01-01' and day < '2005-02-01';
> create table tblname-2005-02 as select * from orig_tbl where day >=
>     '2005-02-01' and day < '2005-03-01';
> create table tblname-2005-03 as select * from orig_tbl where day >=
>     '2005-03-01' and day < '2005-04-01';
> -- create appropriate indicies, rules, constraints on these tables
>
> Then you create a view which includes all of these tables.
>
> create or replace view tblname as
>     select * from tblname-2005-01
>     union all select * from tblname-2005-02
>     union all select * from tblname-2005-03;
>
> Then create insert and update rules which fix which table gets the new
> data.
>
> create rule up_tblname as on update to tblname do instead
>     update tblname-2005-03 set
>         col1 = NEW.col1,
>         col2 = NEW.col2,
>         ...
>     where id = NEW.id;
> -- This assumes that you have a unique id on your tables. This is just
> -- whatever your primary key is, so it should be a decent assumption.
>
> create rule ins_tblname as on insert to tblname do instead
>     insert into tblname-2005-03 (col1, col2, ...)
>     values (new.col1, new.col2, ...);
>
> Now the downside of this method is that every month you need to create
> a new table, and then update the views and the rules. The update rules
> are pretty straightforward, though.
>
> The nice thing is that it keeps your data partitioned, and you don't
> ever have a large select/delete step. You probably will want a small one
> each month to keep the data exactly aligned by month. You don't really
> have to have exact alignments, but as humans, we tend to like that stuff. :)
>
> Probably this is more overhead than you would like to do. Especially if
> you know that you can get away with method 2 (keep 1 big table, and just
> remove old rows out of it every month.)
>
> But this method means that all of your data stays live, but queries with
> appropriate restrictions should stay fast. You also have the ability
> (with v8.0) to move the individual tables onto separate disks.
>
> One more time, though, if you can get away with removing old data and
> just archiving it, do so. But if you want to keep the data live, there
> are a couple of alternatives.

Actually that was the thought behind my using inheritance; when querying the parent table, it basically does a union all; also, I think it would be quicker to insert directly into the child table (simply by modifying my query once a month) rather than sustain the overhead of the rule. Since the child tables are individual tables, all the benefits you cite above still hold. Thanks for the input on this ... will have to try a couple things to see which is most manageable.

Sven

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[PERFORM] Inheritence versus delete from
Trying to determine the best overall approach for the following scenario:

Each month our primary table accumulates some 30 million rows (which could very well hit 60+ million rows per month by year's end). Basically there will end up being a lot of historical data with little value beyond archival.

The question arises then as to the best approach, of which I have enumerated three:

1) Just allow the records to accumulate and maintain constant vacuuming, etc., allowing for the fact that most queries will only be from a recent subset of data and should be mostly cached.

2) Each month:
SELECT * INTO 3monthsago_dynamically_named_table FROM bigtable WHERE targetdate < $3monthsago;
DELETE FROM bigtable WHERE targetdate < $3monthsago;
VACUUM ANALYZE bigtable;
pg_dump 3monthsago_dynamically_named_table for archiving;

3) Each month:
CREATE newmonth_dynamically_named_table (like mastertable) INHERITS (mastertable);
modify the copy.sql script to copy newmonth_dynamically_named_table;
pg_dump 3monthsago_dynamically_named_table for archiving;
drop table 3monthsago_dynamically_named_table;

Any takes on which approach makes most sense from a performance and/or maintenance point of view, and are there other options I may have missed?

Sven Willenberger
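[Editorial sketch of option 3 in concrete terms; the table names, child-naming scheme, and dump path are illustrative assumptions, not from the original post.]

```sql
-- Each month, add a fresh child table; the parent stays empty and a
-- query against it scans the parent plus all children (a union all):
CREATE TABLE bigtable_2005_03 () INHERITS (bigtable);

-- Point the monthly load at the new child directly:
COPY bigtable_2005_03 FROM '/path/to/monthly_data.copy';

-- Retiring a month is then a pg_dump of the oldest child followed by:
DROP TABLE bigtable_2004_12;
```

The appeal over option 2 is that retiring old data is a DROP TABLE (instant, no dead tuples) instead of a 30-million-row DELETE plus vacuum.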
Re: [PERFORM] Using LIMIT changes index used by planner
On Mon, 2004-12-13 at 17:43 -0500, Tom Lane wrote:
> Sven Willenberger <[EMAIL PROTECTED]> writes:
> > explain analyze select storelocation,order_number from custacct where
> > referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07
> > 12:00:00' order by custacctid limit 10;
> >
> >                               QUERY PLAN
> > ---
> > Limit  (cost=0.00..43065.76 rows=10 width=43) (actual time=1306957.216..1307072.111 rows=10 loops=1)
> >   ->  Index Scan using custacct2_pkey on custacct  (cost=0.00..92083209.38 rows=21382 width=43) (actual time=1306957.205..1307072.017 rows=10 loops=1)
> >         Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
> > Total runtime: 1307072.231 ms
> > (4 rows)
>
> I think this is the well-known issue of lack of cross-column correlation
> statistics. The planner is well aware that this indexscan will be
> horridly expensive if run to completion ---
>
> There isn't any near-term fix in the wind for this, since storing
> cross-column statistics is an expensive proposition that we haven't
> decided how to handle. Your workaround with separating the ORDER BY
> from the LIMIT is a good one.

You are correct in that there is a high degree of correlation between custacctid (which is a serial key) and orderdate, as the orders generally get entered in the order that they arrive. I will go with the workaround subselect query plan then.

On a related note, is there a way (other than set enable_seqscan=off) to give a hint to the planner that it is cheaper to use an index scan versus a seq scan? Using the "workaround" query on any time period greater than 12 hours results in the planner using a seq scan.
Disabling the seq scan and running the query on a full-day period, for example, shows:

explain analyze select foo.storelocation, foo.order_number from (select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-09' and '2004-12-10' order by custacctid) as foo limit 10 offset 100;

                              QUERY PLAN
---
Limit  (cost=2661326.22..2661326.35 rows=10 width=100) (actual time=28446.605..28446.796 rows=10 loops=1)
  ->  Subquery Scan foo  (cost=2661324.97..2661866.19 rows=43297 width=100) (actual time=28444.916..28446.298 rows=110 loops=1)
        ->  Sort  (cost=2661324.97..2661433.22 rows=43297 width=41) (actual time=28444.895..28445.334 rows=110 loops=1)
              Sort Key: custacctid
              ->  Index Scan using orderdate_idx on custacct  (cost=0.00..2657990.68 rows=43297 width=41) (actual time=4.432..28145.212 rows=44333 loops=1)
                    Index Cond: ((orderdate >= '2004-12-09 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-10 00:00:00'::timestamp without time zone))
                    Filter: (referrer = 1365)
Total runtime: 28456.893 ms
(8 rows)

If I interpret the above correctly, the planner guesstimates a cost of 2661326 but the actual cost is much less (assuming time is equivalent to cost). Would the SET STATISTICS command be of any benefit here in "training" the planner?

Sven

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
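[Editorial sketch of the SET STATISTICS experiment mentioned above, using 7.4 syntax; the target value 500 is an arbitrary illustration.]

```sql
-- Raise the statistics sample for the correlated columns, then re-analyze
-- so the planner's row estimates for the WHERE clause improve:
ALTER TABLE custacct ALTER COLUMN orderdate SET STATISTICS 500;
ALTER TABLE custacct ALTER COLUMN referrer  SET STATISTICS 500;
ANALYZE custacct;
```

Larger per-column statistics can tighten the rows=43297 estimate, but they cannot teach the planner the cross-column correlation Tom Lane describes; for the index-vs-seqscan tipping point, lowering random_page_cost is the usual (global) knob.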
Re: [PERFORM] Using LIMIT changes index used by planner
Andrew McMillan wrote:
> On Mon, 2004-12-13 at 01:13 -0500, Sven Willenberger wrote:
> > I have a question regarding a serious performance hit taken when using
> > a LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with
> > 2GB of memory. The table in question contains some 25 million rows with
> > a bigserial primary key, orderdate index and a referrer index. The 2
> > select statements are as follow:
>
> It's an interesting question, but to be able to get answers from this
> list you will need to provide "EXPLAIN ANALYZE ..." rather than just
> "EXPLAIN ...".

A) Query without limit clause:

explain analyze select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid;

                              QUERY PLAN
---
Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) (actual time=30340.322..30426.274 rows=21432 loops=1)
  Sort Key: custacctid
  ->  Index Scan using orderdate_idx on custacct  (cost=0.00..1224947.52 rows=21382 width=43) (actual time=159.218..30196.686 rows=21432 loops=1)
        Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
        Filter: (referrer = 1365)
Total runtime: 30529.151 ms
(6 rows)

A2) Same query run again, to see effect of caching:

explain analyze select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid;

                              QUERY PLAN
---
Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) (actual time=1402.410..1488.395 rows=21432 loops=1)
  Sort Key: custacctid
  ->  Index Scan using orderdate_idx on custacct  (cost=0.00..1224947.52 rows=21382 width=43) (actual time=0.736..1259.964 rows=21432 loops=1)
        Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
        Filter: (referrer = 1365)
Total runtime: 1590.675 ms
(6 rows)

***

B) Query run with LIMIT:

explain analyze select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid limit 10;

                              QUERY PLAN
---
Limit  (cost=0.00..43065.76 rows=10 width=43) (actual time=1306957.216..1307072.111 rows=10 loops=1)
  ->  Index Scan using custacct2_pkey on custacct  (cost=0.00..92083209.38 rows=21382 width=43) (actual time=1306957.205..1307072.017 rows=10 loops=1)
        Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
Total runtime: 1307072.231 ms
(4 rows)

C) Query using the subselect variation:

explain analyze select foo.storelocation, foo.order_number from (select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid) as foo limit 10;

                              QUERY PLAN
---
Limit  (cost=1226485.32..1226485.45 rows=10 width=100) (actual time=1413.829..1414.024 rows=10 loops=1)
  ->  Subquery Scan foo  (cost=1226485.32..1226752.60 rows=21382 width=100) (actual time=1413.818..1413.933 rows=10 loops=1)
        ->  Sort  (cost=1226485.32..1226538.78 rows=21382 width=43) (actual time=1413.798..1413.834 rows=10 loops=1)
              Sort Key: custacctid
              ->  Index Scan using orderdate_idx on custacct  (cost=0.00..1224947.52 rows=21382 width=43) (actual time=0.740..1272.380 rows=21432 loops=1)
                    Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
                    Filter: (referrer = 1365)
Total runtime: 1418.964 ms
(8 rows)

Thanks,

Sven
[PERFORM] Using LIMIT changes index used by planner
I have a question regarding a serious performance hit taken when using a LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB of memory. The table in question contains some 25 million rows with a bigserial primary key, an orderdate index and a referrer index. The 2 select statements are as follows:

A) select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid;

B) select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid limit 10;

So the only difference is the use of the LIMIT, which, in theory, should be quicker after custacctid is ordered. Now the analyze results:

A) explain select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid;

                              QUERY PLAN
---
Sort  (cost=904420.55..904468.11 rows=19025 width=44)
  Sort Key: custacctid
  ->  Index Scan using orderdate_idx on custacct  (cost=0.00..903068.29 rows=19025 width=44)
        Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
        Filter: (referrer = 1365)
(5 rows)

B) explain select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid limit 10;

                              QUERY PLAN
---
Limit  (cost=0.00..33796.50 rows=10 width=44)
  ->  Index Scan using custacct2_pkey on custacct  (cost=0.00..64297840.86 rows=19025 width=44)
        Filter: ((referrer = 1365) AND (orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
(3 rows)

***

Notice the huge cost difference in the two plans: 904468 in the one without LIMIT versus 64297840.86 for the index scan on the custacct index.
Why would the planner switch from using the orderdate index to the custacct index (which is a BIGSERIAL, btw)? I can change that behavior (and speed up the resultant query) by using the following subquery:

explain select foo.storelocation, foo.order_number from (select storelocation,order_number from custacct where referrer = 1365 and orderdate between '2004-12-07' and '2004-12-07 12:00:00' order by custacctid) as foo limit 10;

                              QUERY PLAN
---
Limit  (cost=904420.55..904420.67 rows=10 width=100)
  ->  Subquery Scan foo  (cost=904420.55..904658.36 rows=19025 width=100)
        ->  Sort  (cost=904420.55..904468.11 rows=19025 width=44)
              Sort Key: custacctid
              ->  Index Scan using orderdate_idx on custacct  (cost=0.00..903068.29 rows=19025 width=44)
                    Index Cond: ((orderdate >= '2004-12-07 00:00:00'::timestamp without time zone) AND (orderdate <= '2004-12-07 12:00:00'::timestamp without time zone))
                    Filter: (referrer = 1365)
(7 rows)

As a side note, when running query A, the query takes 1772.523 ms; when running the subselect version to get the limit, it takes 1415.615 ms. Running option B (with the other index being scanned) takes several minutes (close to 10 minutes!). What am I missing about how the planner views the LIMIT statement?

Sven
[PERFORM] Overhead of dynamic query in trigger
(Originally asked in [General]; realized that it would probably be better asked in [Perform].)

I am curious as to how much overhead building a dynamic query in a trigger adds to the process. The example:

Have a list of subcontractors, each of which gets unique pricing. There is a total of roughly 100,000 items available and some 100 subcontractors. The 2 design choices would be 100 tables (one for each sub) at 100,000 rows each, or 1 table with 10,000,000 rows.

Choice 1: table has item number (indexed) and price.
Choice 2: table has subcontractor id, item number, and price; index on (subcontractorid, item number).

The table of orders would have a trigger to insert the line item cost:

--- Trigger Choice 1:
SELECT INTO thetable lookupprice FROM subcontractors WHERE subcontractorid = NEW.subcontractorid;
thequery := 'SELECT price FROM ' || thetable.lookupprice || ' WHERE itemnumber = ' || NEW.itemnumber;
FOR therow IN EXECUTE thequery LOOP
    NEW.itemcost := therow.price;
END LOOP;
RETURN NEW;

--- Trigger Choice 2:
SELECT INTO therow price FROM mastertable WHERE subcontractorid = NEW.subcontractorid AND itemnumber = NEW.itemnumber;
NEW.itemcost := therow.price;
RETURN NEW;

---

Doing a select from the command line, the mastertable method (with the id and partno index) is faster than looking up a single item in a named table (with the partno index). At what point would Trigger Choice 2 fall behind Trigger Choice 1 in performance (if ever)? Is there a way to analyze the performance of dynamic queries? If I had only 10 subcontractors, or if I had 1000 subcontractors, at what point is the overhead of building/executing a dynamic query negated by the amount of time it takes to look up both the subid and part number in one massive table?

Thanks,

Sven

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
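[Editorial sketch addressing the "is there a way to analyze dynamic queries" question; the index name, subcontractor id 42, item number, and per-sub table name are hypothetical.] EXPLAIN ANALYZE cannot see inside an EXECUTE, but both lookups can be timed as plain statements against realistic data:

```sql
-- Choice 2 depends entirely on this composite index:
CREATE INDEX mastertable_sub_item_idx
    ON mastertable (subcontractorid, itemnumber);

-- The static lookup Choice 2's trigger runs:
EXPLAIN ANALYZE
SELECT price FROM mastertable
 WHERE subcontractorid = 42 AND itemnumber = 100123;

-- The statement Choice 1's trigger builds dynamically, run directly
-- against one per-sub table (use \timing in psql to capture round-trip
-- time, since the plpgsql EXECUTE also re-plans this on every call):
EXPLAIN ANALYZE
SELECT price FROM sub_42_prices WHERE itemnumber = 100123;
```

The difference between the two EXPLAIN ANALYZE runtimes, plus the per-call planning cost of EXECUTE (which Choice 2 avoids via plpgsql's cached plans), is the quantity the question is really after.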
Re: Off-list Re: [PERFORM] Alternatives to Dell?
Josh Berkus wrote:
> Jeff,
> I'm curious about the problems you're seeing with Dell servers since
> we're about to buy some 750s, 2850s and 1850s. The problems I've been
> dealing with have been on the *650s. They're the ones you name. FYI ...
> the 750s, 1850s and 2850s use Intel chipsets (E7520 on 1850s and 2850s,
> 7210 on 750s), Intel NICs, and come only with LSI Logic RAID controllers.
> It looks like Dell has dropped the Broadcom/ServerWorks and Adaptec junk.
>
> I don't know if Vivek is on this list; I think he just had a critical
> failure with one of the new Dells with LSI.

On this note about "Adaptec junk", I have a question regarding hardware as well. We tend to build a lot of servers in house (Supermicro-based with the zero-channel RAID). Does anyone have any anecdotal or empirical data on using a ZCR card versus a full-blown RAID controller (Adaptec or other)? I am trying to build a medium-duty database server with 8G RAM, 4x144GB U320 SCSI RAID 10, FreeBSD (5.3-stable or 4-stable) and was wondering about performance differences between ZCR and Adaptec versus other manufacturers' full RAID cards (PCI-E).

Sven

---(end of broadcast)---
TIP 8: explain analyze is your friend