Re: [PERFORM] App very unresponsive while performing simple update
On May 31, 2006, at 12:23 AM, Jim C. Nasby wrote:

>> You should realize this will produce a lot of garbage records and mean
>> you'll have to be running vacuum very frequently. You might consider
>> instead of updating the main table inserting into a separate clickstream
>> table. That trades off not getting instantaneous live totals with
>> isolating the maintenance headache in a single place. That table will
>> grow large but you can prune it at your leisure without impacting query
>> performance on your main tables.

We actually already have a table for this purpose: product_click_history.

> Actually, you can still get instant results, you just have to hit two
> tables to do it.

Well, not really for our situation. We use the click_count on product to sort our product listings by popularity. Joining with our product_click_history to get live counts would be very slow. Some categories have many tens of thousands of products. Any joins outside our category_product table tend to be very slow.

We'll probably have to write a process to update the click_count from querying our product_click_history table.

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
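A minimal sketch of the kind of periodic batch-update process mentioned above, assuming product_click_history has a product_id column and product carries the click_count being sorted on (the column names are assumptions, not taken from the real schema):

    -- run from cron every few minutes instead of updating product on every click
    UPDATE product p
       SET click_count = h.clicks
      FROM (SELECT product_id, count(*) AS clicks
              FROM product_click_history
             GROUP BY product_id) h
     WHERE p.product_id = h.product_id;

Clicks recorded between runs only show up at the next run, which is exactly the trade-off described earlier in the thread.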
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
On Tue, May 30, 2006 at 07:05:08PM -0400, D'Arcy J.M. Cain wrote: > On Tue, 30 May 2006 17:54:00 -0500 > "Dave Dutcher" <[EMAIL PROTECTED]> wrote: > > What I do when I'm feeling lazy is execute a delete statement and then > > an insert. I only do it when I'm inserting/updating a very small number > > of rows, so I've never worried if its optimal for performance. Besides > > I've heard that an update in postgres is similar in performance to a > > delete/insert. > > Well, they are basically the same operation in PostgreSQL. An update > adds a row to the end and marks the old one dead. A delete/insert > marks the row dead and adds one at the end. There may be some > optimization if the engine does both in one operation. The new tuple will actually go on the same page during an update, if possible. If not, the FSM is consulted. Appending to the end of the table is a last resort. Update is more effecient than delete/insert. First, it's one less statement to parse and plan. Second, AFAIK insert always goes to the FSM; it has no way to know you're replacing the row(s) you just deleted. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] App very unresponsive while performing simple update
On Sun, May 28, 2006 at 07:20:59PM -0400, Greg Stark wrote: > Brendan Duddridge <[EMAIL PROTECTED]> writes: > > > We do have foreign keys on other tables that reference the product table. > > Also, there will be updates going on at the same time as this update. When > > anyone clicks on a product details link, we issue an update statement to > > increment the click_count on the product. e.g. update product set > > click_count > > = click_count + 1; > > You should realize this will produce a lot of garbage records and mean you'll > have to be running vacuum very frequently. You might consider instead of > updating the main table inserting into a separate clickstream table. That > trades off not getting instantaneous live totals with isolating the > maintenance headache in a single place. That table will grow large but you can > prune it at your leisure without impacting query performance on your main > tables. Actually, you can still get instant results, you just have to hit two tables to do it. > More likely you were blocking on some lock. Until that other query holding > that lock tries to commit Postgres won't actually detect a deadlock, it'll > just sit waiting until the lock becomes available. Wow, are you sure that's how it works? I would think it would be able to detect deadlocks as soon as both processes are waiting on each other's locks. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Why the 8.1 plan is worst than 7.4?
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > Why the plan is worst in postgres 8.1? (1) you have not actually shown us that the plan is worse. If you are complaining that the planner is wrong, EXPLAIN output (which contains only the planner's estimates) is useless for proving your point. Show EXPLAIN ANALYZE. (2) Have you ANALYZEd these tables recently in either database? The discrepancies in estimated rowcounts suggest that the two planners are working with different statistics. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Query performance
> I'm executing the queries from phpPgAdmin. The above are for explain
> analyse. I was referring to the pure query execution time. Does anyone
> have an idea why the OR-query takes so long? Any server-side tuning
> possibilities? I wouldn't like to change the code of ldap's back-sql...

If you're using phpPgAdmin's timings, they could be more off than the real explain analyze timings. Make sure you're using the figure given by explain analyze itself.

Chris

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Bulk loading/merging
Your best bet is to do this as a single, bulk operation if possible. That way you can simply do an UPDATE ... WHERE EXISTS followed by an INSERT ... SELECT ... WHERE NOT EXISTS. On Fri, May 26, 2006 at 02:48:20PM -0400, Worky Workerson wrote: > I've set up something similar the 'recommended' way to merge data into > the DB, i.e. > > http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING > > however I did it with a trigger on insert, i.e. (not my schema :) ): -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
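A rough sketch of that two-statement merge, assuming the incoming rows have first been loaded (e.g. with COPY) into a staging table with the same layout as the db(a, b, c, d) example table used elsewhere in this thread; the staging table name is a placeholder:

    -- update rows whose key already exists in the target
    UPDATE db
       SET b = s.b, c = s.c, d = s.d
      FROM staging s
     WHERE db.a = s.a;

    -- insert the rows that are still missing
    INSERT INTO db (a, b, c, d)
    SELECT s.a, s.b, s.c, s.d
      FROM staging s
     WHERE NOT EXISTS (SELECT 1 FROM db WHERE db.a = s.a);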
Re: [PERFORM] Why the 8.1 plan is worst than 7.4?
What's explain analyze show? On Fri, May 26, 2006 at 09:04:56AM -0300, [EMAIL PROTECTED] wrote: > Hi, > > I have 2 servers, one of them has a 7.4 postgres and the other has a 8.1 > > I have this query: > > select fagrempr,fagrdocr,fagrserr,fagrparr > from arqcfat > left join arqfagr on fagrorig = 'CFAT' and fagrdocu = cfatdocu and fagrempe > = cfatempe and fagrseri = cfatseri > where cfatdata between '2006-01-01' and '2006-01-31' > and cfattipo = 'VD' > and cfatstat <> 'C' > and fagrform = 'CT' > and fagrtipr = 'REC' > group by fagrempr,fagrdocr,fagrserr,fagrparr > > The 8.1 give me this plan: > > HashAggregate (cost=59.07..59.08 rows=1 width=20) >-> Nested Loop (cost=0.00..59.06 rows=1 width=20) > -> Index Scan using arqfagr_arqfa3_key on arqfagr > (cost=0.00..53.01 rows=1 width=36) >Index Cond: ((fagrorig = 'CFAT'::bpchar) AND (fagrform = > 'CT'::bpchar)) >Filter: (fagrtipr = 'REC'::bpchar) > -> Index Scan using arqcfat_arqcfat1_key on arqcfat > (cost=0.00..6.03 rows=1 width=16) >Index Cond: (("outer".fagrempe = arqcfat.cfatempe) AND > ("outer".fagrdocu = arqcfat.cfatdocu) AND ("outer".fagrseri = > arqcfat.cfatseri)) >Filter: ((cfatdata >= '01-01-2006'::date) AND (cfatdata <= > '31-01-2006'::date) AND (cfattipo = 'VD'::bpchar) AND (cfatstat <> > 'C'::bpchar)) > > The 7.4 give me this plan: > > HashAggregate (cost=2163.93..2163.93 rows=1 width=19) >-> Nested Loop (cost=0.00..2163.92 rows=1 width=19) > -> Index Scan using arqcfat_arqcfat2_key on arqcfat > (cost=0.00..2145.78 rows=3 width=15) >Index Cond: ((cfatdata >= '01-01-2006'::date) AND (cfatdata > <= '31-01-2006'::date)) >Filter: ((cfattipo = 'VD'::bpchar) AND (cfatstat <> > 'C'::bpchar)) > -> Index Scan using arqfagr_arqfa1_key on arqfagr > (cost=0.00..6.03 rows=1 width=34) >Index Cond: ((arqfagr.fagrorig = 'CFAT'::bpchar) AND > (arqfagr.fagrempe = "outer".cfatempe) AND (arqfagr.fagrdocu = > "outer".cfatdocu) AND (arqfagr.fagrseri = "outer".cfatseri)) >Filter: ((fagrform = 'CT'::bpchar) AND (fagrtipr = > 'REC'::bpchar)) > > Why the plan is worst in postgres 8.1? > > I know the best plan is read fisrt the table which has a date index as the > 7.4 did, because in a few days I will have few lines too, so the query will > be faster. > > Is there some thing I have to change in 8.1 to make the plans as the 7.4? > > Thanks , > > Waldomiro C. Neto. > > > > ---(end of broadcast)--- > TIP 3: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faq > -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
On Tue, 30 May 2006 17:54:00 -0500 "Dave Dutcher" <[EMAIL PROTECTED]> wrote: > What I do when I'm feeling lazy is execute a delete statement and then > an insert. I only do it when I'm inserting/updating a very small number > of rows, so I've never worried if its optimal for performance. Besides > I've heard that an update in postgres is similar in performance to a > delete/insert. Well, they are basically the same operation in PostgreSQL. An update adds a row to the end and marks the old one dead. A delete/insert marks the row dead and adds one at the end. There may be some optimization if the engine does both in one operation. -- D'Arcy J.M. Cain | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
What I do when I'm feeling lazy is execute a delete statement and then an insert. I only do it when I'm inserting/updating a very small number of rows, so I've never worried if its optimal for performance. Besides I've heard that an update in postgres is similar in performance to a delete/insert. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of PFC > Sent: Tuesday, May 30, 2006 5:35 PM > To: Jonah H. Harris; Waldomiro > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT? > > > > PostgreSQL does not support MERGE at the moment, sorry. > > Issue an UPDATE, and watch the rowcount ; if the > rowcount is 0, issue an > INSERT. > Be prepared to retry if another transaction has > inserted the row > meanwhile, though. > > MERGE would be really useful. > > > ---(end of > broadcast)--- > TIP 5: don't forget to increase your free space map settings > ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
PFC wrote: > >PostgreSQL does not support MERGE at the moment, sorry. > > Issue an UPDATE, and watch the rowcount ; if the rowcount is 0, > issue an INSERT. > Be prepared to retry if another transaction has inserted the row > meanwhile, though. Oh, you mean, like the example that's in the documentation? http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING Example 36-1 > MERGE would be really useful. It has been discussed before -- MERGE is something different. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Optimizing a huge_table/tiny_table join
> SELECT * FROM huge_table h, tiny_table t
> WHERE UPPER( h.id ) = UPPER( t.id )

What about:

SELECT * FROM huge_table h
WHERE UPPER(id) IN (SELECT upper(id) FROM tiny_table t)

Or, try opening a cursor on your original query and using FETCH. It might result in a different plan. Or lower random_page_cost.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
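For the cursor suggestion, a minimal sketch; the cursor name and fetch size are arbitrary:

    BEGIN;
    DECLARE match_cur CURSOR FOR
        SELECT * FROM huge_table h, tiny_table t
        WHERE UPPER(h.id) = UPPER(t.id);
    FETCH FORWARD 1000 FROM match_cur;   -- repeat until no rows are returned
    CLOSE match_cur;
    COMMIT;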
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
PostgreSQL does not support MERGE at the moment, sorry. Issue an UPDATE, and watch the rowcount ; if the rowcount is 0, issue an INSERT. Be prepared to retry if another transaction has inserted the row meanwhile, though. MERGE would be really useful. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
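A minimal sketch of that update-then-insert loop in PL/pgSQL, along the lines of the manual example mentioned elsewhere in this thread; the table and column names here are placeholders:

    CREATE FUNCTION merge_row(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            UPDATE mytable SET val = v WHERE key = k;
            IF found THEN
                RETURN;                -- rowcount was non-zero, nothing more to do
            END IF;
            BEGIN
                INSERT INTO mytable (key, val) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- another transaction inserted the row first; loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;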
Re: [PERFORM] Getting even more insert performance (250m+rows/day)
On May 30, 2006, at 3:59 PM, Daniel J. Luke wrote:

> I should have gprof numbers on a similarly set up test machine soon ...

gprof output is available at http://geeklair.net/~dluke/postgres_profiles/ (generated from CVS HEAD as of today).

Any ideas are welcome. Thanks!

--
Daniel J. Luke | [EMAIL PROTECTED] | http://www.geeklair.net
Opinions expressed are mine and do not necessarily reflect the opinions of my employer.
Re: [PERFORM] Split select completes, single select doesn't and
On þri, 2006-05-30 at 10:26 +1000, Anthony Ransley wrote: > Can any one explain why the following query > > select f(q) from > ( > select * from times > where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00' > order by q > ) v; > > never completes, but splitting up the time span into single days does work. > > select f(q) from > ( > select * from times > where '2006-03-01 00:00:00'<=q and q<'2006-03-02 00:00:00' > order by q > ) v; first question: is f() relevant to your problem? I mean do you see the same effect with: select q from ( select * from times where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00' order by q ) v; or even: select q from times where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00' order by q if f() is needed to make this happen show us f() if f() is not relevant, show us the simplest cases where you see this. show us EXPLAIN on the query that does not finish, show us EXPLAIN ANALYZE on the queries that do. second question: what indexes exist on the table "times" ? another question: how many rows in the table ? next question: is the table newly ANALYZED? finally: what version of postgresql are you using? whithout more info , it is difficult to guess what your problem is, but possibly you need to increase the statistics target of column "q" gnari ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
On 5/30/06, Waldomiro <[EMAIL PROTECTED]> wrote: Is there a command to Insert a record If It does not exists and a update if It exists? Sure, it's called MERGE. See http://en.wikipedia.org/wiki/Merge_%28SQL%29 I mean the postgres should test if a record exist before insert and if It exist then the postgres must do an update instead an insert. PostgreSQL does not support MERGE at the moment, sorry. -- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] INSERT OU UPDATE WITHOUT SELECT?
Hi,

Is there a command to insert a record if it does not exist and update it if it exists? I do not want to do a select before an insert or update. I mean Postgres should test whether the record exists before the insert, and if it does exist then Postgres must do an update instead of an insert.

Thanks,

WMiro.
[PERFORM] Split select completes, single select doesn't and becomes IO bound!
Can anyone explain why the following query

select f(q) from
(
  select * from times
  where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00'
  order by q
) v;

never completes, but splitting up the time span into single days does work.

select f(q) from
(
  select * from times
  where '2006-03-01 00:00:00'<=q and q<'2006-03-02 00:00:00'
  order by q
) v;

select f(q) from
(
  select * from times
  where '2006-03-02 00:00:00'<=q and q<'2006-03-03 00:00:00'
  order by q
) v;

...

select f(q) from
(
  select * from times
  where '2006-03-07 00:00:00'<=q and q<'2006-03-08 00:00:00'
  order by q
) v;

The stored procedure f(q) takes a timestamp and does a select and a calculation and then an update of a results table. The times table contains only about 100 rows per day.

It is also observed that the cpu starts the query with 100% usage and then slowly swings up and down from 100% to 20% over the first half hour, and by the following morning the query is still running and the cpu usage is 3-5%. IO bound, I'm guessing, as the hdd is in constant use at 5 to 15 MB per second. In contrast, the query that is split up into days has 100% cpu usage all the way through to its completion, and each piece takes only twenty minutes.

The computer is not being used for anything else, and is a dual core Athlon 4400+ with 4GB of ram.

Thanks for any information you can give on this.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] Query performance
Antonio Batovanja wrote:
> (...)
> 1) the slow query:
> EXPLAIN ANALYZE SELECT DISTINCT ldap_entries.id, organization.id, text('organization') AS objectClass, ldap_entries.dn AS dn FROM ldap_entries, organization, ldap_entry_objclasses WHERE organization.id=ldap_entries.keyval AND ldap_entries.oc_map_id=1 AND upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT' AND 1=1 OR (ldap_entries.id=ldap_entry_objclasses.entry_id AND ldap_entry_objclasses.oc_name='organization');

First, presenting your query in any readable form might be helpful if you want the community to help you. (Hint! Hint!)

SELECT DISTINCT ldap_entries.id, organization.id, text('organization') AS objectClass, ldap_entries.dn AS dn
FROM ldap_entries, organization, ldap_entry_objclasses
WHERE organization.id=ldap_entries.keyval
  AND ldap_entries.oc_map_id=1
  AND upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT'
  AND 1=1
  OR (ldap_entries.id=ldap_entry_objclasses.entry_id AND ldap_entry_objclasses.oc_name='organization');

Next, you might want to use aliases to make it more readable.

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn AS dn
FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
WHERE o.id=e.keyval
  AND e.oc_map_id=1
  AND upper(e.dn) LIKE '%DC=HUMANOMED,DC=AT'
  AND 1=1
  OR (e.id=eo.entry_id AND eo.oc_name='organization');

There are a couple redundant (nonsensical) items, syntax-wise. Let's strip these:

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
WHERE o.id=e.keyval
  AND e.oc_map_id=1
  AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
  OR e.id=eo.entry_id AND eo.oc_name='organization';

And finally, I suspect the lexical precedence of AND and OR might be the issue here.
http://www.postgresql.org/docs/8.1/static/sql-syntax.html#SQL-PRECEDENCE

Maybe that is what you really want (just guessing):

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
FROM ldap_entries e
JOIN organization o ON o.id=e.keyval
LEFT JOIN ldap_entry_objclasses eo ON eo.entry_id=e.id
WHERE e.oc_map_id=1
  AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
  OR eo.oc_name='organization';

I didn't take the time to read the rest. My apologies if I guessed wrong.

Regards,
Erwin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] Query performance
Antonio Batovanja wrote:
> Laurenz Albe wrote:
>> Antonio Batovanja wrote:
>>> I'm having trouble understanding, why a specific query on a small
>>> database is taking so long...
>> Before I try to understand the execution plans: Have you run ANALYZE on
>> the tables involved before you ran the query?
> Hi,
> Just to be on the safe side, I've run ANALYZE now. Here are the query
> plans for the two queries:

I suspect a misunderstanding here. What Laurenz probably meant is to run analyze on the involved _tables_ so the statistics data is refreshed. If the query planner runs with outdated statistics, queries may perform very poorly.

Try

vacuum full analyze yourdatabase

to fully vacuum your database and analyze all tables. (vacuum full is extra, but can't hurt.)

http://www.postgresql.org/docs/8.1/static/sql-vacuum.html
http://www.postgresql.org/docs/8.1/static/sql-analyze.html

Regards,
Erwin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
[PERFORM] Bulk loading/merging
I've set up something similar the 'recommended' way to merge data into the DB, i.e. http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING however I did it with a trigger on insert, i.e. (not my schema :) ): CREATE TABLE db (a INT PRIMARY KEY, b TEXT, c INTEGER, d INET); CREATE FUNCTION merge_db() RETURNS TRIGGER AS $$ BEGIN UPDATE db SET b = NEW.data WHERE a = NEW.key AND NOT (c IS DISTINCT FROM NEW.c) AND NOT (d IS DISTINCT FROM NEW.d); IF found THEN RETURN NULL; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER merge_db_tr BEFORE INSERT ON db FOR EACH ROW EXECUTE PROCEDURE merge_db(); Is this the best/fastest way to do this sort of thing? I only get about 50 records/second inserts, while without the trigger (inserting unmerged data) I can get more like 1000/second. I'm doing the whole NOT ... IS DISTINCT stuff to handle NULL values that might be in the columns ... I'm only considering two column keys equal if (a,c,d) are all the same (i.e. either the same value or both NULL). I read that there is a race condition with the above method as applied to a normal function ... does this apply to a trigger as well? Optimization Questions: -Can I do better with the trigger function itself? -I realize that I can create indexes on some of the lookup columns ('key' in the above example). This would speed up the location of the update record but slow down the actual update insert, right? Would this be a win? I tested an index on 1 rows, and it beat out the non-indexed by about 7% (3:31 with index, 3:45 without) ... is this all the benefit that I can expect? -Will moving pg_xlog to a different disk help all that much, if the whole DB is currently on a 4 disk RAID10? What about moving the indexes? I've set up my postgresql.conf according to the docs and Josh Berkus' presentation, i.e. (16GB ram, quad Opteron moachine, not all settings are relevant): shared_buffers = 6 temp_buffers = 1 work_mem = 131072 maintenance_work_mem = 524288 effective_cache_size = 12 random_page_cost = 2 wal_buffers = 128 checkpoint_segments = 128 checkpoint_timeout = 3000 max_fsm_pages = 200 max_fsm_relations = 100 -If I break up my dataset into smaller chunks and parallelize it, could I get better total performance, or would I most likely be thrashing the disk? -If I sort the data in the COPY file by key (i.e. a,c,d) before inserting it into the database, will this help out the DB at all? -Its cleaner to just be able to insert everything into the database and let the DB aggregate the records, however I could use some of our extra hardware to do aggregation in perl and then output the already aggregated records to the DB ... this has the advantage of being easily parallelizable but requires a bit of extra work to get right. Do you think that this is the best way to go? Also, as a slight aside, without a trigger, COPY seems to process each record very quickly (using Perl DBI, about 7000 records/second) however there is a long pause once the last record has been delivered. Is this just the backend queuing up the insert commands given by perl, or is there extra processing that needs to be done at the end of the COPY that could be taking a while (10s on 500K record COPY). Thanks! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Bulk loading/merging
Another little question ... would using any sort of TEMP table help out, i.e. loading the unaggregated data into a TEMP table, aggregating the data via a SELECT INTO another TEMP table, and then finally INSERT ... SELECT into the master, aggregated, triggered table? It seems like this might be a win if A) the TEMP tables fit into memory, and B) the load data aggregates well. Worst case ( i.e. all unique data in the load) seems like it might take much longer, however, since I'm creating 2 new TEMP tables
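A sketch of that staging approach, again using the db(a, b, c, d) example table; the temp table names, the COPY path, and the choice of aggregate are all placeholders:

    -- 1. load the raw, unaggregated rows
    CREATE TEMP TABLE load_raw (a integer, b text, c integer, d inet);
    COPY load_raw FROM '/path/to/loadfile';

    -- 2. aggregate within the load
    SELECT a, max(b) AS b, c, d      -- whatever aggregate is appropriate for b
      INTO TEMP load_agg
      FROM load_raw
     GROUP BY a, c, d;

    -- 3. merge into the master table (the insert trigger still fires)
    INSERT INTO db SELECT a, b, c, d FROM load_agg;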
[PERFORM] Why the 8.1 plan is worst than 7.4?
Hi, I have 2 servers, one of them has a 7.4 postgres and the other has a 8.1 I have this query: select fagrempr,fagrdocr,fagrserr,fagrparr from arqcfat left join arqfagr on fagrorig = 'CFAT' and fagrdocu = cfatdocu and fagrempe = cfatempe and fagrseri = cfatseri where cfatdata between '2006-01-01' and '2006-01-31' and cfattipo = 'VD' and cfatstat <> 'C' and fagrform = 'CT' and fagrtipr = 'REC' group by fagrempr,fagrdocr,fagrserr,fagrparr The 8.1 give me this plan: HashAggregate (cost=59.07..59.08 rows=1 width=20) -> Nested Loop (cost=0.00..59.06 rows=1 width=20) -> Index Scan using arqfagr_arqfa3_key on arqfagr (cost=0.00..53.01 rows=1 width=36) Index Cond: ((fagrorig = 'CFAT'::bpchar) AND (fagrform = 'CT'::bpchar)) Filter: (fagrtipr = 'REC'::bpchar) -> Index Scan using arqcfat_arqcfat1_key on arqcfat (cost=0.00..6.03 rows=1 width=16) Index Cond: (("outer".fagrempe = arqcfat.cfatempe) AND ("outer".fagrdocu = arqcfat.cfatdocu) AND ("outer".fagrseri = arqcfat.cfatseri)) Filter: ((cfatdata >= '01-01-2006'::date) AND (cfatdata <= '31-01-2006'::date) AND (cfattipo = 'VD'::bpchar) AND (cfatstat <> 'C'::bpchar)) The 7.4 give me this plan: HashAggregate (cost=2163.93..2163.93 rows=1 width=19) -> Nested Loop (cost=0.00..2163.92 rows=1 width=19) -> Index Scan using arqcfat_arqcfat2_key on arqcfat (cost=0.00..2145.78 rows=3 width=15) Index Cond: ((cfatdata >= '01-01-2006'::date) AND (cfatdata <= '31-01-2006'::date)) Filter: ((cfattipo = 'VD'::bpchar) AND (cfatstat <> 'C'::bpchar)) -> Index Scan using arqfagr_arqfa1_key on arqfagr (cost=0.00..6.03 rows=1 width=34) Index Cond: ((arqfagr.fagrorig = 'CFAT'::bpchar) AND (arqfagr.fagrempe = "outer".cfatempe) AND (arqfagr.fagrdocu = "outer".cfatdocu) AND (arqfagr.fagrseri = "outer".cfatseri)) Filter: ((fagrform = 'CT'::bpchar) AND (fagrtipr = 'REC'::bpchar)) Why the plan is worst in postgres 8.1? I know the best plan is read fisrt the table which has a date index as the 7.4 did, because in a few days I will have few lines too, so the query will be faster. Is there some thing I have to change in 8.1 to make the plans as the 7.4? Thanks , Waldomiro C. Neto. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Optimizing a huge_table/tiny_table join
On 5/24/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> <[EMAIL PROTECTED]> writes:
> > Limit (cost=19676.75..21327.99 rows=6000 width=84)
> >   -> Hash Join (cost=19676.75..1062244.81 rows=3788315 width=84)
> >        Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
> >        -> Seq Scan on huge_table h (cost=0.00..51292.43 rows=2525543 width=46)
> >        -> Hash (cost=19676.00..19676.00 rows=300 width=38)
> >             -> Seq Scan on tiny_table t (cost=0.00..19676.00 rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different? There's
> something wacko about your statistics, methinks.

You mean there's a bug in explain? I agree that it makes no sense that the costs don't differ as much as one would expect, but you can see right there the numbers of rows for the two tables, which are exactly as I described. At any rate, how would one go about finding an explanation for these strange stats?

More bewildering still (and infuriating as hell--because it means that all of my work for yesterday has been wasted) is that I can no longer reproduce the best query plan, even though the tables have not changed at all. (Hence I can't post the explain analyze for the best query plan.) No matter what value I use for LIMIT, the query planner now insists on sequentially scanning huge_table and ignoring the available index. (If I turn off enable_seqscan, I get the second worst query plan I posted yesterday.)

Anyway, I take it that there is no way to bypass the optimizer and instruct PostgreSQL exactly how one wants the search performed?

Thanks!

kj
[PERFORM] Optimizing a huge_table/tiny_table join
[ I had a problem with my mailer when I first sent this. My apologies for any repeats. ] I want to optimize this simple join: SELECT * FROM huge_table h, tiny_table t WHERE UPPER( h.id ) = UPPER( t.id ) huge_table has about 2.5 million records, can be assumed as fixed, and has the following index: CREATE INDEX huge_table_index ON huge_table( UPPER( id ) ); ...while tiny_table changes with each user request, and typically will contain on the order of 100-1000 records. For this analysis, I put 300 records in tiny_table, resulting in 505 records in the join. I tried several approaches. In order of increasing speed of execution: 1. executed as shown above, with enable_seqscan on: about 100 s. 2. executed as shown above, with enable_seqscan off: about 10 s. 3. executed with a LIMIT 6000 clause added to the SELECT statement, and enable_seqscan on: about 5 s. 4. executed with a LIMIT 600 clause added to the SELECT statement, and enable_seqscan on: less than 1 s. Clearly, using LIMIT is the way to go. Unfortunately I *do* want all the records that would have been produced without the LIMIT clause, and I don't have a formula for the limit that will guarantee this. I could use a very large value (e.g. 20x the size of tiny_table, as in approach 3 above) which would make the probability of hitting the limit very small, but unfortunately, the query plan in this case is different from the query plan when the limit is just above the expected number of results (approach 4 above). The query plan for the fastest approach is this: QUERY PLAN - Limit (cost=0.01..2338.75 rows=600 width=84) -> Nested Loop (cost=0.01..14766453.89 rows=3788315 width=84) -> Seq Scan on tiny_table t (cost=0.00..19676.00 rows=300 width=38) -> Index Scan using huge_table_index on huge_table h (cost=0.01..48871.80 rows=12628 width=46) Index Cond: (upper(("outer".id)::text) = upper((h.id)::text)) How can I *force* this query plan even with a higher limit value? I found, by dumb trial and error, that in this case the switch happens at LIMIT 5432, which, FWIW, is about 0.2% of the size of huge_table. Is there a simpler way to determine this limit (hopefully programmatically)? Alternatively, I could compute the value for LIMIT as 2x the number of records in tiny_table, and if the number of records found is *exactly* this number, I would know that (most likely) some records were left out. In this case, I could use the fact that, according to the query plan above, the scan of tiny_table is sequential to infer which records in tiny_table were disregarded when the limit was reached, and then repeat the query with only these left over records in tiny_table. What's your opinion of this strategy? Is there a good way to improve it? Many thanks in advance! kj PS: FWIW, the query plan for the query with LIMIT 6000 is this: QUERY PLAN - Limit (cost=19676.75..21327.99 rows=6000 width=84) -> Hash Join (cost=19676.75..1062244.81 rows=3788315 width=84) Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text)) -> Seq Scan on huge_table h (cost=0.00..51292.43 rows=2525543 width=46) -> Hash (cost=19676.00..19676.00 rows=300 width=38) -> Seq Scan on tiny_table t (cost=0.00..19676.00 rows=300 width=38) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] compiling source code!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Hi,

I'm a new postgresql user. I wrote an ACO (ant colony optimization) optimizer and want to use it in place of GEQO in postgres/src/backend/optimizer, but I don't know how to compile and run the source code :( I installed postgresql-8.1.3 and cygwin but I cannot use them to compile the source code. I want to compare the performance of the GEQO and ACO optimizers using a small database.

Can you help me???
Re: [PERFORM] Query performance
Steinar H. Gunderson wrote: > [EMAIL PROTECTED] wrote: >> The above query takes 5 seconds to execute! >> >> [...] >> >> Total runtime: 96109.571 ms > > It sure doesn't look like it... > >> Total runtime: 461.907 ms >> >> [...] >> >> Suddenly the query takes only 0.29 seconds! > > How are you timing this, really? > > /* Steinar */ I'm executing the queries from phpPgAdmin. The above are for explain analyse. I was referring to the pure query execution time. Does anyone have an idea why the OR-query takes so long? Any server-side tuning possibilities? I wouldn't like to change the code of ldap's back-sql... Toni ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Adding and filling new column on big table
Jonathan Blitz writes: I suppose I could do but I need to install PostgreSQL there and then copy over the database. Maybe I will give it a try. I really think that is your best bet. If for whatever reason that will not be an option perhaps you can just let the process run over the weekend.. possibly monitor the process from the OS to make sure it is not frozen. Don't recall if you mentioned the OS.. is it any unix like os? If so there are several ways you could check to make sure the process is not frozen such as iostats, top, vmstats(these from FreeBSD, but most unix like os should have tools like those if not some with the same name). ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Adding and filling new column on big table
Most laptop drives are only 5,400 RPM which would make a transaction like you are describing likely take a while. Not sure what my one is but it is new(ish). If you're doing data intensive operations (like a big update which looks like what you're doing) it will write many megabytes to the harddrive... my laptop HDD (5400 rpm) does about 15 MB/s throughput while a standard desktop 7200rpm drive does 55-60 MB/s throughput. Plus, seek times on a laptop drive are horrendous. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Adding and filling new column on big table
On Tue, 2006-05-30 at 16:04, Jonathan Blitz wrote: > > > > Most laptop drives are only 5,400 RPM which would make a transaction like > > you are describing likely take a while. > > Not sure what my one is but it is new(ish). > > > > > No desktop at home you could try it on? > > I think the problem with the laptop is likely it's drive. > > I suppose I could do but I need to install PostgreSQL there and then copy > over the database. Keep in mind, most, if not all IDE drives lie about fsync, so the speed of the drive is a limit only if you're actually writing a lot of data. If you're doing a lot of little transactions, the drive should be lying and holding the data in cache on board, so the speed should be OK. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Adding and filling new column on big table
> > Most laptop drives are only 5,400 RPM which would make a transaction like > you are describing likely take a while. Not sure what my one is but it is new(ish). > > No desktop at home you could try it on? > I think the problem with the laptop is likely it's drive. I suppose I could do but I need to install PostgreSQL there and then copy over the database. Maybe I will give it a try. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Getting even more insert performance (250m+rows/day)
On May 29, 2006, at 7:11 AM, Markus Schaber wrote:

>> One big transaction every 5 minutes using 'COPY FROM' (instead of inserts).
>
> Are you using "COPY table FROM '/path/to/file'", having the file sitting on
> the server, or "COPY table FROM STDIN" or psql "\copy", having the file
> sitting on the client?

COPY table FROM STDIN using psql on the server

I should have gprof numbers on a similarly set up test machine soon ...

--
Daniel J. Luke | [EMAIL PROTECTED] | http://www.geeklair.net
Opinions expressed are mine and do not necessarily reflect the opinions of my employer.
Re: [PERFORM] Adding and filling new column on big table
Jonathan Blitz writes: Nope. Didn't think it would make any difference. May be worth a try. I am using a laptop :). Pentium 4 (not 4M) with 1GB of memory - 2 MHZ Most laptop drives are only 5,400 RPM which would make a transaction like you are describing likely take a while. Must do it on that since the program is aimed for use at home. No desktop at home you could try it on? I think the problem with the laptop is likely it's drive. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg_dump issue
I did carry it down to the subdirectory level but only included the total for brevity. I'll paste the complete readout at the end of the email. I'll try the "vmstat 1" as you suggest next time the backups run. If the Eng staff finds anything I'll post the results and maybe save someone else some grief if they have the same issue. Thanks again for your input Tom.

Tim

PROD001:
220K    ./global
4.0K    ./pg_xlog/archive_status
529M    ./pg_xlog
36K     ./pg_clog
256K    ./pg_subtrans
4.0K    ./base/1/pgsql_tmp
4.8M    ./base/1
4.8M    ./base/17229
4.0K    ./base/62878500/pgsql_tmp
4.8M    ./base/62878500
5.5M    ./base/1152695
4.0K    ./base/67708567/pgsql_tmp
1.6G    ./base/67708567
12K     ./base/1157024/pgsql_tmp
6.3G    ./base/1157024
4.0K    ./base/1159370/pgsql_tmp
543M    ./base/1159370
4.0K    ./base/1157190/pgsql_tmp
164M    ./base/1157190
4.0K    ./base/1621391/pgsql_tmp
81M     ./base/1621391
8.6G    ./base
4.0K    ./pg_tblspc
604K    ./pg_log
9.1G    .

PROD002:
220K    ./global
4.0K    ./pg_xlog/archive_status
529M    ./pg_xlog
136K    ./pg_clog
208K    ./pg_subtrans
4.0K    ./base/1/pgsql_tmp
4.9M    ./base/1
4.8M    ./base/17229
5.3M    ./base/1274937
4.0K    ./base/64257611/pgsql_tmp
1.6G    ./base/64257611
4.0K    ./base/71683200/pgsql_tmp
6.1G    ./base/71683200
4.0K    ./base/1281929/pgsql_tmp
478M    ./base/1281929
4.0K    ./base/58579022/pgsql_tmp
154M    ./base/58579022
81M     ./base/1773916
4.0K    ./base/55667447/pgsql_tmp
4.8M    ./base/55667447
8.3G    ./base
4.0K    ./pg_tblspc
588K    ./pg_log
8.8G    .

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 30, 2006 12:20 PM
To: mcelroy, tim
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] pg_dump issue

"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> The du . -h in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so
> they're pretty much the same, one would think the smaller one should be
> faster. Yes, the backup files are identical in size.

Hmph. You should carry the "du" analysis down to the subdirectory level, eg make sure that it's not a case of lots of pg_xlog bloat balancing out bloat in a different area on the other system. But I suspect you won't find anything.

> I'm hoping the Engineering staff can find something system related as I
> doubted and still doubt that it's a postgres issue.

I tend to agree. You might try watching "vmstat 1" output while taking the dumps, so you could at least get a clue whether the problem is CPU or I/O related ...

			regards, tom lane
Re: [PERFORM] pg_dump issue
"mcelroy, tim" <[EMAIL PROTECTED]> writes: > The du . -h in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so > they're pretty much the same, one would think the smaller one should be > faster. Yes, the backup files are identical in size. Hmph. You should carry the "du" analysis down to the subdirectory level, eg make sure that it's not a case of lots of pg_xlog bloat balancing out bloat in a different area on the other system. But I suspect you won't find anything. > I'm hoping the Engineering staff can find something system related as I > doubted and still doubt that it's a postgres issue. I tend to agree. You might try watching "vmstat 1" output while taking the dumps, so you could at least get a clue whether the problem is CPU or I/O related ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Adding and filling new column on big table
> > > Could you do the updates in batches instead of trying to do them all at > once? Nope. Didn't think it would make any difference. > > Have you done a vacuum full on this table ever? Many times > > What hardware? > I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA). > Doing an update on a 5 million record table took quite a while, but it did > fininish. :-) I am using a laptop :). Pentium 4 (not 4M) with 1GB of memory - 2 MHZ Must do it on that since the program is aimed for use at home. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Adding and filling new column on big table
Jonathan Blitz writes:
> I just gave up in the end and left it with NULL as the default value.

Could you do the updates in batches instead of trying to do them all at once?

Have you done a vacuum full on this table ever?

> There were, in fact, over 2 million rows in the table rather than 1/4 of a
> million so that was part of the problem.

What hardware?

I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA). Doing an update on a 5 million record table took quite a while, but it did finish. :-) I just did vacuum full before and after though..

That many updates tend to slow down operations on the table afterwards unless you vacuum the table. Based on what you wrote it sounded as if you tried a few times and may have killed the process.. this would certainly slow down the operations on that table unless you did a vacuum full.

I wonder if running vacuum analyze against the table as the updates are running would be of any help.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [PERFORM] pg_dump issue
Thanks Tom. I have scheduled vacuums as follows and all have run without error.

Mon - Thu after-hours: vacuumdb -z -e -a -v
On Fridays I add the -f option: vacuumdb -z -e -a -v -f

The du . -h in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so they're pretty much the same, one would think the smaller one should be faster. Yes, the backup files are identical in size. BTW - this is postgres 8.0.1. Stuck at this due to "that is the latest postgresql version certified by our vendor's application".

I'm hoping the Engineering staff can find something system related as I doubted and still doubt that it's a postgres issue.

Tim

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 30, 2006 11:16 AM
To: mcelroy, tim
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] pg_dump issue

"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> I have identical postgres installations running on identical machines. Dual
> Core AMD Opteron(tm) Processor 870, 16GB RAM, Red Hat Linux 3.2.3-20 and
> 120GB worth of disk space on two drives.
> Recently, I have noticed that my nightly backups take longer on one machine
> than on the other. I back up five (5) databases totaling 8.6GB in size. On
> Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26
> minutes! Quite a discrepancy.

Are the resulting backup files identical? Chasing down the reasons for any diffs might yield some enlightenment. One idea that comes to mind is that Prod002 is having performance problems due to table bloat (maybe a missing vacuum cron job or some such). A quick "du" on the two $PGDATA directories to check for significant size differences would reveal this if so.

			regards, tom lane
Re: [PERFORM] Speedup hint needed, if available? :)
On Tue, 2006-05-30 at 11:05 -0400, Tom Lane wrote: > Do you have realistic test data? The EXPLAIN shows that this is pulling > 275366 of the 826032 rows in the two tables, which seems like rather a > lot for a single user. If it's reasonable that the query needs to fetch > one-third of the data, then you should resign yourself to it taking > awhile :-( I'd say so, yes. The user_subscription table now has only six rows, but the number of actions (giving/taking credits) for a user could easily be as high as 50.000. > If the expected number of matching rows were much smaller, it would > make sense to use indexscans over the two big tables, but unfortunately > existing PG releases don't know how to generate an indexscan join > with a UNION ALL in between :-(. FWIW, 8.2 will be able to do it. > In current releases the only thing I can suggest is to merge > user_subscription_credits_given and user_subscription_credits_taken > into one table so you don't need the UNION ALL. See, that's an idea! :) Thnx, I'll try that. Is it inapropriate to ask about rough estimate on availableness of 8.2? :) Mario -- Mario Splivalo Mob-Art [EMAIL PROTECTED] "I can do it quick, I can do it cheap, I can do it well. Pick any two." ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] pg_dump issue
"mcelroy, tim" <[EMAIL PROTECTED]> writes: > I have identical postgres installations running on identical machines. Dual > Core AMD Opteron(tm) Processor 870 , 16GB RAM, Red Hat Linux 3.2.3-20 and > 120GB worth of disk space on two drives. > Recently, I have noticed that my nightly backups take longer on one machine > than on the other. I back up five (5) databases totaling 8.6GB in size. On > Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26 > minutes! Quite a discrepancy. Are the resulting backup files identical? Chasing down the reasons for any diffs might yield some enlightenment. One idea that comes to mind is that Prod002 is having performance problems due to table bloat (maybe a missing vacuum cron job or some such). A quick "du" on the two $PGDATA directories to check for significant size differences would reveal this if so. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speedup hint needed, if available? :)
Mario Splivalo <[EMAIL PROTECTED]> writes: > Here is the query which gets information on particular user, shows > subscriptions to mailinglists and available credits on those > mailinglists: > SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, > u.valid_to, sum(credits.credits_given - credits.credits_taken)::integer > AS credits > FROM user_subscriptions u > LEFT JOIN > (SELECT user_subscription_credits_given.subscription_id, > user_subscription_credits_given.credits AS credits_given, 0 AS > credits_taken >FROM user_subscription_credits_given > UNION ALL > SELECT user_subscription_credits_taken.subscription_id, 0 AS > credits_given, user_subscription_credits_taken.credits AS credits_taken >FROM user_subscription_credits_taken) credits > ON u.subscription_id = credits.subscription_id > where > u.user_id = 1 > GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, > u.valid_to Do you have realistic test data? The EXPLAIN shows that this is pulling 275366 of the 826032 rows in the two tables, which seems like rather a lot for a single user. If it's reasonable that the query needs to fetch one-third of the data, then you should resign yourself to it taking awhile :-( If the expected number of matching rows were much smaller, it would make sense to use indexscans over the two big tables, but unfortunately existing PG releases don't know how to generate an indexscan join with a UNION ALL in between :-(. FWIW, 8.2 will be able to do it. In current releases the only thing I can suggest is to merge user_subscription_credits_given and user_subscription_credits_taken into one table so you don't need the UNION ALL. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
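A sketch of that merged-table layout, assuming the credits are stored signed (positive for credits given, negative for credits taken); the table and index names are illustrative:

    CREATE TABLE user_subscription_credits (
        subscription_id int4 NOT NULL
            REFERENCES user_subscriptions (subscription_id),
        credits int4 NOT NULL          -- positive = given, negative = taken
    );

    CREATE INDEX user_subscription_credits_fk__subscription_id
        ON user_subscription_credits (subscription_id);

    -- the per-user query then needs no UNION ALL:
    SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, u.valid_to,
           COALESCE(sum(c.credits), 0)::integer AS credits
      FROM user_subscriptions u
      LEFT JOIN user_subscription_credits c ON c.subscription_id = u.subscription_id
     WHERE u.user_id = 1
     GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, u.valid_to;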
[PERFORM] pg_dump issue
Good morning,

I have identical postgres installations running on identical machines. Dual Core AMD Opteron(tm) Processor 870, 16GB RAM, Red Hat Linux 3.2.3-20 and 120GB worth of disk space on two drives.

Recently, I have noticed that my nightly backups take longer on one machine than on the other. I back up five (5) databases totaling 8.6GB in size. On Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26 minutes! Quite a discrepancy. I checked myself, then checked with our Engineering staff, and have been assured that the machines are identical hardware-wise: CPU, disk, etc.

Question: has anyone run into a similar issue? Here is the command I use for the nightly backup on both machines:

pg_dump -F c -f $DB.backup.$DATE $DB

Kind of scratching my head on this one.

Thank you,
Tim McElroy
[PERFORM] Speedup hint needed, if available? :)
Hello again. I have to track user subscriptions to certain mailinglists, and I also need to track credits users have on those mailinglists. On one side I have procedures that add credits, on other side I have procedures that subtract available credits. Add/subtract is pretty intensive, around 30-50 adds per minute (usualy 10 or 100 credits), and around 200-500 subtracts per minute (usualy by one or two credits). I have created table user_subscriptions to track user subscriptions to certain mailing list. I have derived subscription_id as primary key. I have two other tables, user_subscription_credits_given, and _credits_taken, wich track credits for subscription added or subtracted to or from certain subscription. I created those two tables so I could eliminate a lot of UPDATES on user_subscriptions table (if I were to have a column 'credits' in that table). user_subscriptions table is projected to have around 100.000 rows, and _credits_given/_credits_taken table is projected to have around 10.000.000 rows. Now, I have filled the tables with test data, and the query results is kind of poor. It takes almost 50 seconds to get the data for the particular subscription. Now, is there a way to speed this up, or I need different approach? Here is the DDL/DML: CREATE TABLE user_subscriptions ( subscription_id int4 NOT NULL DEFAULT nextval('user_subscriptions_id_seq'::regclass), user_id int4 NOT NULL, mailinglist_id int4 NOT NULL, valid_from timestamptz NOT NULL, valid_to timestamptz, CONSTRAINT user_subscriptions_pkey PRIMARY KEY (subscription_id) ); CREATE TABLE user_subscription_credits_given ( subscription_id int4 NOT NULL, credits int4 NOT NULL, CONSTRAINT user_subscription_credits_given_fk__subscription_id FOREIGN KEY (subscription_id) REFERENCES user_subscriptions (subscription_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ); CREATE INDEX fki_user_subscriptions_fk__mailinglist_id ON user_subscriptions USING btree (mailinglist_id); CREATE INDEX fki_user_subscriptions_fk__users_id ON user_subscriptions USING btree (user_id); CREATE INDEX fki_user_subscription_credits_given_fk__subscription_id ON user_subscription_credits_given USING btree (subscription_id); CREATE INDEX fki_user_subscription_credits_taken_fk__subscription_id ON user_subscription_credits_taken USING btree (subscription_id); Here is the query which gets information on particular user, shows subscriptions to mailinglists and available credits on those mailinglists: SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, u.valid_to, sum(credits.credits_given - credits.credits_taken)::integer AS credits FROM user_subscriptions u LEFT JOIN (SELECT user_subscription_credits_given.subscription_id, user_subscription_credits_given.credits AS credits_given, 0 AS credits_taken FROM user_subscription_credits_given UNION ALL SELECT user_subscription_credits_taken.subscription_id, 0 AS credits_given, user_subscription_credits_taken.credits AS credits_taken FROM user_subscription_credits_taken) credits ON u.subscription_id = credits.subscription_id where u.user_id = 1 GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from, u.valid_to And here is the 'explain analyze' of the above query: QUERY PLAN -- HashAggregate (cost=200079055.24..200079055.28 rows=2 width=36) (actual time=56527.153..56527.163 rows=2 loops=1) -> Nested Loop Left Join (cost=200033690.72..200078931.34 rows=8260 width=36) (actual time=0.432..54705.844 rows=275366 loops=1) Join Filter: ("outer".subscription_id = "inner".subscription_id) -> 
Index Scan using fki_user_subscriptions_fk__users_id on user_subscriptions u (cost=0.00..3.03 rows=2 width=28) (actual time=0.030..0.055 rows=2 loops=1) Index Cond: (user_id = 1) -> Materialize (cost=200033690.72..200045984.63 rows=825991 width=12) (actual time=0.043..22404.107 rows=826032 loops=2) -> Subquery Scan credits (cost=1.00..200028830.73 rows=825991 width=12) (actual time=0.050..31500.589 rows=826032 loops=1) -> Append (cost=1.00..200020570.82 rows=825991 width=8) (actual time=0.041..22571.540 rows=826032 loops=1) -> Subquery Scan "*SELECT* 1" (cost=1.00..11946.96 rows=78148 width=8) (actual time=0.031..1226.640 rows=78148 loops=1) -> Seq Scan on user_subscription_credits_given (cost=1.00..11165.48 rows=78148 width=8) (actual time=0.022..404.253 rows=78148 loops=1) -> Subquery Scan "*SELECT* 2" (cost=1.00..100018623.86 rows=747843 width=8) (actual time=0.032..12641.705
Re: [PERFORM] Selects query stats?
try pgtop. It is mytop clone for postgresql. Regards, alvis Francisco Reyes wrote: I am not sure if this is what the original poster was refering to, but I have used an application called mtop that shows how many queries per second mysql is doing. In my case this is helpfull because we have a number of machines running postfix and each incoming mail generates about 7 queries. Queries are all very simmilar to each other in that scenario. Having access to that query/second stat allowed me to improve the settings in MysQL. Ultimately once we migrated to a new server I could see how moving to the new machine increased the speed at which we could accept emails. I am, little by little, getting PostgreSQL to be used, but for now the postfix queries are tied to MySQL. By the time we hopefully do move to PostgreSQL for the Postfix backend it will be very helpfull to have some sort of way to measure queries/time period. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Adding and filling new column on big table
> > So, I have tried to run the following command. The command never finishes > > (I gave up after about and hour and a half!). > > Did you ever find what was the problem? > Perhaps you needed to run a vacuum full on the table? Nope. I just gave up in the end and left it with NULL as the default value. There were, in fact, over 2 million rows in the table rather than 1/4 of a million so that was part of the problem. Jonathan -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly