Re: [GENERAL] Query performance question on a large table
Tom,

Sort of piggybacking on this thread, but why the suggestion to drop the use
of DISTINCT in 7.4?  We use DISTINCT all over the place to eliminate
duplicates in sub-select statements.

Running 7.4.0 currently on FreeBSD 5.1, Dell 2650, 4GB RAM, 5-disk SCSI
array, hardware RAID 0.

Example:

explain analyze select t1.raw_agent_string from d_useragent t1
where t1.id in (select distinct useragent_key from f_pageviews
                where date_key between 356 and 362);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=1020025.13..1020178.84 rows=51 width=79) (actual time=954080.021..970268.457 rows=82207 loops=1)
   ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) (actual time=954049.317..954450.065 rows=82208 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
               ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) (actual time=856641.230..952939.539 rows=82208 loops=1)
                     ->  Sort  (cost=983429.20..1001726.84 rows=7319058 width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
                           Sort Key: useragent_key
                           ->  Index Scan using idx_pageviews_date_dec_2003 on f_pageviews  (cost=0.00..136434.63 rows=7319058 width=4) (actual time=1.140..693400.464 rows=11067735 loops=1)
                                 Index Cond: ((date_key >= 356) AND (date_key <= 362))
   ->  Index Scan using d_useragent_pkey on d_useragent t1  (cost=0.00..3.00 rows=1 width=83) (actual time=0.169..0.174 rows=1 loops=82208)
         Index Cond: (t1.id = "outer".useragent_key)
 Total runtime: 970657.888 ms
(11 rows)

t1.id is the primary key on d_useragent.  d_useragent actually has 390751
rows.  useragent_key has an index.  f_pageviews has roughly 120 million
rows.

Is there a better way of writing this sort of query that will accomplish
the same thing?

Thanks.
--sean

Tom Lane wrote:

> Együd Csaba <[EMAIL PROTECTED]> writes:
>> here is a sample query:
>> select mertido, fomeazon, ertektipus, mertertek from t_me30
>> where fomeazon in (select distinct fomeazon from t_fome
>>                    where lower(inuse) = 'igen')
>>   and mertido like '2003-12-17%' and ertektipus in ('+MW')
>> order by mertido, fomeazon, ertektipus;
>> Ohh, I nearly forgot the config: Linux 7.1; Postgres 7.3.2;
>
> The first thing you ought to do is move to PG 7.4.  "foo IN (SELECT ...)"
> generally works a lot better under 7.4 than prior releases.  I'd suggest
> dropping the DISTINCT when using 7.4, too.
>
> 			regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
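One possible rewrite of Sean's query, not taken from the thread, is to join
against the deduplicated subselect instead of using IN.  This is a sketch
against the schema described above; whether it actually beats the IN form
depends on the planner and the data.

```sql
-- Hypothetical rewrite: join against the subselect in FROM.
-- Results are equivalent because t1.id is d_useragent's primary key,
-- so each distinct useragent_key matches at most one t1 row.
SELECT t1.raw_agent_string
FROM d_useragent t1
JOIN (SELECT DISTINCT useragent_key
      FROM f_pageviews
      WHERE date_key BETWEEN 356 AND 362) p
  ON p.useragent_key = t1.id;
```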
Re: [GENERAL] Query performance question on a large table
Sean Shanny <[EMAIL PROTECTED]> writes:
> Sort of piggybacking on this thread but why the suggestion to drop the
> use of DISTINCT in 7.4?

Because the 7.4 planner can decide for itself whether DISTINCT'ifying the
sub-select output is the best way to proceed or not.  There is more than
one good way to do an "IN sub-SELECT" operation, and the 7.4 planner knows
several.  (Pre-7.4 planners didn't know any :-( ... but I digress.)

When you write "foo IN (SELECT DISTINCT ...)", the DISTINCT doesn't change
the semantics at all; it just adds overhead.  In fact it's worse than that:
if the planner decides that the best way to proceed is to make the
subselect output unique, it will throw another layer of sort/unique
processing on top of what you did.  So writing DISTINCT is actually a
pessimization in 7.4.  Example:

>    ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) (actual time=954049.317..954450.065 rows=82208 loops=1)
>          ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
>                ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) (actual time=856641.230..952939.539 rows=82208 loops=1)
>                      ->  Sort  (cost=983429.20..1001726.84 rows=7319058 width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
>                            Sort Key: useragent_key
>                            ->  Index Scan using ...

The sort/unique steps are coming from the DISTINCT.  The HashAggregate
step is the planner making sure the output rows are distinct :-(

I just a couple days ago added some logic to CVS tip to notice that the
sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
processing on top of it.  So in 7.5, writing a DISTINCT clause will amount
to forcing a particular query plan, which might or might not be the best
thing, but hopefully won't be too terrible.  But in 7.4 it has nothing to
recommend it ...

			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
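Concretely, Tom's advice applied to the query earlier in the thread is just
to drop the DISTINCT keyword and leave the rest alone.  A sketch:

```sql
-- 7.4-friendly form of the earlier query: identical semantics, no DISTINCT.
-- The planner then decides for itself how (or whether) to unique-ify the
-- subselect output.
SELECT t1.raw_agent_string
FROM d_useragent t1
WHERE t1.id IN (SELECT useragent_key
                FROM f_pageviews
                WHERE date_key BETWEEN 356 AND 362);
```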
Re: [GENERAL] problems with transaction blocks
Transactions are atomic. What you are asking to do violates the whole concept of a transaction. You can, however, do these inserts outside of the transaction block. Best Wishes, Chris Travers - Original Message - From: Chris Ochs [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 7:52 AM Subject: [GENERAL] problems with transaction blocks I want to do a series of inserts within a single transaction block, but with postgresql if one insert fails, the whole block is aborted. Is there any way to get around this behavior so that postgresql won't abort the entire transaction if a single insert returns an error? Chris ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Query performance question on a large table
Hi Tom, thank you, I'll upgrade as soon as I can. Anyway I've already planned to do so for a while. I'll keep in mind your remarks concerning the DISTINCT clause too. Bye and Best Regards, -- Csaba -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane Sent: 2004. janur 6. 21:04 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] (E-mail) Subject: Re: [GENERAL] Query performance question on a large table =?iso-8859-2?Q?Egy=FCd_Csaba?= [EMAIL PROTECTED] writes: here is a sample query: select mertido, fomeazon, ertektipus, mertertek from t_me30 where fomeazon in (select distinct fomeazon from t_fome where lower(inuse) = 'igen') and mertido like '2003-12-17%' and ertektipus in ('+MW') order by mertido, fomeazon, ertektipus; Ohh, I nearly forgot the config: Linux 7.1; Postgres 7.3.2; The first thing you ought to do is move to PG 7.4. foo IN (SELECT ...) generally works a lot better under 7.4 than prior releases. I'd suggest dropping the DISTINCT when using 7.4, too. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Incoming mail is certified Virus Free. Checked by AVG Anti-Virus (http://www.grisoft.com). Version: 7.0.209 / Virus Database: 261 - Release Date: 2004. 01. 02. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[GENERAL] Reparse PL/pgSQL Function?
Is it possible to (automatically) force the reparse of a stored PL/pgSQL
function following the deletion and recreation of an object that was
referenced within the function (for example a table or another function)?

The need to re-create ("create or replace ...") the function is a pain, as
you have to follow a potential chain of other objects, recreating them too.

I think it would be better if Postgres could (optionally) try re-parsing
the function if it failed to locate a referenced object's OID.  The
reparse would allow it to locate the newly created OID (assuming the
object had been recreated), and then operate as before.

I know that someone is going to say that it is safer not to do this
because the error flags the DB as being in an inconsistent state, but...

Oracle has the concept of a validity flag for stored procs/functions.
When you delete an object, all procs/functions referencing that object are
marked invalid.  The next time you try to execute a function marked
invalid, Oracle reparses and recompiles it.  If the referenced objects
have been recreated, then operation proceeds normally; if not, an error is
generated.

Could Postgres provide the same behaviour?  And if so, how likely is a
fix? :)

John Sidney-Woollett

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
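One period-appropriate workaround, not mentioned in the message, is to
reference the volatile object through EXECUTE inside the function.  EXECUTE
re-plans its statement on every call, so the function never caches the
dropped object's OID.  This is a sketch; the function and table names are
hypothetical.

```sql
-- Hypothetical example: my_temp_table may be dropped and recreated
-- between calls without invalidating this function.
CREATE OR REPLACE FUNCTION count_my_rows() RETURNS integer AS '
DECLARE
    r record;
BEGIN
    -- FOR ... IN EXECUTE plans the query string afresh each call,
    -- looking up the table by name rather than by a saved OID.
    FOR r IN EXECUTE ''SELECT count(*) AS n FROM my_temp_table'' LOOP
        RETURN r.n;
    END LOOP;
    RETURN 0;
END;
' LANGUAGE plpgsql;
```

The cost is a re-plan on every invocation, which is why it is usually
reserved for statements that touch transient objects.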
Re: [GENERAL] Loading a dumped database from cd
On Wednesday 07 January 2004 04:38, Paul Mc Gee wrote:
> hi everybody
> i have postgresql installed on my red hat linux machine and i want to
> load up a dumped postgresql database which i have on cdrom.  does
> anyone know how i could do this?

You probably want to read up on pg_restore (and perhaps pg_dump) - you can
find information in the postgresql docs on your installation, at
http://www.postgresql.org/docs/ or with "man pg_restore".

It's difficult to say much more without knowing your situation.  Some
things you might want to consider:
- Does the database and its structure (tables etc) already exist; do you
  just want to restore the data?
- Does the user for this database already exist?
- Does the database (empty) already exist?

These will help you decide what options to select for pg_restore.  If you
don't know exactly what you've got, then "pg_restore --list myfile" is
probably a good place to start.

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend
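A minimal session following Richard's advice might look like this.  The
archive filename, mount point, and database name are examples, and which
commands apply depends on how the dump was made.

```
# Inspect the archive's contents first (path and filename are examples):
pg_restore --list /mnt/cdrom/mydb.dump

# If the database does not exist yet, create it empty and restore into it:
createdb mydb
pg_restore -d mydb /mnt/cdrom/mydb.dump

# If the dump turns out to be a plain-text SQL script (pg_dump's default
# format) rather than a pg_restore archive, feed it to psql instead:
psql mydb < /mnt/cdrom/mydb.sql
```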
[GENERAL] deadlocks - sharelocks on transactions
Hi,

I have a stored procedure that is causing deadlocks when called multiple
times synchronously.  The odd issue is that the deadlock seems to be
happening on different threads waiting for locks on transactions.  What
exactly is this transaction lock?

ERROR:  deadlock detected
DETAIL:  Process 1740 waits for ShareLock on transaction 1488; blocked by process 1716.
         Process 1716 waits for ShareLock on transaction 1490; blocked by process 1740.

It's a little difficult to debug this issue when it's not identifying
which data accesses are causing the deadlocks.  Does anyone have any
information that may help in tracking down the problem?

Many thanks,

Tim

Current set-up: Postgresql 7.4.1 (Windows 2000, Cygwin) or
Postgresql 7.4 (Linux, Red Hat 9)

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] problems with transaction blocks
Another good way to handle this is to put a trigger on the table that
diverts inserts that would fail to a holding table.  While this will slow
down the inserts, it will allow you to insert large lists of dubious
quality and worry about the bad rows later.

My preference is to fix the data feed, or pre-process it with PHP/Perl to
split it into two files ahead of time, but I'm more of a coder than a dba.
I get a lot of data to import from other sources at work, and it's often
easier to make the sources fix their data feeds than it is to try and
massage them each and every time.

On Wed, 7 Jan 2004, Chris Travers wrote:

> Transactions are atomic.  What you are asking to do violates the whole
> concept of a transaction.  You can, however, do these inserts outside
> of the transaction block.
>
> Best Wishes,
> Chris Travers
>
> ----- Original Message -----
> From: "Chris Ochs" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, January 07, 2004 7:52 AM
> Subject: [GENERAL] problems with transaction blocks
>
> > I want to do a series of inserts within a single transaction block,
> > but with postgresql if one insert fails, the whole block is aborted.
> > Is there any way to get around this behavior so that postgresql won't
> > abort the entire transaction if a single insert returns an error?
> >
> > Chris

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
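The trigger idea above can be sketched as follows.  All table, column, and
function names here are hypothetical; the example assumes the failures are
foreign-key violations against a customers table.

```sql
-- Holding table mirrors the columns we want to keep for later review.
CREATE TABLE orders_holding (customer_id integer, amount numeric);

CREATE FUNCTION divert_bad_orders() RETURNS trigger AS '
BEGIN
    -- If the referenced customer does not exist, the INSERT would fail
    -- on the foreign key: stash the row in the holding table instead,
    -- and suppress the original INSERT by returning NULL.
    IF NOT EXISTS (SELECT 1 FROM customers WHERE id = NEW.customer_id) THEN
        INSERT INTO orders_holding (customer_id, amount)
            VALUES (NEW.customer_id, NEW.amount);
        RETURN NULL;
    END IF;
    RETURN NEW;  -- good row: let the INSERT proceed normally
END;
' LANGUAGE plpgsql;

CREATE TRIGGER orders_divert BEFORE INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE divert_bad_orders();
```

The bad rows can then be inspected and repaired in orders_holding at
leisure, without ever aborting the bulk-load transaction.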
Re: [GENERAL] Reparse PL/pgSQL Function?
On Wed, 7 Jan 2004, John Sidney-Woollett wrote:

> Is it possible to (automatically) force the reparse of a stored PL/pgSQL
> function following the deletion and recreation of an object that was
> referenced within the function (for example a table or another
> function)?

Would CREATE OR REPLACE FUNCTION work?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] problems with transaction blocks
On Tue, Jan 06, 2004 at 16:52:12 -0800, Chris Ochs <[EMAIL PROTECTED]> wrote:

> I want to do a series of inserts within a single transaction block, but
> with postgresql if one insert fails, the whole block is aborted.  Is
> there any way to get around this behavior so that postgresql won't
> abort the entire transaction if a single insert returns an error?

Currently there is no provision for recovery from an error by the
application inside a transaction.  What you can do is have the application
check for problems before trying the insert.  Depending on the problems
you expect, you might be able to avoid trying an insert which will fail in
almost all cases.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
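Bruno's "check before trying the insert" can be folded into the statement
itself.  A sketch (the table and values are examples):

```sql
-- Insert the row only if the key is not already present; if it is, this
-- inserts zero rows rather than raising a duplicate-key error.
INSERT INTO users (id, name)
SELECT 42, 'alice'
WHERE NOT EXISTS (SELECT 1 FROM users WHERE id = 42);
```

Note the check and the insert are not atomic with respect to other
sessions: a concurrent insert of the same key between the two can still
make this fail, so it reduces rather than eliminates the errors.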
Re: [GENERAL] problems with transaction blocks
Another way is to break up the transaction: instead of one transaction
consisting of many inserts, each insert is a transaction itself.  Do a
BEGIN and a COMMIT (or ROLLBACK) around each insert statement.

HTH
Rodrigo Malara

On Wed, 2004-01-07 at 14:41, scott.marlowe wrote:

> Another good way to handle this is to put a trigger on the table that
> diverts inserts that would fail to a holding table.  While this will
> slow down the inserts, it will allow you to insert large lists of
> dubious quality and worry about the bad rows later.
>
> My preference is to fix the data feed, or pre-process it with PHP/Perl
> to split it into two files ahead of time, but I'm more of a coder than
> a dba.  I get a lot of data to import from other sources at work, and
> it's often easier to make the sources fix their data feeds than it is
> to try and massage them each and every time.
>
> On Wed, 7 Jan 2004, Chris Travers wrote:
>
> > Transactions are atomic.  What you are asking to do violates the
> > whole concept of a transaction.  You can, however, do these inserts
> > outside of the transaction block.
> >
> > Best Wishes,
> > Chris Travers
> >
> > ----- Original Message -----
> > From: "Chris Ochs" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Wednesday, January 07, 2004 7:52 AM
> > Subject: [GENERAL] problems with transaction blocks
> >
> > > I want to do a series of inserts within a single transaction block,
> > > but with postgresql if one insert fails, the whole block is
> > > aborted.  Is there any way to get around this behavior so that
> > > postgresql won't abort the entire transaction if a single insert
> > > returns an error?
> > >
> > > Chris

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
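The per-statement approach looks like this in psql (table and values are
examples; id is assumed to be a primary key):

```sql
BEGIN;
INSERT INTO items VALUES (1, 'first row');
COMMIT;

BEGIN;
INSERT INTO items VALUES (1, 'duplicate key - this INSERT fails');
COMMIT;  -- committing an aborted transaction rolls it back;
         -- only this one row is lost

BEGIN;
INSERT INTO items VALUES (2, 'unaffected by the earlier failure');
COMMIT;
```

Equivalently, issuing the INSERTs outside any transaction block
(autocommit) gives one transaction per statement.  Either way, the
trade-off Chris Travers points out applies: you give up the all-or-nothing
guarantee across the whole batch.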
Re: [GENERAL] deadlocks - sharelocks on transactions
On Wed, 7 Jan 2004, Tim McAuley wrote:

> Hi,
>
> I have a stored procedure that is causing deadlocks when called
> multiple times synchronously.  The odd issue is that the deadlock seems
> to be happening on different threads waiting for locks on transactions.
> What exactly is this transaction lock?

My first guess would be waiting on row-level locks.  Are you doing
anything with FOR UPDATE or foreign keys?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Any way to have CREATEUSER privs without having all privs?
On Sat, Dec 06, 2003 at 22:33:00 -0800, Ezra Epstein <[EMAIL PROTECTED]> wrote:

> Thank you Bruno for the informative reply.  I'm not sure how ident
> solves this.  I would like the session to run as the actual user (via
> set session authorization) so that that user's actual privs are
> enforced.  But I want the connection to be shared, so it cannot be per
> login (username/pw combo).  I'm not up on ident enough to see the fit.
> Any pointers would be most welcome.

I was wrong about this being useful in your situation.  SET SESSION
AUTHORIZATION doesn't reauthenticate; it only allows you to switch to a
new user if you originally were connected as a superuser.

Ident authentication would only be useful if you could close and then
reopen the connection.  This could be useful if the overhead of doing this
wasn't a concern.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
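The behavior Bruno describes can be demonstrated directly; 'alice' is an
example role name.

```sql
-- SET SESSION AUTHORIZATION does not re-authenticate; it simply switches
-- the session user, and is only permitted when the original connection
-- was made as a superuser.  From a non-superuser connection this raises
-- a permission error.
SET SESSION AUTHORIZATION 'alice';

-- Revert to the user the connection was originally authenticated as.
RESET SESSION AUTHORIZATION;
```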
Re: [GENERAL] deadlocks - sharelocks on transactions
Tim McAuley <[EMAIL PROTECTED]> writes:
> I have a stored procedure that is causing deadlocks when called
> multiple times synchronously.  The odd issue is that the deadlock seems
> to be happening on different threads waiting for locks on transactions.
> What exactly is this transaction lock?

> ERROR:  deadlock detected
> DETAIL:  Process 1740 waits for ShareLock on transaction 1488; blocked by process 1716.
>          Process 1716 waits for ShareLock on transaction 1490; blocked by process 1740.

What you've got here is transactions deadlocked by trying to update/delete
the same rows (unfortunately the lock manager doesn't know exactly which
rows, so the DETAIL isn't too helpful).

Most of the complaints I've seen about this sort of problem are not really
from updates per se, but from the SELECT FOR UPDATE locking that is done
for foreign key references.  Are you inserting multiple rows that might
reference the same foreign keys?

			regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
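An illustration of the foreign-key deadlock pattern Tom describes, with a
hypothetical orders/customers schema.  In 7.x, each INSERT into orders
takes a SELECT ... FOR UPDATE row lock on the referenced customers row;
two sessions locking the same parents in opposite order deadlock:

```sql
-- session 1:                         -- session 2:
BEGIN;                                BEGIN;
INSERT INTO orders                    INSERT INTO orders
  VALUES (1, 'cust_a');                 VALUES (2, 'cust_b');
-- locks parent row cust_a            -- locks parent row cust_b

INSERT INTO orders
  VALUES (3, 'cust_b');
-- blocks: waits on session 2
                                      INSERT INTO orders
                                        VALUES (4, 'cust_a');
                                      -- blocks: waits on session 1
                                      -- => deadlock detected
```

A common mitigation is to insert batches sorted by the foreign key value,
so every session acquires the parent-row locks in the same order.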
Re: [GENERAL] Query performance question on a large table
On Wed, Jan 07, 2004 at 02:31:22 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:

> I just a couple days ago added some logic to CVS tip to notice that the
> sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
> processing on top of it.  So in 7.5, writing a DISTINCT clause will
> amount to forcing a particular query plan, which might or might not be
> the best thing but hopefully won't be too terrible.  But in 7.4 it has
> nothing to recommend it ...

Can't the DISTINCT be dropped if there isn't a LIMIT clause?  Similarly,
UNION, INTERSECT and EXCEPT could also be changed to the ALL forms if
there isn't a LIMIT.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
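An example of the equivalence being suggested, sketched with hypothetical
tables: inside an IN subquery, duplicates cannot change which outer rows
qualify, so

```sql
SELECT * FROM t
WHERE id IN (SELECT a FROM x UNION SELECT a FROM y);
```

returns the same rows as the UNION ALL form, which skips the
duplicate-elimination step:

```sql
SELECT * FROM t
WHERE id IN (SELECT a FROM x UNION ALL SELECT a FROM y);
```

A LIMIT inside the subquery breaks the equivalence, since which rows
survive the limit then depends on whether duplicates were removed first.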