Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Sean Shanny
Tom,

Sort of piggybacking on this thread, but why the suggestion to drop the 
use of DISTINCT in 7.4?  We use DISTINCT all over the place to eliminate 
duplicates in sub-select statements.  Currently running 7.4.0 on 
FreeBSD 5.1, on a Dell 2650 with 4GB RAM and a 5-disk SCSI array 
(hardware RAID 0).

Example:

explain analyze select t1.raw_agent_string from d_useragent t1 where 
t1.id in (select distinct useragent_key from f_pageviews where date_key 
between 356 and 362);
 
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=1020025.13..1020178.84 rows=51 width=79) (actual time=954080.021..970268.457 rows=82207 loops=1)
   ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) (actual time=954049.317..954450.065 rows=82208 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
               ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) (actual time=856641.230..952939.539 rows=82208 loops=1)
                     ->  Sort  (cost=983429.20..1001726.84 rows=7319058 width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
                           Sort Key: useragent_key
                           ->  Index Scan using idx_pageviews_date_dec_2003 on f_pageviews  (cost=0.00..136434.63 rows=7319058 width=4) (actual time=1.140..693400.464 rows=11067735 loops=1)
                                 Index Cond: ((date_key >= 356) AND (date_key <= 362))
   ->  Index Scan using d_useragent_pkey on d_useragent t1  (cost=0.00..3.00 rows=1 width=83) (actual time=0.169..0.174 rows=1 loops=82208)
         Index Cond: (t1.id = "outer".useragent_key)
 Total runtime: 970657.888 ms
(11 rows)

t1.id is the primary key on d_useragent.  d_useragent actually has 
390751 rows.  useragent_key has an index.  f_pageviews has roughly 
120 million rows.

Is there a better way of writing this sort of query that will accomplish 
the same thing?
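
For comparison, the DISTINCT-free form that Tom's earlier advice implies would simply be the same query with the keyword dropped (the result set is identical, since IN is a membership test and duplicates in the subselect cannot change it):

```sql
-- Same query as above with the redundant DISTINCT removed; under 7.4
-- the planner is free to choose how (or whether) to unique-ify.
explain analyze
select t1.raw_agent_string
from d_useragent t1
where t1.id in (select useragent_key
                from f_pageviews
                where date_key between 356 and 362);
```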

Thanks.

--sean

Tom Lane wrote:

Együd Csaba [EMAIL PROTECTED] writes:
 

here is a sample query:
select  mertido, fomeazon, ertektipus, mertertek from t_me30 where fomeazon
in (select distinct fomeazon from t_fome where lower(inuse) = 'igen') and
mertido like '2003-12-17%' and ertektipus in ('+MW') order by mertido,
fomeazon, ertektipus;
   

 

Ohh, I nearly forgot the config:  Linux 7.1; Postgres 7.3.2;
   

The first thing you ought to do is move to PG 7.4.  foo IN (SELECT ...)
generally works a lot better under 7.4 than prior releases.  I'd suggest
dropping the DISTINCT when using 7.4, too.
			regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Tom Lane
Sean Shanny [EMAIL PROTECTED] writes:
 Sort of piggybacking on this thread but why the suggestion to drop the 
 use of DISTINCT in 7.4?

Because the 7.4 planner can decide for itself whether DISTINCT'ifying
the sub-select output is the best way to proceed or not.  There is more
than one good way to do an IN sub-SELECT operation, and the 7.4
planner knows several.  (Pre-7.4 planners didn't know any :-( ... but
I digress.)  When you write foo IN (SELECT DISTINCT ...), the DISTINCT
doesn't change the semantics at all, it just adds overhead.

In fact it's worse than that: if the planner decides that the best way
to proceed is to make the subselect output unique, it will throw another
layer of sort/unique processing on top of what you did.  So writing
DISTINCT is actually a pessimization in 7.4.
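
In miniature (hypothetical table names, a sketch rather than anything from the original thread), the two forms below always return the same rows, but under 7.4 the second can pay for an extra sort/unique pass:

```sql
-- Equivalent results: IN is a membership test, so DISTINCT adds nothing.
select * from orders where customer_id in (select customer_id from vips);

-- Same semantics, but the explicit DISTINCT may force redundant
-- unique-ifying work on top of whatever the planner chooses to do.
select * from orders where customer_id in (select distinct customer_id from vips);
```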

  Example:

   ->  HashAggregate  (cost=1020025.13..1020025.13 rows=51 width=4) (actual time=954049.317..954450.065 rows=82208 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=983429.20..1020025.00 rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1)
               ->  Unique  (cost=983429.20..1020024.49 rows=51 width=4) (actual time=856641.230..952939.539 rows=82208 loops=1)
                     ->  Sort  (cost=983429.20..1001726.84 rows=7319058 width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1)
                           Sort Key: useragent_key
                           ->  Index Scan using

The sort/unique steps are coming from the DISTINCT.  The HashAggregate
step is the planner making sure the output rows are distinct :-(

I just a couple days ago added some logic to CVS tip to notice that the
sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
processing on top of it.  So in 7.5, writing a DISTINCT clause will
amount to forcing a particular query plan, which might or might not be
the best thing but hopefully won't be too terrible.  But in 7.4 it has
nothing to recommend it ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] problems with transaction blocks

2004-01-07 Thread Chris Travers
Transactions are atomic.  What you are asking to do violates the whole
concept of a transaction.

You can, however, do these inserts outside of the transaction block.

Best Wishes,
Chris Travers
- Original Message -
From: Chris Ochs [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 7:52 AM
Subject: [GENERAL] problems with transaction blocks


 I want to do a series of inserts within a single transaction block, but
with
 postgresql if one insert fails, the whole block is aborted.  Is there any
 way to get around this behavior so that postgresql won't abort the entire
 transaction if a single insert returns an error?

 Chris


 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Együd Csaba
Hi Tom,
thank you, I'll upgrade as soon as I can. Anyway I've already planned to do
so for a while.
I'll keep in mind your remarks concerning the DISTINCT clause too.

Bye and Best Regards,
-- Csaba

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane
 Sent: 6 January 2004, 21:04
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED] (E-mail)
 Subject: Re: [GENERAL] Query performance question on a large table


 Együd Csaba [EMAIL PROTECTED] writes:
  here is a sample query:
  select  mertido, fomeazon, ertektipus, mertertek from
 t_me30 where fomeazon
  in (select distinct fomeazon from t_fome where lower(inuse)
 = 'igen') and
  mertido like '2003-12-17%' and ertektipus in ('+MW') order
 by mertido,
  fomeazon, ertektipus;

  Ohh, I nearly forgot the config:  Linux 7.1; Postgres 7.3.2;

 The first thing you ought to do is move to PG 7.4.  foo IN
 (SELECT ...)
 generally works a lot better under 7.4 than prior releases.
 I'd suggest
 dropping the DISTINCT when using 7.4, too.

   regards, tom lane

 ---(end of
 broadcast)---
 TIP 1: subscribe and unsubscribe commands go to
 [EMAIL PROTECTED]






---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[GENERAL] Reparse PL/pgSQL Function?

2004-01-07 Thread John Sidney-Woollett
Is it possible to (automatically) force the reparse of a stored PL/pgSQL
function following the deletion and recreation of an object that was
referenced within the function (for example a table or another function)?

The need to re-create (create or replace ...) the function is a pain as
you have to follow a potential chain of other objects recreating them too.

I think that it would be better if Postgres could (optionally) try
re-parsing the function if it failed to locate a referenced object's OID.
The reparse would allow it to locate the newly created OID (assuming the
object had been recreated), and then operate as before.

I know that someone is going to say that it is safer not to do this
because the error flags that the DB is in an inconsistent state, but...

Oracle has the concept of a validity flag for stored procs/functions. When
you delete an object, all procs/functions referencing that object are
marked invalid. When you try to execute a function marked invalid, Oracle
reparses and compiles the proc/function the next time it is executed. If
the referenced objects have been recreated, then operation proceeds
normally, and if not, an error is generated.

Could Postgres provide the same behaviour? And if so, how likely is a fix? :)

John Sidney-Woollett

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Loading a dumped databse from cd

2004-01-07 Thread Richard Huxton
On Wednesday 07 January 2004 04:38, Paul Mc Gee wrote:
 hi everybody
 i have postgresql installed on my red hat linux
 machine and i want to load up a dumped postgresql
 database which i have on cdrom. does anyone know how i
 could do this?

You probably want to read up on pg_restore (and perhaps pg_dump) - you can 
find information in the postgresql docs on your installation, at 
http://www.postgresql.org/docs/ or with man pg_restore.

It's difficult to say much more without knowing your situation. Some things 
you might want to consider:
Do the database and its structure (tables etc.) already exist, so that you 
just want to restore the data?
Does the user for this database already exist?
Does the database (empty) already exist?

These will help you decide what options to select for pg_restore. If you don't 
know exactly what you've got then pg_restore --list myfile is probably a 
good place to start.
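
A minimal session along those lines might look like this (the mount point and dump filename are hypothetical; the flags shown are standard pg_restore options):

```shell
# Inspect the archive's contents first (works for custom/tar-format dumps).
pg_restore --list /mnt/cdrom/mydb.dump

# Create an empty database, then restore schema and data into it.
createdb mydb
pg_restore -d mydb /mnt/cdrom/mydb.dump

# If the dump turns out to be a plain SQL script instead, feed it to psql:
psql mydb < /mnt/cdrom/mydb.sql
```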
-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend


[GENERAL] deadlocks - sharelocks on transactions

2004-01-07 Thread Tim McAuley
Hi,

I have a stored procedure that is causing deadlocks when called multiple 
times synchronously. The odd issue is that the deadlock seems to be 
happening on different threads waiting for locks on transactions. What 
exactly is this transaction lock?

ERROR:  deadlock detected
DETAIL:  Process 1740 waits for ShareLock on transaction 1488; blocked 
by process 1716.
Process 1716 waits for ShareLock on transaction 1490; blocked 
by process 1740.

It's a little difficult to debug this issue when it's not identifying 
which data accesses are causing the deadlocks.

Does anyone have any information that may help in tracking down the problem?

Many thanks,

Tim

Current set-up:

Postgresql 7.4.1 (Windows 2000, Cygwin) or
Postgresql 7.4   (Linux, Redhat 9)
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] problems with transaction blocks

2004-01-07 Thread scott.marlowe
Another good way to handle this is to put a trigger on the table that 
diverts inserts that would fail to a holding table.  While this will slow 
down the inserts, it will allow you to insert large lists of dubious 
quality and worry about the bad rows later.

My preference is to fix the data feed, or pre-process it with PHP/Perl to 
split it into two files ahead of time, but I'm more of a coder than a dba.  
I get a lot of data to import from other sources at work, and it's often 
easier to make the sources fix their data feeds than it is to try and 
massage them each and every time.
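
One possible shape for such a trigger (a sketch in PL/pgSQL with hypothetical tables orders(id, amount) and a holding table; the validity check here stands in for whatever would actually make the insert fail):

```sql
-- Rows that would fail the check are redirected to a holding table
-- instead of erroring out and aborting the surrounding transaction.
create table bad_orders (id integer, amount numeric);

create or replace function divert_bad_rows() returns trigger as '
begin
    if new.amount is null or new.amount < 0 then
        insert into bad_orders (id, amount) values (new.id, new.amount);
        return null;  -- swallow the row; the main insert skips it
    end if;
    return new;
end;
' language plpgsql;

create trigger orders_divert before insert on orders
    for each row execute procedure divert_bad_rows();
```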

On Wed, 7 Jan 2004, Chris Travers wrote:

 Transactions are atomic.  What you are asking to do violates the whole
 concept of a transaction.
 
 You can, however, do these inserts outside of the transaction block.
 
 Best Wishes,
 Chris Travers
 - Original Message -
 From: Chris Ochs [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Wednesday, January 07, 2004 7:52 AM
 Subject: [GENERAL] problems with transaction blocks
 
 
  I want to do a series of inserts within a single transaction block, but
 with
  postgresql if one insert fails, the whole block is aborted.  Is there any
  way to get around this behavior so that postgresql won't abort the entire
  transaction if a single insert returns an error?
 
  Chris
 
 
  ---(end of broadcast)---
  TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 
 
 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Reparse PL/pgSQL Function?

2004-01-07 Thread scott.marlowe
On Wed, 7 Jan 2004, John Sidney-Woollett wrote:

 Is it possible to (automatically) force the reparse of a stored PL/pgSQL
 function following the deletion and recreation of an object that was
 referenced within the function (for example a table or another function)?

Would 

CREATE OR REPLACE function

work?




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] problems with transaction blocks

2004-01-07 Thread Bruno Wolff III
On Tue, Jan 06, 2004 at 16:52:12 -0800,
  Chris Ochs [EMAIL PROTECTED] wrote:
 I want to do a series of inserts within a single transaction block, but with
 postgresql if one insert fails, the whole block is aborted.  Is there any
 way to get around this behavior so that postgresql won't abort the entire
 transaction if a single insert returns an error?

Currently there is no provision for recovery from error by the application
inside a transaction. What you can do is have the application check for
problems before trying the insert. Depending on the problems you expect,
you might be able to avoid trying an insert which will fail in almost
all cases.
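
One common shape for that kind of pre-check (hypothetical table and values) is to fold it into the INSERT itself, so the statement becomes a no-op instead of an error when the row already exists:

```sql
-- Insert only if the key is not already present, avoiding the
-- unique-violation error that would abort the whole transaction.
-- (Note: still racy under concurrent inserts without extra locking.)
insert into users (id, name)
select 42, 'alice'
where not exists (select 1 from users where id = 42);
```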

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] problems with transaction blocks

2004-01-07 Thread Rodrigo Malara
Another way is to break up the transaction: instead of one transaction
consisting of many inserts, make each insert a transaction by itself.
Put a BEGIN and a COMMIT (or ROLLBACK) around each insert statement.
HTH
Rodrigo Malara
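
In sketch form (hypothetical table with a unique key), each insert then succeeds or fails on its own:

```sql
begin;
insert into items values (1, 'ok');
commit;

begin;
insert into items values (1, 'duplicate');  -- fails by itself...
rollback;

begin;
insert into items values (2, 'also ok');    -- ...without affecting this one
commit;
```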
On Wed, 2004-01-07 at 14:41, scott.marlowe wrote:
 Another good way to handle this is to put a trigger on the table that 
 diverts inserts that would fail to a holding table.  While this will slow 
 down the inserts, it will allow you to insert large lists of dubious 
 quality and worry about the bad rows later.
 
 My preference is to fix the data feed, or pre-process it with PHP/Perl to 
 split it into two files ahead of time, but I'm more of a coder than a dba.  
 I get a lot of data to import from other sources at work, and it's often 
 easier to make the sources fix their data feeds than it is to try and 
 massage them each and every time.
 
 On Wed, 7 Jan 2004, Chris Travers wrote:
 
  Transactions are atomic.  What you are asking to do violates the whole
  concept of a transaction.
  
  You can, however, do these inserts outside of the transaction block.
  
  Best Wishes,
  Chris Travers
  - Original Message -
  From: Chris Ochs [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Sent: Wednesday, January 07, 2004 7:52 AM
  Subject: [GENERAL] problems with transaction blocks
  
  
   I want to do a series of inserts within a single transaction block, but
  with
   postgresql if one insert fails, the whole block is aborted.  Is there any
   way to get around this behavior so that postgresql won't abort the entire
   transaction if a single insert returns an error?
  
   Chris
  
  
   ---(end of broadcast)---
   TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
  
  
  
  
  ---(end of broadcast)---
  TIP 5: Have you checked our extensive FAQ?
  
 http://www.postgresql.org/docs/faqs/FAQ.html
  
 
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] deadlocks - sharelocks on transactions

2004-01-07 Thread Stephan Szabo

On Wed, 7 Jan 2004, Tim McAuley wrote:

 Hi,

 I have a stored procedure that is causing deadlocks when called multiple
 times synchronously. The odd issue is that the deadlock seems to be
 happening on different threads waiting for locks on transactions. What
 exactly is this transaction lock?

My first guess would be waiting on row level locks.  Are you doing
anything with FOR UPDATE or foreign keys?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Any way to have CREATEUSER privs without having all privs?

2004-01-07 Thread Bruno Wolff III
On Sat, Dec 06, 2003 at 22:33:00 -0800,
  Ezra Epstein [EMAIL PROTECTED] wrote:
 Thank you Bruno for the informative reply.
 
 I'm not sure how ident solves this.  I would like the session to run as the
 actual user (via set session authorization) so that that user's actual privs
 are enforced.  But I want the connection to be shared: so it cannot be per
 login (username/pw combo).  I'm not up on ident enough to see the fit.  Any
 pointers would be most welcome.

I was wrong about this being useful in your situation. SET SESSION
AUTHORIZATION doesn't reauthenticate, it only allows you to switch
to a new user if you originally were connected as a superuser.

Ident authentication would only be useful if you could close and then
reopen the connection. This could be useful if the overhead of doing
this wasn't a concern.
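
For concreteness, the behaviour described above looks like this (usernames hypothetical; the initial connection must be made as a superuser):

```sql
-- Connected to the shared pool as superuser 'postgres':
set session authorization 'alice';   -- subsequent statements run with
select current_user;                 -- alice's privileges
reset session authorization;         -- back to the superuser before
                                     -- handing the connection to the
                                     -- next pooled request
```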

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] deadlocks - sharelocks on transactions

2004-01-07 Thread Tom Lane
Tim McAuley [EMAIL PROTECTED] writes:
 I have a stored procedure that is causing deadlocks when called multiple 
 times synchronously. The odd issue is that the deadlock seems to be 
 happening on different threads waiting for locks on transactions. What 
 exactly is this transaction lock?

 ERROR:  deadlock detected
 DETAIL:  Process 1740 waits for ShareLock on transaction 1488; blocked 
 by process 1716.
  Process 1716 waits for ShareLock on transaction 1490; blocked 
 by process 1740.

What you've got here is transactions deadlocked by trying to
update/delete the same rows (unfortunately the lock manager doesn't know
exactly which rows, so the DETAIL isn't too helpful).

Most of the complaints I've seen about this sort of problem are not
really from updates per se, but the SELECT FOR UPDATE locking that
is done for foreign key references.  Are you inserting multiple rows
that might reference the same foreign keys?
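
The classic pattern behind this kind of deadlock, stripped down to a sketch (hypothetical table; two sessions touching the same two rows in opposite order):

```sql
-- Session A:
begin;
update accounts set bal = bal - 1 where id = 1;  -- locks row 1

-- Session B:
begin;
update accounts set bal = bal - 1 where id = 2;  -- locks row 2
update accounts set bal = bal - 1 where id = 1;  -- blocks, waiting on A

-- Session A again:
update accounts set bal = bal - 1 where id = 2;  -- blocks on B:
                                                 -- deadlock detected
```

With foreign keys in 7.4, the row locks can be taken implicitly: inserting child rows that reference the same parent row locks that parent via SELECT FOR UPDATE, so two transactions inserting children of the same parents in different orders can hit exactly this cycle.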

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] Query performance question on a large table

2004-01-07 Thread Bruno Wolff III
On Wed, Jan 07, 2004 at 02:31:22 -0500,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 I just a couple days ago added some logic to CVS tip to notice that the
 sub-select has a DISTINCT clause, and not add unnecessary unique-ifying
 processing on top of it.  So in 7.5, writing a DISTINCT clause will
 amount to forcing a particular query plan, which might or might not be
 the best thing but hopefully won't be too terrible.  But in 7.4 it has
 nothing to recommend it ...

Can't the DISTINCT be dropped if there isn't a LIMIT clause?
Similarly, UNION, INTERSECT and EXCEPT could also be changed
to the ALL forms if there isn't a LIMIT.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster