Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Jim C. Nasby
On Fri, Mar 24, 2006 at 01:49:17PM +0100, Svenne Krap wrote:
 explain select dataset_id, entity, sum(amount) from entrydata_current 
 where  flow_direction in (select * from outflow_direction(dataset_id)) 
 and dataset_id in (select * from get_dataset_ids(122)) group by 
 dataset_id, entity;
snip 
 which does not return within 10 minutes - which is unacceptable.


The issue is that the planner has no way to know what's comming back
from get_dataset_ids.

I think your best bet will be to wrap that select into it's own function
and have that function prepare the query statement, going back to
hard-coded values. So you could do something like:

SQL := 'SELECT ... AND dataset_id IN (''' || get_dataset_ids(122) ||
''');' (yeah, I know that won't work as written, but you get the idea).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Fri, Mar 24, 2006 at 01:49:17PM +0100, Svenne Krap wrote:
 explain select dataset_id, entity, sum(amount) from entrydata_current 
 where  flow_direction in (select * from outflow_direction(dataset_id)) 
 and dataset_id in (select * from get_dataset_ids(122)) group by 
 dataset_id, entity;

 The issue is that the planner has no way to know what's comming back
 from get_dataset_ids.

More specifically, the first IN is not optimizable into a join because
the results of the sub-SELECT depend on the current row of the outer
query.  The second IN is being optimized fine, but the first one is
what's killing you.

I'd suggest refactoring the functions into something that returns a set
of outflow_direction/dataset_id pairs, and then phrase the query as

where (flow_direction, dataset_id) in (select * from new_func(122))

You could do it without refactoring:

where (flow_direction, dataset_id) in
  (select outflow_direction(id),id from get_dataset_ids(122) id)

however this won't work if outflow_direction() is a plpgsql function
because of limitations in plpgsql's set-function support.  (It will work
if outflow_direction() is a SQL function, or you could kluge it as a SQL
function wrapper around a plpgsql function.)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Svenne Krap

Tom Lane wrote:

where (flow_direction, dataset_id) in (select * from new_func(122))
  


Is this form of multi-column IN mentioned anywhere in the docs? I can't 
find it.


Svenne


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PERFORM] Performance problems with multiple layers of functions

2006-03-24 Thread Tom Lane
Svenne Krap [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 where (flow_direction, dataset_id) in (select * from new_func(122))

 Is this form of multi-column IN mentioned anywhere in the docs? I can't 
 find it.

Sure, look under Subquery Expressions.  8.0 and later refer to it as a
row_constructor, but it's documented at least as far back as 7.3.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org