Re: [GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread Tom Lane
Steve Atkins  writes:
> On May 1, 2009, at 2:42 PM, David Wall wrote:
>> Does anybody know if PG will perform better with the table join  
>> instead of evaluating the series of OR/IN?  The OR/IN has to be  
>> parsed, but the comparisons may be faster than the table join.

> It used to be that populating and then joining with a temporary table  
> was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the  
> smallest sets. That's no longer true, and IN() is pretty good.

> I'd still use a temporary table myself, though. It's cleaner and  
> easier to populate one than to cleanly produce a statement with a  
> variable number of identifiers in it. And you can reuse it for  
> multiple reports, join against it different ways and so on. Also you  
> can populate it either from your UI or by selecting from the  
> relationships table suggested above (create temporary table foo as  
> select peon from reports where overlord in ('bob', 'ben', 'jerry) ),  
> and still run the same reports against it.

Possibly worth noting: if you're depending on the quality of join plans
involving such a table, it's worth doing an ANALYZE against it after
you populate it.  (Autovacuum won't do that for you, because it can't
access temp tables.)

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread Steve Atkins


On May 1, 2009, at 2:42 PM, David Wall wrote:


(quoted from Chris)
Select field1,field2 FROM table1 inner join relationships on  
table1.creator_user_id = relationships.employee WHERE  
relationships.manager = ?


(quoted from Steve)
select table1.field1, table2.field2 from table1, reports where  
table1.creator_user_id = reports.peon and reports.overlord = 'bob'


Thanks, Steve and Chris, who both suggested similar things.

I believe that will work for me in some situations, but currently  
the report allows the manager to select any combination (from one to  
all) of users that report to him.  So one solution would be to run  
the report as you have it and then have the application filter it,  
but that could be painful if the users "de-selected" from his full  
team happen to have a large percentage of the matching rows.  Of  
course, I may see if this is something they really do much of.  I  
mean, perhaps if they select individual users (just a few), I can  
just use the OR/IN style, and if they select "all my users" I could  
rely on the table joins.


Does anybody know if PG will perform better with the table join  
instead of evaluating the series of OR/IN?  The OR/IN has to be  
parsed, but the comparisons may be faster than the table join.


It used to be that populating and then joining with a temporary table  
was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the  
smallest sets. That's no longer true, and IN() is pretty good.


I'd still use a temporary table myself, though. It's cleaner and  
easier to populate one than to cleanly produce a statement with a  
variable number of identifiers in it. And you can reuse it for  
multiple reports, join against it different ways and so on. Also you  
can populate it either from your UI or by selecting from the  
relationships table suggested above (create temporary table foo as  
select peon from reports where overlord in ('bob', 'ben', 'jerry) ),  
and still run the same reports against it.


Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread David Wall
(quoted from Chris) 
Select field1,field2 FROM table1 inner join relationships on 
table1.creator_user_id = relationships.employee WHERE 
relationships.manager = ?


(quoted from Steve) 
select table1.field1, table2.field2 from table1, reports where 
table1.creator_user_id = reports.peon and reports.overlord = 'bob'


Thanks, Steve and Chris, who both suggested similar things. 

I believe that will work for me in some situations, but currently the 
report allows the manager to select any combination (from one to all) of 
users that report to him.  So one solution would be to run the report as 
you have it and then have the application filter it, but that could be 
painful if the users "de-selected" from his full team happen to have a 
large percentage of the matching rows.  Of course, I may see if this is 
something they really do much of.  I mean, perhaps if they select 
individual users (just a few), I can just use the OR/IN style, and if 
they select "all my users" I could rely on the table joins.


Does anybody know if PG will perform better with the table join instead 
of evaluating the series of OR/IN?  The OR/IN has to be parsed, but the 
comparisons may be faster than the table join.


Many thanks for your help

David


Re: [GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread Chris Spotts
A separate table for managing the relationships.  One column for the manager
and one for employee.

Then you end up with a query like this.

 

Select field1,field2 FROM table1 inner join relationships on
table1.creator_user_id = relationships.employee WHERE relationships.manager
= ?

  _  

From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David Wall
Sent: Friday, May 01, 2009 12:49 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Handling large number of OR/IN conditions

 

We have a database report function that seemed clean when the number of
users was small, but as the number of users grow, I was wondering if anybody
had any good ideas about how to handle OR or IN for SELECTs.  

The general scenario is that a manager runs reports that list all records
that were created by users under his/her oversight.  So, when the number of
users is small, we had simple queries like:

SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR
creator_user_id = 'U2';

But when there are thousands of users, and a manager has oversight of 100 of
them, the OR construct seems out of whack when you read the query:

WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR
creator_user_id = 'U99' OR creator_user_id = 'U100'

I know it can be shortened with IN using something like, but don't know if
it's any more/less efficient or a concern:

WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100)

How do people tend to handle this sort of thing?  I suspect manager reports
against their people must be pretty common.  Are there any good tricks on
how to group users like this?  Unfortunately, group membership changes over
time, and users may report to more than one manager and thus belong to more
than one group, so we can't just have a 'creator_group_id' attribute that is
set and then query against that. 

Thanks,
David



Re: [GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread Steve Atkins


On May 1, 2009, at 10:49 AM, David Wall wrote:

We have a database report function that seemed clean when the number  
of users was small, but as the number of users grow, I was wondering  
if anybody had any good ideas about how to handle OR or IN for  
SELECTs.


The general scenario is that a manager runs reports that list all  
records that were created by users under his/her oversight.  So,  
when the number of users is small, we had simple queries like:


SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR  
creator_user_id = 'U2';


But when there are thousands of users, and a manager has oversight  
of 100 of them, the OR construct seems out of whack when you read  
the query:


WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR  
creator_user_id = 'U99' OR creator_user_id = 'U100'


I know it can be shortened with IN using something like, but don't  
know if it's any more/less efficient or a concern:


WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100)

How do people tend to handle this sort of thing?  I suspect manager  
reports against their people must be pretty common.  Are there any  
good tricks on how to group users like this?  Unfortunately, group  
membership changes over time, and users may report to more than one  
manager and thus belong to more than one group, so we can't just  
have a 'creator_group_id' attribute that is set and then query  
against that.



Sounds like a job for a two column table that lists manager and report.

select table1.field1, table2.field2 from table1, reports where  
table1.creator_user_id = reports.peon and reports.overlord = 'bob'


Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Handling large number of OR/IN conditions

2009-05-01 Thread David Wall
We have a database report function that seemed clean when the number of 
users was small, but as the number of users grow, I was wondering if 
anybody had any good ideas about how to handle OR or IN for SELECTs. 

The general scenario is that a manager runs reports that list all 
records that were created by users under his/her oversight.  So, when 
the number of users is small, we had simple queries like:


SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR 
creator_user_id = 'U2';


But when there are thousands of users, and a manager has oversight of 
100 of them, the OR construct seems out of whack when you read the query:


WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' *...* OR 
creator_user_id = 'U99' OR creator_user_id = 'U100'


I know it can be shortened with IN using something like, but don't know 
if it's any more/less efficient or a concern:


WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100)

How do people tend to handle this sort of thing?  I suspect manager 
reports against their people must be pretty common.  Are there any good 
tricks on how to group users like this?  Unfortunately, group membership 
changes over time, and users may report to more than one manager and 
thus belong to more than one group, so we can't just have a 
'creator_group_id' attribute that is set and then query against that.


Thanks,
David