Re: [GENERAL] Handling large number of OR/IN conditions
Steve Atkins writes: > On May 1, 2009, at 2:42 PM, David Wall wrote: >> Does anybody know if PG will perform better with the table join >> instead of evaluating the series of OR/IN? The OR/IN has to be >> parsed, but the comparisons may be faster than the table join. > It used to be that populating and then joining with a temporary table > was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the > smallest sets. That's no longer true, and IN() is pretty good. > I'd still use a temporary table myself, though. It's cleaner and > easier to populate one than to cleanly produce a statement with a > variable number of identifiers in it. And you can reuse it for > multiple reports, join against it different ways and so on. Also you > can populate it either from your UI or by selecting from the > relationships table suggested above (create temporary table foo as > select peon from reports where overlord in ('bob', 'ben', 'jerry) ), > and still run the same reports against it. Possibly worth noting: if you're depending on the quality of join plans involving such a table, it's worth doing an ANALYZE against it after you populate it. (Autovacuum won't do that for you, because it can't access temp tables.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Handling large number of OR/IN conditions
On May 1, 2009, at 2:42 PM, David Wall wrote: (quoted from Chris) Select field1,field2 FROM table1 inner join relationships on table1.creator_user_id = relationships.employee WHERE relationships.manager = ? (quoted from Steve) select table1.field1, table2.field2 from table1, reports where table1.creator_user_id = reports.peon and reports.overlord = 'bob' Thanks, Steve and Chris, who both suggested similar things. I believe that will work for me in some situations, but currently the report allows the manager to select any combination (from one to all) of users that report to him. So one solution would be to run the report as you have it and then have the application filter it, but that could be painful if the users "de-selected" from his full team happen to have a large percentage of the matching rows. Of course, I may see if this is something they really do much of. I mean, perhaps if they select individual users (just a few), I can just use the OR/IN style, and if they select "all my users" I could rely on the table joins. Does anybody know if PG will perform better with the table join instead of evaluating the series of OR/IN? The OR/IN has to be parsed, but the comparisons may be faster than the table join. It used to be that populating and then joining with a temporary table was faster than using IN (1, 3, 5, 7, 9, 11, 13) for all but the smallest sets. That's no longer true, and IN() is pretty good. I'd still use a temporary table myself, though. It's cleaner and easier to populate one than to cleanly produce a statement with a variable number of identifiers in it. And you can reuse it for multiple reports, join against it different ways and so on. Also you can populate it either from your UI or by selecting from the relationships table suggested above (create temporary table foo as select peon from reports where overlord in ('bob', 'ben', 'jerry) ), and still run the same reports against it. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Handling large number of OR/IN conditions
(quoted from Chris) Select field1,field2 FROM table1 inner join relationships on table1.creator_user_id = relationships.employee WHERE relationships.manager = ? (quoted from Steve) select table1.field1, table2.field2 from table1, reports where table1.creator_user_id = reports.peon and reports.overlord = 'bob' Thanks, Steve and Chris, who both suggested similar things. I believe that will work for me in some situations, but currently the report allows the manager to select any combination (from one to all) of users that report to him. So one solution would be to run the report as you have it and then have the application filter it, but that could be painful if the users "de-selected" from his full team happen to have a large percentage of the matching rows. Of course, I may see if this is something they really do much of. I mean, perhaps if they select individual users (just a few), I can just use the OR/IN style, and if they select "all my users" I could rely on the table joins. Does anybody know if PG will perform better with the table join instead of evaluating the series of OR/IN? The OR/IN has to be parsed, but the comparisons may be faster than the table join. Many thanks for your help David
Re: [GENERAL] Handling large number of OR/IN conditions
A separate table for managing the relationships. One column for the manager and one for employee. Then you end up with a query like this. Select field1,field2 FROM table1 inner join relationships on table1.creator_user_id = relationships.employee WHERE relationships.manager = ? _ From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David Wall Sent: Friday, May 01, 2009 12:49 PM To: pgsql-general@postgresql.org Subject: [GENERAL] Handling large number of OR/IN conditions We have a database report function that seemed clean when the number of users was small, but as the number of users grow, I was wondering if anybody had any good ideas about how to handle OR or IN for SELECTs. The general scenario is that a manager runs reports that list all records that were created by users under his/her oversight. So, when the number of users is small, we had simple queries like: SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR creator_user_id = 'U2'; But when there are thousands of users, and a manager has oversight of 100 of them, the OR construct seems out of whack when you read the query: WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR creator_user_id = 'U99' OR creator_user_id = 'U100' I know it can be shortened with IN using something like, but don't know if it's any more/less efficient or a concern: WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100) How do people tend to handle this sort of thing? I suspect manager reports against their people must be pretty common. Are there any good tricks on how to group users like this? Unfortunately, group membership changes over time, and users may report to more than one manager and thus belong to more than one group, so we can't just have a 'creator_group_id' attribute that is set and then query against that. Thanks, David
Re: [GENERAL] Handling large number of OR/IN conditions
On May 1, 2009, at 10:49 AM, David Wall wrote: We have a database report function that seemed clean when the number of users was small, but as the number of users grow, I was wondering if anybody had any good ideas about how to handle OR or IN for SELECTs. The general scenario is that a manager runs reports that list all records that were created by users under his/her oversight. So, when the number of users is small, we had simple queries like: SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR creator_user_id = 'U2'; But when there are thousands of users, and a manager has oversight of 100 of them, the OR construct seems out of whack when you read the query: WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' ... OR creator_user_id = 'U99' OR creator_user_id = 'U100' I know it can be shortened with IN using something like, but don't know if it's any more/less efficient or a concern: WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100) How do people tend to handle this sort of thing? I suspect manager reports against their people must be pretty common. Are there any good tricks on how to group users like this? Unfortunately, group membership changes over time, and users may report to more than one manager and thus belong to more than one group, so we can't just have a 'creator_group_id' attribute that is set and then query against that. Sounds like a job for a two column table that lists manager and report. select table1.field1, table2.field2 from table1, reports where table1.creator_user_id = reports.peon and reports.overlord = 'bob' Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Handling large number of OR/IN conditions
We have a database report function that seemed clean when the number of users was small, but as the number of users grow, I was wondering if anybody had any good ideas about how to handle OR or IN for SELECTs. The general scenario is that a manager runs reports that list all records that were created by users under his/her oversight. So, when the number of users is small, we had simple queries like: SELECT field1, field2 FROM table1 WHERE creator_user_id = 'U1' OR creator_user_id = 'U2'; But when there are thousands of users, and a manager has oversight of 100 of them, the OR construct seems out of whack when you read the query: WHERE creator_user_id = 'U1' OR creator_user_id = 'U2' *...* OR creator_user_id = 'U99' OR creator_user_id = 'U100' I know it can be shortened with IN using something like, but don't know if it's any more/less efficient or a concern: WHERE creator_user_id IN ('U1', 'U2', , 'U99', 'U100) How do people tend to handle this sort of thing? I suspect manager reports against their people must be pretty common. Are there any good tricks on how to group users like this? Unfortunately, group membership changes over time, and users may report to more than one manager and thus belong to more than one group, so we can't just have a 'creator_group_id' attribute that is set and then query against that. Thanks, David