Re: [PERFORM] multiple table scan performance
On 3/29/11 3:16 PM, Samuel Gendler wrote: I've got some functionality that necessarily must scan a relatively large table. Even worse, the total workload is actually 3 similar, but different queries, each of which requires a table scan. They all have a resultset that has the same structure, and all get inserted into a temp table. Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select as opposed to 3 separate "insert into () select ..." statements. I could figure it out empirically, but the queries are really slow on my dev laptop and I don't have access to the staging system at the moment. Also, it requires revamping a fair bit of code, so I figured it never hurts to ask. I don't have a sense of whether postgres is able to parallelize multiple subqueries via a single scan You don't indicate how complex your queries are. If it's just a single table and the conditions are relatively simple, could you do something like this? insert into (...) select ... where (...) OR (...) OR (...) Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] multiple table scan performance
On Tue, Mar 29, 2011 at 5:05 PM, Marti Raudsepp wrote: > On Wed, Mar 30, 2011 at 01:16, Samuel Gendler > wrote: > > You can trick Postgres (8.3.x and newer) into doing it in parallel > anyway: open 3 separate database connections and issue each of these > 'INSERT INTO ... SELECT' parts separately. This way all the queries > should execute in about 1/3 the time, compared to running them in one > session or with UNION ALL. > That's a good idea, but forces a lot of infrastructural change on me. I'm inserting into a temp table, then deleting everything from another table before copying over. I could insert into an ordinary table, but then I've got to deal with ensuring that everything is properly cleaned up, etc. Since nothing is actually blocked, waiting for the queries to return, I think I'll just let them churn for now. It won't make much difference in production, where the whole table will fit easily into cache. I just wanted things to be faster in my dev environment. > > Regards, > Marti >
Re: [PERFORM] multiple table scan performance
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler wrote: > I've got some functionality that necessarily must scan a relatively large > table > Is there any performance benefit to revamping the workload such that it issues > a single: > insert into (...) select ... UNION select ... UNION select > as opposed to 3 separate "insert into () select ..." statements. Apparently not, as explained by Claudio Freire. This seems like missed opportunity for the planner, however. If it scanned all three UNION subqueries in parallel, the synchronized seqscans feature would kick in and the physical table would only be read once, instead of 3 times. (I'm assuming that seqscan disk access is your bottleneck) You can trick Postgres (8.3.x and newer) into doing it in parallel anyway: open 3 separate database connections and issue each of these 'INSERT INTO ... SELECT' parts separately. This way all the queries should execute in about 1/3 the time, compared to running them in one session or with UNION ALL. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] multiple table scan performance
On Tue, Mar 29, 2011 at 7:16 PM, Samuel Gendler wrote: > Is there any performance benefit to revamping the workload such that it issues > a single: > insert into (...) select ... UNION select ... UNION select > as opposed to 3 separate "insert into () select ..." statements. I wouldn't expect any difference - if you used UNION ALL (union will be equivalent to insert into () select DISTINCT ...) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] multiple table scan performance
I've got some functionality that necessarily must scan a relatively large table. Even worse, the total workload is actually 3 similar, but different queries, each of which requires a table scan. They all have a resultset that has the same structure, and all get inserted into a temp table. Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select as opposed to 3 separate "insert into () select ..." statements. I could figure it out empirically, but the queries are really slow on my dev laptop and I don't have access to the staging system at the moment. Also, it requires revamping a fair bit of code, so I figured it never hurts to ask. I don't have a sense of whether postgres is able to parallelize multiple subqueries via a single scan