Re: [PERFORM] multiple table scan performance

2011-03-29 Thread Craig James

On 3/29/11 3:16 PM, Samuel Gendler wrote:

I've got some functionality that necessarily must scan a relatively large 
table.  Even worse, the total workload is actually 3 similar, but different 
queries, each of which requires a table scan.  They all have a resultset that 
has the same structure, and all get inserted into a temp table.  Is there any 
performance benefit to revamping the workload such that it issues a single:

insert into (...) select ... UNION select ... UNION select

as opposed to 3 separate "insert into () select ..." statements.

I could figure it out empirically, but the queries are really slow on my dev 
laptop and I don't have access to the staging system at the moment.  Also, it 
requires revamping a fair bit of code, so I figured it never hurts to ask.  I 
don't have a sense of whether postgres is able to parallelize multiple 
subqueries via a single scan

You don't indicate how complex your queries are.  If it's just a single table 
and the conditions are relatively simple, could you do something like this?

  insert into (...) select ... where (...) OR (...) OR (...)

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] multiple table scan performance

2011-03-29 Thread Samuel Gendler
On Tue, Mar 29, 2011 at 5:05 PM, Marti Raudsepp  wrote:

> On Wed, Mar 30, 2011 at 01:16, Samuel Gendler 
> wrote:
>
> You can trick Postgres (8.3.x and newer) into doing it in parallel
> anyway: open 3 separate database connections and issue each of these
> 'INSERT INTO ... SELECT' parts separately.  This way all the queries
> should execute in about 1/3 the time, compared to running them in one
> session or with UNION ALL.
>

That's a good idea, but forces a lot of infrastructural change on me.  I'm
inserting into a temp table, then deleting everything from another table
before copying over.  I could insert into an ordinary table, but then I've
got to deal with ensuring that everything is properly cleaned up, etc.
 Since nothing is actually blocked, waiting for the queries to return, I
think I'll just let them churn for now. It won't make much difference in
production, where the whole table will fit easily into cache.  I just wanted
things to be faster in my dev environment.



>
> Regards,
> Marti
>


Re: [PERFORM] multiple table scan performance

2011-03-29 Thread Marti Raudsepp
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler  wrote:
> I've got some functionality that necessarily must scan a relatively large 
> table

> Is there any performance benefit to revamping the workload such that it issues
> a single:
> insert into (...) select ... UNION select ... UNION select
> as opposed to 3 separate "insert into () select ..." statements.

Apparently not, as explained by Claudio Freire. This seems like missed
opportunity for the planner, however. If it scanned all three UNION
subqueries in parallel, the synchronized seqscans feature would kick
in and the physical table would only be read once, instead of 3 times.

(I'm assuming that seqscan disk access is your bottleneck)

You can trick Postgres (8.3.x and newer) into doing it in parallel
anyway: open 3 separate database connections and issue each of these
'INSERT INTO ... SELECT' parts separately.  This way all the queries
should execute in about 1/3 the time, compared to running them in one
session or with UNION ALL.

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] multiple table scan performance

2011-03-29 Thread Claudio Freire
On Tue, Mar 29, 2011 at 7:16 PM, Samuel Gendler
 wrote:
> Is there any performance benefit to revamping the workload such that it issues
> a single:
> insert into (...) select ... UNION select ... UNION select
> as opposed to 3 separate "insert into () select ..." statements.

I wouldn't expect any difference - if you used UNION ALL (union will
be equivalent to insert into ()  select DISTINCT ...)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] multiple table scan performance

2011-03-29 Thread Samuel Gendler
I've got some functionality that necessarily must scan a relatively large
table.  Even worse, the total workload is actually 3 similar, but different
queries, each of which requires a table scan.  They all have a resultset
that has the same structure, and all get inserted into a temp table.  Is
there any performance benefit to revamping the workload such that it issues
a single:

insert into (...) select ... UNION select ... UNION select

as opposed to 3 separate "insert into () select ..." statements.

I could figure it out empirically, but the queries are really slow on my dev
laptop and I don't have access to the staging system at the moment.  Also,
it requires revamping a fair bit of code, so I figured it never hurts to
ask.  I don't have a sense of whether postgres is able to parallelize
multiple subqueries via a single scan