Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-09 Thread Tom Lane
Tatsuo Ishii  writes:
> Just out of curiosity, is there any chance that this kind of query is
> speeding up in 9.1 because of following changes?

>  * Allow FULL OUTER JOIN to be implemented as a hash join, and allow
>either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed
>(Tom Lane)

The given query wasn't an outer join, so this wouldn't affect it.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-09 Thread Tatsuo Ishii
> * Tony Capobianco (tcapobia...@prospectiv.com) wrote:
>>  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>>->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>>  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>>  ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
>> rows=192540967 width=12)
>>  ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>>->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
>> width=4)
>> 
>> Yikes.  Two sequential scans.
> 
> Err, isn't that more-or-less exactly what you want here?  The smaller
> table is going to be hashed and then you'll traverse the bigger table
> and bounce each row off the hash table.  Have you tried actually running
> this and seeing how long it takes?  The bigger table doesn't look to be
> *that* big, if your i/o subsystem is decent and you've got a lot of
> memory available for kernel cacheing, should be quick.

Just out of curiosity, is there any chance that this kind of query is
speeding up in 9.1 because of following changes?

 * Allow FULL OUTER JOIN to be implemented as a hash join, and allow
   either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed
   (Tom Lane)
   Previously FULL OUTER JOIN could only be implemented as a merge
   join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the
   nullable side of the join. These changes provide additional query
   optimization possibilities.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Oooo...some bad math there.  Thanks.

On Wed, 2011-06-08 at 12:38 -0700, Samuel Gendler wrote:
> 
> 
> On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco
>  wrote:
> My current setting is 22G.  According to some documentation, I
> want to
> set effective_cache_size to my OS disk cache +
> shared_buffers.  In this
> case, I have 4 quad-core processors with 512K cache (8G) and
> my
> shared_buffers is 7680M.  Therefore my effective_cache_size
> should be
> approximately 16G?  Most of our other etl processes are
> running fine,
> however I'm curious if I could see a significant performance
> boost by
> reducing the effective_cache_size.
> 
> 
> 
> 
> 
> disk cache, not CPU memory cache.  It will be some significant
> fraction of total RAM on the host.  Incidentally, 16 * 512K cache =
> 8MB, not 8GB.
> 
> 
> http://en.wikipedia.org/wiki/CPU_cache
> 
> 
> 
> 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco  wrote:

> My current setting is 22G.  According to some documentation, I want to
> set effective_cache_size to my OS disk cache + shared_buffers.  In this
> case, I have 4 quad-core processors with 512K cache (8G) and my
> shared_buffers is 7680M.  Therefore my effective_cache_size should be
> approximately 16G?  Most of our other etl processes are running fine,
> however I'm curious if I could see a significant performance boost by
> reducing the effective_cache_size.
>
>
disk cache, not CPU memory cache.  It will be some significant fraction of
total RAM on the host.  Incidentally, 16 * 512K cache = 8MB, not 8GB.

http://en.wikipedia.org/wiki/CPU_cache


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Kevin Grittner
Tony Capobianco  wrote:
 
> According to some documentation, I want to set
> effective_cache_size to my OS disk cache + shared_buffers.
 
That seems reasonable, and is what has worked well for me.
 
> In this case, I have 4 quad-core processors with 512K cache (8G)
> and my shared_buffers is 7680M.  Therefore my effective_cache_size
> should be approximately 16G?
 
I didn't follow that at all.  Can you run `free` or `vmstat`?  If
so, go by what those say your cache size is.
 
> Most of our other etl processes are running fine, however I'm
> curious if I could see a significant performance boost by reducing
> the effective_cache_size.
 
Since it is an optimizer costing parameter and has no affect on
memory allocation, you can set it on a connection and run a query on
that connection to test the impact.  Why wonder about it when you
can easily test it?
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
My current setting is 22G.  According to some documentation, I want to
set effective_cache_size to my OS disk cache + shared_buffers.  In this
case, I have 4 quad-core processors with 512K cache (8G) and my
shared_buffers is 7680M.  Therefore my effective_cache_size should be
approximately 16G?  Most of our other etl processes are running fine,
however I'm curious if I could see a significant performance boost by
reducing the effective_cache_size.


On Wed, 2011-06-08 at 13:03 -0400, Tom Lane wrote:
> Tony Capobianco  writes:
> > Well, this ran much better.  However, I'm not sure if it's because of
> > set enable_nestloop = 0, or because I'm executing the query twice in a
> > row, where previous results may be cached.  I will try this setting in
> > my code for when this process runs later today and see what the result
> > is.
> 
> If the performance differential holds up, you should look at adjusting
> your cost parameters so that the planner isn't so wrong about which one
> is faster.  Hacking enable_nestloop is a band-aid, not something you
> want to use in production.
> 
> Looking at the values you gave earlier, I wonder whether the
> effective_cache_size setting isn't unreasonably high.  That's reducing
> the estimated cost of accessing the large table via indexscans, and
> I'm thinking it reduced it too much.
> 
>   regards, tom lane
> 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Pavel Stehule
2011/6/8 Tony Capobianco :
> pg_dw=# show random_page_cost ;
>  random_page_cost
> --
>  4
> (1 row)
>
> Time: 0.299 ms
> pg_dw=# show seq_page_cost ;
>  seq_page_cost
> ---
>  1
> (1 row)
>
> Time: 0.250 ms
> pg_dw=# show work_mem ;
>  work_mem
> --
>  768MB
> (1 row)
>
>

it is ok.

Pavel

>
>
> On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
>> Hello
>>
>> what is your settings for
>>
>> random_page_cost, seq_page_cost and work_mem?
>>
>> Regards
>>
>> Pavel Stehule
>>
>> 2011/6/8 Tony Capobianco :
>> > Here's the explain analyze:
>> >
>> > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> > as
>> > select o.emailcampaignid, count(memberid) opencnt
>> >  from openactivity o,ecr_sents s
>> >  where s.emailcampaignid = o.emailcampaignid
>> >  group by o.emailcampaignid;
>> >
>> > QUERY
>> > PLAN
>> > 
>> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
>> > time=308630.967..2592279.526 rows=472 loops=1)
>> >   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> > (actual time=31.489..2589363.047 rows=8586466 loops=1)
>> >         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
>> > loops=1)
>> >         ->  Index Scan using openact_emcamp_idx on openactivity o
>> > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
>> > rows=17926 loops=479)
>> >               Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> >  Total runtime: 2592284.336 ms
>> >
>> >
>> > On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
>> >> > On Postgres, this same query takes about 58 minutes (could not run
>> >> > explain analyze because it is in progress):
>> >> >
>> >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> >> > pg_dw-# as
>> >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>> >> > pg_dw-#   from openactivity o,ecr_sents s
>> >> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
>> >> > pg_dw-#  group by o.emailcampaignid;
>> >> >                                                  QUERY
>> >> > PLAN
>> >> > -
>> >> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>> >> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> >> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> >> > (cost=0.00..38.59 rows=479 width=4)
>> >> >          ->  Index Scan using openact_emcamp_idx on openactivity o
>> >> > (cost=0.00..3395.49 rows=19372 width=12)
>> >> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> >> > (5 rows)
>> >> >
>> >>
>> >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
>> >> explain.depesz.com.
>> >>
>> >> regards
>> >> Tomas
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-performance
>> >
>>
>
>
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tom Lane
Tony Capobianco  writes:
> Well, this ran much better.  However, I'm not sure if it's because of
> set enable_nestloop = 0, or because I'm executing the query twice in a
> row, where previous results may be cached.  I will try this setting in
> my code for when this process runs later today and see what the result
> is.

If the performance differential holds up, you should look at adjusting
your cost parameters so that the planner isn't so wrong about which one
is faster.  Hacking enable_nestloop is a band-aid, not something you
want to use in production.

Looking at the values you gave earlier, I wonder whether the
effective_cache_size setting isn't unreasonably high.  That's reducing
the estimated cost of accessing the large table via indexscans, and
I'm thinking it reduced it too much.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# show random_page_cost ;
 random_page_cost 
--
 4
(1 row)

Time: 0.299 ms
pg_dw=# show seq_page_cost ;
 seq_page_cost 
---
 1
(1 row)

Time: 0.250 ms
pg_dw=# show work_mem ;
 work_mem 
--
 768MB
(1 row)




On Wed, 2011-06-08 at 18:27 +0200, Pavel Stehule wrote:
> Hello
> 
> what is your settings for
> 
> random_page_cost, seq_page_cost and work_mem?
> 
> Regards
> 
> Pavel Stehule
> 
> 2011/6/8 Tony Capobianco :
> > Here's the explain analyze:
> >
> > pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > as
> > select o.emailcampaignid, count(memberid) opencnt
> >  from openactivity o,ecr_sents s
> >  where s.emailcampaignid = o.emailcampaignid
> >  group by o.emailcampaignid;
> >
> > QUERY
> > PLAN
> > 
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
> > time=308630.967..2592279.526 rows=472 loops=1)
> >   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> > (actual time=31.489..2589363.047 rows=8586466 loops=1)
> > ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
> > loops=1)
> > ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
> > rows=17926 loops=479)
> >   Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >  Total runtime: 2592284.336 ms
> >
> >
> > On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
> >> > On Postgres, this same query takes about 58 minutes (could not run
> >> > explain analyze because it is in progress):
> >> >
> >> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> >> > pg_dw-# as
> >> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> >> > pg_dw-#   from openactivity o,ecr_sents s
> >> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> >> > pg_dw-#  group by o.emailcampaignid;
> >> >  QUERY
> >> > PLAN
> >> > -
> >> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> >> > (cost=0.00..38.59 rows=479 width=4)
> >> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> >> > (cost=0.00..3395.49 rows=19372 width=12)
> >> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> >> > (5 rows)
> >> >
> >>
> >> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> >> explain.depesz.com.
> >>
> >> regards
> >> Tomas
> >>
> >>
> >
> >
> >
> > --
> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
> >
> 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Well, this ran much better.  However, I'm not sure if it's because of
set enable_nestloop = 0, or because I'm executing the query twice in a
row, where previous results may be cached.  I will try this setting in
my code for when this process runs later today and see what the result
is.

Thanks!

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;

   QUERY
PLAN   

 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12) (actual
time=167254.751..167254.937 rows=472 loops=1)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12) (actual
time=0.300..164577.131 rows=8586466 loops=1)
 Hash Cond: (o.emailcampaignid = s.emailcampaignid)
 ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12) (actual time=0.011..124351.878 rows=192542480
loops=1)
 ->  Hash  (cost=8.79..8.79 rows=479 width=4) (actual
time=0.253..0.253 rows=479 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 17kB
   ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4) (actual time=0.010..0.121 rows=479 loops=1)
 Total runtime: 167279.950 ms



On Wed, 2011-06-08 at 11:51 -0400, Stephen Frost wrote:
> * Tony Capobianco (tcapobia...@prospectiv.com) wrote:
> >  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
> >->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
> >  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
> >  ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
> > rows=192540967 width=12)
> >  ->  Hash  (cost=8.79..8.79 rows=479 width=4)
> >->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
> > width=4)
> > 
> > Yikes.  Two sequential scans.
> 
> Err, isn't that more-or-less exactly what you want here?  The smaller
> table is going to be hashed and then you'll traverse the bigger table
> and bounce each row off the hash table.  Have you tried actually running
> this and seeing how long it takes?  The bigger table doesn't look to be
> *that* big, if your i/o subsystem is decent and you've got a lot of
> memory available for kernel cacheing, should be quick.
> 
>   Thanks,
> 
>   Stephen



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Pavel Stehule
Hello

what is your settings for

random_page_cost, seq_page_cost and work_mem?

Regards

Pavel Stehule

2011/6/8 Tony Capobianco :
> Here's the explain analyze:
>
> pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
> as
> select o.emailcampaignid, count(memberid) opencnt
>  from openactivity o,ecr_sents s
>  where s.emailcampaignid = o.emailcampaignid
>  group by o.emailcampaignid;
>
> QUERY
> PLAN
> 
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
> time=308630.967..2592279.526 rows=472 loops=1)
>   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> (actual time=31.489..2589363.047 rows=8586466 loops=1)
>         ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> (cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
> loops=1)
>         ->  Index Scan using openact_emcamp_idx on openactivity o
> (cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
> rows=17926 loops=479)
>               Index Cond: (o.emailcampaignid = s.emailcampaignid)
>  Total runtime: 2592284.336 ms
>
>
> On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
>> > On Postgres, this same query takes about 58 minutes (could not run
>> > explain analyze because it is in progress):
>> >
>> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
>> > pg_dw-# as
>> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
>> > pg_dw-#   from openactivity o,ecr_sents s
>> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
>> > pg_dw-#  group by o.emailcampaignid;
>> >                                                  QUERY
>> > PLAN
>> > -
>> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>> >    ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>> >          ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
>> > (cost=0.00..38.59 rows=479 width=4)
>> >          ->  Index Scan using openact_emcamp_idx on openactivity o
>> > (cost=0.00..3395.49 rows=19372 width=12)
>> >                Index Cond: (o.emailcampaignid = s.emailcampaignid)
>> > (5 rows)
>> >
>>
>> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
>> explain.depesz.com.
>>
>> regards
>> Tomas
>>
>>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
Here's the explain analyze:

pg_dw=# explain analyze CREATE TABLE ecr_opens with (FILLFACTOR=100)
as
select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;

QUERY
PLAN   

 GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12) (actual
time=308630.967..2592279.526 rows=472 loops=1)
   ->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
(actual time=31.489..2589363.047 rows=8586466 loops=1)
 ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4) (actual time=0.010..13.326 rows=479
loops=1)
 ->  Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12) (actual time=1.336..5397.139
rows=17926 loops=479)
   Index Cond: (o.emailcampaignid = s.emailcampaignid)
 Total runtime: 2592284.336 ms


On Wed, 2011-06-08 at 17:31 +0200, t...@fuzzy.cz wrote:
> > On Postgres, this same query takes about 58 minutes (could not run
> > explain analyze because it is in progress):
> >
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-#   from openactivity o,ecr_sents s
> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> > pg_dw-#  group by o.emailcampaignid;
> >  QUERY
> > PLAN
> > -
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4)
> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12)
> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
> >
> 
> Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
> explain.depesz.com.
> 
> regards
> Tomas
> 
> 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Vitalii Tymchyshyn

08.06.11 18:40, Tony Capobianco написав(ла):

pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
QUERY
PLAN
-
  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
->   Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
  ->   Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
  ->   Hash  (cost=8.79..8.79 rows=479 width=4)
->   Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.


Yep. Can you see another options? Either you take each of 479 records 
and try to find matching records in another table using index (first 
plan), or you take both two tables fully (seq scan) and join - second plan.
First plan is better if your large table is clustered enough on 
emailcampaignid field (479 index reads and 479 sequential table reads). 
If it's not, you may get a 479 table reads transformed into a lot or 
random reads.
BTW: May be you have different data clustering in PostgreSQL & Oracle? 
Or data in Oracle may be "hot" in caches?
Also, sequential scan is not too bad thing. It may be cheap enough to 
read millions of records if they are not too wide. Please show "select 
pg_size_pretty(pg_relation_size('openactivity'));" Have you tried to 
explain analyze second plan?


Best regards, Vitalii Tymchyshyn




On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:

Tony Capobianco  writes:

pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
  QUERY
PLAN
-
  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
->   Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
  ->   Index Scan using ecr_sents_ecid_idx on ecr_sents s
(cost=0.00..38.59 rows=479 width=4)
  ->   Index Scan using openact_emcamp_idx on openactivity o
(cost=0.00..3395.49 rows=19372 width=12)
Index Cond: (o.emailcampaignid = s.emailcampaignid)
(5 rows)
Should this query be hashing the smaller table on Postgres rather than
using nested loops?

Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
= 0" and see what plan you get then.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Stephen Frost
* Tony Capobianco (tcapobia...@prospectiv.com) wrote:
>  HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
>->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
>  Hash Cond: (o.emailcampaignid = s.emailcampaignid)
>  ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
> rows=192540967 width=12)
>  ->  Hash  (cost=8.79..8.79 rows=479 width=4)
>->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
> width=4)
> 
> Yikes.  Two sequential scans.

Err, isn't that more-or-less exactly what you want here?  The smaller
table is going to be hashed and then you'll traverse the bigger table
and bounce each row off the hash table.  Have you tried actually running
this and seeing how long it takes?  The bigger table doesn't look to be
*that* big, if your i/o subsystem is decent and you've got a lot of
memory available for kernel cacheing, should be quick.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
pg_dw=# set enable_nestloop =0;
SET
Time: 0.165 ms
pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
pg_dw-# as
pg_dw-# select o.emailcampaignid, count(memberid) opencnt
pg_dw-#   from openactivity o,ecr_sents s
pg_dw-#  where s.emailcampaignid = o.emailcampaignid
pg_dw-#  group by o.emailcampaignid;
   QUERY
PLAN
-
 HashAggregate  (cost=4391163.81..4391288.05 rows=9939 width=12)
   ->  Hash Join  (cost=14.78..4344767.23 rows=9279316 width=12)
 Hash Cond: (o.emailcampaignid = s.emailcampaignid)
 ->  Seq Scan on openactivity o  (cost=0.00..3529930.67
rows=192540967 width=12)
 ->  Hash  (cost=8.79..8.79 rows=479 width=4)
   ->  Seq Scan on ecr_sents s  (cost=0.00..8.79 rows=479
width=4)

Yikes.  Two sequential scans.


On Wed, 2011-06-08 at 11:33 -0400, Tom Lane wrote:
> Tony Capobianco  writes:
> > pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> > pg_dw-# as
> > pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> > pg_dw-#   from openactivity o,ecr_sents s
> > pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> > pg_dw-#  group by o.emailcampaignid;
> >  QUERY
> > PLAN  
> > -
> >  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
> >->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
> >  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> > (cost=0.00..38.59 rows=479 width=4)
> >  ->  Index Scan using openact_emcamp_idx on openactivity o
> > (cost=0.00..3395.49 rows=19372 width=12)
> >Index Cond: (o.emailcampaignid = s.emailcampaignid)
> > (5 rows)
> 
> > Should this query be hashing the smaller table on Postgres rather than
> > using nested loops?
> 
> Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
> = 0" and see what plan you get then.
> 
>   regards, tom lane
> 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tom Lane
Tony Capobianco  writes:
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-#   from openactivity o,ecr_sents s
> pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> pg_dw-#  group by o.emailcampaignid;
>  QUERY
> PLAN  
> -
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> (cost=0.00..38.59 rows=479 width=4)
>  ->  Index Scan using openact_emcamp_idx on openactivity o
> (cost=0.00..3395.49 rows=19372 width=12)
>Index Cond: (o.emailcampaignid = s.emailcampaignid)
> (5 rows)

> Should this query be hashing the smaller table on Postgres rather than
> using nested loops?

Yeah, seems like it.  Just for testing purposes, do "set enable_nestloop
= 0" and see what plan you get then.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread tv
> On Postgres, this same query takes about 58 minutes (could not run
> explain analyze because it is in progress):
>
> pg_dw=# explain CREATE TABLE ecr_opens with (FILLFACTOR=100)
> pg_dw-# as
> pg_dw-# select o.emailcampaignid, count(memberid) opencnt
> pg_dw-#   from openactivity o,ecr_sents s
> pg_dw-#  where s.emailcampaignid = o.emailcampaignid
> pg_dw-#  group by o.emailcampaignid;
>  QUERY
> PLAN
> -
>  GroupAggregate  (cost=0.00..1788988.05 rows=9939 width=12)
>->  Nested Loop  (cost=0.00..1742467.24 rows=9279316 width=12)
>  ->  Index Scan using ecr_sents_ecid_idx on ecr_sents s
> (cost=0.00..38.59 rows=479 width=4)
>  ->  Index Scan using openact_emcamp_idx on openactivity o
> (cost=0.00..3395.49 rows=19372 width=12)
>Index Cond: (o.emailcampaignid = s.emailcampaignid)
> (5 rows)
>

Please, post EXPLAIN ANALYZE, not just EXPLAIN. Preferably using
explain.depesz.com.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Tony Capobianco
We are thiis close to moving our datawarehouse from Oracle to
Postgres.  This query is identical on both systems, but runs much, much
faster on Oracle.  Our Postgres host has far superior hardware and
tuning parameters have been set via pgtune.  Most everything else runs
faster in Postgres, except for this query.  In Oracle, we get a hash
join that takes about 2 minutes:

SQL> set line 200
delete from plan_table;
explain plan for
CREATE TABLE ecr_opens
as
select o.emailcampaignid, count(memberid) opencnt
  from openactivity o,ecr_sents s
 where s.emailcampaignid = o.emailcampaignid
 group by o.emailcampaignid;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL> 
13 rows deleted.

SQL>   234567  
Explained.

SQL> SQL> 
PLAN_TABLE_OUTPUT

Plan hash value: 4034426201

-
| Id  | Operation | Name   | Rows  |
Bytes | Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
-
|   0 | CREATE TABLE STATEMENT||  5094 |
91692 |  9651  (24)| 00:02:16 ||  ||
|   1 |  LOAD AS SELECT   | ECR_OPENS  |   |
||  ||  ||
|   2 |   PX COORDINATOR  ||   |
||  ||  ||
|   3 |PX SEND QC (RANDOM)| :TQ10002   |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | P->S | QC (RAND)  |
|   4 | HASH GROUP BY ||  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP ||
|   5 |  PX RECEIVE   ||  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,02 | PCWP ||

PLAN_TABLE_OUTPUT

|   6 |   PX SEND HASH| :TQ10001   |  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,01 | P->P | HASH   |
|   7 |HASH GROUP BY  ||  5094 |
91692 |  2263 (100)| 00:00:32 |  Q1,01 | PCWP ||
|   8 | NESTED LOOPS  ||17M|
297M|   200  (98)| 00:00:03 |  Q1,01 | PCWP ||
|   9 |  BUFFER SORT  ||   |
||  |  Q1,01 | PCWC ||
|  10 |   PX RECEIVE  ||   |
||  |  Q1,01 | PCWP ||
|  11 |PX SEND ROUND-ROBIN| :TQ1   |   |
||  || S->P | RND-ROBIN  |
|  12 | TABLE ACCESS FULL | ECR_SENTS  |   476 |
6188 | 3   (0)| 00:00:01 ||  ||
|* 13 |  INDEX RANGE SCAN | OPENACT_EMCAMP_IDX | 36355 |
177K| 1   (0)| 00:00:01 |  Q1,01 | PCWP ||
-

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT

---

  13 - access("S"."EMAILCAMPAIGNID"="O"."EMAILCAMPAIGNID")

Note
-
   - dynamic sampling used for this statement

29 rows selected.

SQL> desc openactivity
 Name  Null?Type
 - 

 EMAILCAMPAIGNID   NOT NULL NUMBER
 MEMBERID  NOT NULL NUMBER
 OPENDATE   DATE
 IPADDRESS  VARCHAR2(25)
 DATE_IDNUMBER

SQL> select count(*) from openactivity;

  COUNT(*)
--
 192542480

SQL> desc ecr_sents
 Name  Null?Type
 - 

 EMAILCAMPAIGNIDNUMBER
 MEMCNT NUMBER
 DATE_IDNUMBER
 SENTDATE   DATE


SQL> select count(*) from ecr_sents;

  COUNT(*)
--
   4