Re: [HACKERS] Possible regression with gather merge.

2017-03-22 Thread Robert Haas
On Wed, Mar 22, 2017 at 3:39 AM, Rushabh Lathia
 wrote:
> Looking at the explain analyze output of both the plan, its clear that GM
> taking longer as its using external merge dist for the sort, where as
> another plan perform top-N heapsort. For normal sort path, it can consider
> the limit as bound, but for GM its not possible.

Right, good catch.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible regression with gather merge.

2017-03-22 Thread Mithun Cy
On Wed, Mar 22, 2017 at 1:09 PM, Rushabh Lathia
 wrote:
> In the code, gather merge consider the limit for the sort into
> create_ordered_paths,
> which is wrong. Into create_ordered_paths(), GM should not consider the
> limit
> while doing costing for the sort node.
>
> Attached patch fix the bug.
Thanks, Rushabh, Now I see non-parallel scan being picked up by default

create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,1000) id);

postgres=# explain analyze select * from test order by v1, v2 limit 10;
  QUERY PLAN
---
 Limit  (cost=370146.98..370147.00 rows=10 width=16) (actual
time=1169.685..1169.686 rows=10 loops=1)
   ->  Sort  (cost=370146.98..395146.63 rows=860 width=16) (actual
time=1169.683..1169.684 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on test  (cost=0.00..154053.60 rows=860
width=16) (actual time=0.016..590.176 rows=1000 loops=1)
 Planning time: 0.070 ms
 Execution time: 1169.706 ms
(7 rows)


Another find by accident. Setting higher
max_parallel_workers_per_gather to a higher value than default results
in more parallel workers, But query runs slower than the plan with
lesser workers.

postgres=# set max_parallel_workers_per_gather = default;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 1000;
 QUERY
PLAN

 Limit  (cost=697263.86..1669540.28 rows=8333216 width=16) (actual
time=2212.437..8207.161 rows=1000 loops=1)
   ->  Gather Merge  (cost=697263.86..1669540.28 rows=8333216
width=16) (actual time=2212.436..7600.478 rows=1000 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=696263.84..706680.36 rows=4166608 width=16)
(actual time=2173.756..3105.512 rows=333 loops=3)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 86648kB
   ->  Parallel Seq Scan on test  (cost=0.00..95721.08
rows=4166608 width=16) (actual time=0.030..240.486 rows=333
loops=3)
 Planning time: 0.096 ms
 Execution time: 8537.214 ms
(10 rows)

postgres=# set max_parallel_workers_per_gather = 10;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 1000;
 QUERY
PLAN

 Limit  (cost=431168.44..1628498.09 rows=860 width=16) (actual
time=1525.120..13273.805 rows=1000 loops=1)
   ->  Gather Merge  (cost=431168.44..1628498.09 rows=860
width=16) (actual time=1525.119..12650.621 rows=1000 loops=1)
 Workers Planned: 4
 Workers Launched: 4
 ->  Sort  (cost=430168.39..436418.30 rows=2499965 width=16)
(actual time=1472.799..2133.571 rows=200 loops=5)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 50336kB
   ->  Parallel Seq Scan on test  (cost=0.00..79054.65
rows=2499965 width=16) (actual time=0.047..201.405 rows=200
loops=5)
 Planning time: 0.077 ms
 Execution time: 13622.319 ms
(10 rows)



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible regression with gather merge.

2017-03-22 Thread Rushabh Lathia
Hi,


postgres=# explain analyze select * from test order by v1, v2 limit 10;
   QUERY
PLAN

 Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual
time=408.842..408.853 rows=10 loops=1)
   ->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16)
(actual time=408.841..408.850 rows=10 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16) (actual
time=393.602..394.080 rows=911 loops=3)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 8128kB
   ->  Parallel Seq Scan on test  (cost=0.00..9572.67
rows=416667 width=16) (actual time=0.053..46.238 rows=33 loops=3)
 Planning time: 0.118 ms
 Execution time: 414.763 ms
(10 rows)

postgres=# set enable_gathermerge = off;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
 QUERY
PLAN
-
 Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual
time=268.859..268.861 rows=10 loops=1)
   ->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual
time=268.858..268.859 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on test  (cost=0.00..15406.00 rows=100 width=16)
(actual time=0.085..107.841 rows=100 loops=1)
 Planning time: 0.163 ms
 Execution time: 268.897 ms
(7 rows)

Looking at the explain analyze output of both the plan, its clear that GM
taking longer as its using external merge dist for the sort, where as
another plan perform top-N heapsort. For normal sort path, it can consider
the limit as bound, but for GM its not possible.

In the code, gather merge consider the limit for the sort into
create_ordered_paths,
which is wrong. Into create_ordered_paths(), GM should not consider the
limit
while doing costing for the sort node.

Attached patch fix the bug.


On Wed, Mar 22, 2017 at 12:05 PM, Rushabh Lathia 
wrote:

> Thanks for reporting, I am looking into this.
>
> On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy 
> wrote:
>
>> Adding more rows to table make gather merge execution time very slow
>> when compared to non-parallel plan we get after disabling gather
>> merge.
>>
>> create table test as (select id, (random()*1)::int as v1, random() as
>> v2 from generate_series(1,1) id);
>>
>> postgres=# set max_parallel_workers_per_gather = default;
>> SET
>> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>>
>> QUERY PLAN
>> 
>> 
>> 
>>  Limit  (cost=1858610.53..1858611.70 rows=10 width=16) (actual
>> time=31103.880..31103.885 rows=10 loops=1)
>>->  Gather Merge  (cost=1858610.53..11581520.05 rows=8406
>> width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
>>  Workers Planned: 2
>>  Workers Launched: 2
>>  ->  Sort  (cost=1857610.50..1961777.26 rows=41666703
>> width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
>>Sort Key: v1, v2
>>Sort Method: external merge  Disk: 841584kB
>>->  Parallel Seq Scan on test  (cost=0.00..957208.03
>> rows=41666703 width=16) (actual time=0.050..2330.275 rows=
>> loops=3)
>>  Planning time: 0.292 ms
>>  Execution time: 31502.896 ms
>> (10 rows)
>>
>> postgres=# set max_parallel_workers_per_gather = 0;
>> SET
>> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>>  QUERY
>> PLAN
>> 
>> 
>>  Limit  (cost=3701507.83..3701507.85 rows=10 width=16) (actual
>> time=13231.264..13231.266 rows=10 loops=1)
>>->  Sort  (cost=3701507.83..3951508.05 rows=10088 width=16)
>> (actual time=13231.261..13231.262 rows=10 loops=1)
>>  Sort Key: v1, v2
>>  Sort Method: top-N heapsort  Memory: 25kB
>>  ->  Seq Scan on test  (cost=0.00..1540541.88 rows=10088
>> width=16) (actual time=0.045..6759.363 rows=1 loops=1)
>>  Planning time: 0.131 ms
>>  Execution time: 13231.299 ms
>> (7 rows)
>>
>> On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy 
>> wrote:
>> > I accidently encountered a case where gather merge was picked as
>> > default but disabling same by setting max_parallel_workers_per_gather
>> > = 0; produced a non-parallel plan which was 

Re: [HACKERS] Possible regression with gather merge.

2017-03-22 Thread Rushabh Lathia
Thanks for reporting, I am looking into this.

On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy 
wrote:

> Adding more rows to table make gather merge execution time very slow
> when compared to non-parallel plan we get after disabling gather
> merge.
>
> create table test as (select id, (random()*1)::int as v1, random() as
> v2 from generate_series(1,1) id);
>
> postgres=# set max_parallel_workers_per_gather = default;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>
> QUERY PLAN
> 
> 
> 
>  Limit  (cost=1858610.53..1858611.70 rows=10 width=16) (actual
> time=31103.880..31103.885 rows=10 loops=1)
>->  Gather Merge  (cost=1858610.53..11581520.05 rows=8406
> width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Sort  (cost=1857610.50..1961777.26 rows=41666703
> width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
>Sort Key: v1, v2
>Sort Method: external merge  Disk: 841584kB
>->  Parallel Seq Scan on test  (cost=0.00..957208.03
> rows=41666703 width=16) (actual time=0.050..2330.275 rows=
> loops=3)
>  Planning time: 0.292 ms
>  Execution time: 31502.896 ms
> (10 rows)
>
> postgres=# set max_parallel_workers_per_gather = 0;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>  QUERY
> PLAN
> 
> 
>  Limit  (cost=3701507.83..3701507.85 rows=10 width=16) (actual
> time=13231.264..13231.266 rows=10 loops=1)
>->  Sort  (cost=3701507.83..3951508.05 rows=10088 width=16)
> (actual time=13231.261..13231.262 rows=10 loops=1)
>  Sort Key: v1, v2
>  Sort Method: top-N heapsort  Memory: 25kB
>  ->  Seq Scan on test  (cost=0.00..1540541.88 rows=10088
> width=16) (actual time=0.045..6759.363 rows=1 loops=1)
>  Planning time: 0.131 ms
>  Execution time: 13231.299 ms
> (7 rows)
>
> On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy 
> wrote:
> > I accidently encountered a case where gather merge was picked as
> > default but disabling same by setting max_parallel_workers_per_gather
> > = 0; produced a non-parallel plan which was faster than gather merge,
> > but its cost is marked too high when compared to gather merge.
> >
> > I guess we need some cost adjustment is planner code.
> >
> > Test setting
> > =
> > create table test as (select id, (random()*1)::int as v1, random() as
> > v2 from generate_series(1,100) id);
> > create index test_v1_idx on test (v1);
> >
> >
> > Server setting is default.
> >
> >
> > postgres=# explain analyze select * from test order by v1, v2 limit 10;
> >QUERY
> > PLAN
> > 
> 
> 
> >  Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual
> > time=265.989..265.995 rows=10 loops=1)
> >->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16)
> > (actual time=265.987..265.992 rows=10 loops=1)
> >  Workers Planned: 2
> >  Workers Launched: 2
> >  ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16)
> > (actual time=250.202..250.424 rows=911 loops=3)
> >Sort Key: v1, v2
> >Sort Method: external merge  Disk: 9272kB
> >->  Parallel Seq Scan on test  (cost=0.00..9572.67
> > rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3)
> >  Planning time: 0.193 ms
> >  Execution time: 271.222 ms
> >
> > postgres=# set max_parallel_workers_per_gather = 0;
> > SET
> > postgres=# explain analyze select * from test order by v1, v2 limit 10;
> >  QUERY PLAN
> > 
> -
> >  Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual
> > time=211.582..211.584 rows=10 loops=1)
> >->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual
> > time=211.581..211.582 rows=10 loops=1)
> >  Sort Key: v1, v2
> >  Sort Method: top-N heapsort  Memory: 25kB
> >  ->  Seq Scan on test  (cost=0.00..15406.00 rows=100
> > width=16) (actual time=0.085..107.522 rows=100 loops=1)
> >  Planning time: 0.093 ms
> >  Execution time: 211.608 ms
> > (7 rows)
> >
> >
> >
> > --
> > Thanks and Regards
> > Mithun C Y
> > EnterpriseDB: http://www.enterprisedb.com
>
>
>
> 

Re: [HACKERS] Possible regression with gather merge.

2017-03-22 Thread Mithun Cy
Adding more rows to table make gather merge execution time very slow
when compared to non-parallel plan we get after disabling gather
merge.

create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,1) id);

postgres=# set max_parallel_workers_per_gather = default;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;

QUERY PLAN

 Limit  (cost=1858610.53..1858611.70 rows=10 width=16) (actual
time=31103.880..31103.885 rows=10 loops=1)
   ->  Gather Merge  (cost=1858610.53..11581520.05 rows=8406
width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=1857610.50..1961777.26 rows=41666703
width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 841584kB
   ->  Parallel Seq Scan on test  (cost=0.00..957208.03
rows=41666703 width=16) (actual time=0.050..2330.275 rows=
loops=3)
 Planning time: 0.292 ms
 Execution time: 31502.896 ms
(10 rows)

postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
 QUERY
PLAN

 Limit  (cost=3701507.83..3701507.85 rows=10 width=16) (actual
time=13231.264..13231.266 rows=10 loops=1)
   ->  Sort  (cost=3701507.83..3951508.05 rows=10088 width=16)
(actual time=13231.261..13231.262 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on test  (cost=0.00..1540541.88 rows=10088
width=16) (actual time=0.045..6759.363 rows=1 loops=1)
 Planning time: 0.131 ms
 Execution time: 13231.299 ms
(7 rows)

On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy  wrote:
> I accidently encountered a case where gather merge was picked as
> default but disabling same by setting max_parallel_workers_per_gather
> = 0; produced a non-parallel plan which was faster than gather merge,
> but its cost is marked too high when compared to gather merge.
>
> I guess we need some cost adjustment is planner code.
>
> Test setting
> =
> create table test as (select id, (random()*1)::int as v1, random() as
> v2 from generate_series(1,100) id);
> create index test_v1_idx on test (v1);
>
>
> Server setting is default.
>
>
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>QUERY
> PLAN
> 
>  Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual
> time=265.989..265.995 rows=10 loops=1)
>->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16)
> (actual time=265.987..265.992 rows=10 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16)
> (actual time=250.202..250.424 rows=911 loops=3)
>Sort Key: v1, v2
>Sort Method: external merge  Disk: 9272kB
>->  Parallel Seq Scan on test  (cost=0.00..9572.67
> rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3)
>  Planning time: 0.193 ms
>  Execution time: 271.222 ms
>
> postgres=# set max_parallel_workers_per_gather = 0;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>  QUERY PLAN
> -
>  Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual
> time=211.582..211.584 rows=10 loops=1)
>->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual
> time=211.581..211.582 rows=10 loops=1)
>  Sort Key: v1, v2
>  Sort Method: top-N heapsort  Memory: 25kB
>  ->  Seq Scan on test  (cost=0.00..15406.00 rows=100
> width=16) (actual time=0.085..107.522 rows=100 loops=1)
>  Planning time: 0.093 ms
>  Execution time: 211.608 ms
> (7 rows)
>
>
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible regression with gather merge.

2017-03-21 Thread Mithun Cy
I accidently encountered a case where gather merge was picked as
default but disabling same by setting max_parallel_workers_per_gather
= 0; produced a non-parallel plan which was faster than gather merge,
but its cost is marked too high when compared to gather merge.

I guess we need some cost adjustment is planner code.

Test setting
=
create table test as (select id, (random()*1)::int as v1, random() as
v2 from generate_series(1,100) id);
create index test_v1_idx on test (v1);


Server setting is default.


postgres=# explain analyze select * from test order by v1, v2 limit 10;
   QUERY
PLAN

 Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual
time=265.989..265.995 rows=10 loops=1)
   ->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16)
(actual time=265.987..265.992 rows=10 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16)
(actual time=250.202..250.424 rows=911 loops=3)
   Sort Key: v1, v2
   Sort Method: external merge  Disk: 9272kB
   ->  Parallel Seq Scan on test  (cost=0.00..9572.67
rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3)
 Planning time: 0.193 ms
 Execution time: 271.222 ms

postgres=# set max_parallel_workers_per_gather = 0;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
 QUERY PLAN
-
 Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual
time=211.582..211.584 rows=10 loops=1)
   ->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual
time=211.581..211.582 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on test  (cost=0.00..15406.00 rows=100
width=16) (actual time=0.085..107.522 rows=100 loops=1)
 Planning time: 0.093 ms
 Execution time: 211.608 ms
(7 rows)



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers