Re: [HACKERS] Possible regression with gather merge.
On Wed, Mar 22, 2017 at 3:39 AM, Rushabh Lathiawrote: > Looking at the explain analyze output of both the plan, its clear that GM > taking longer as its using external merge dist for the sort, where as > another plan perform top-N heapsort. For normal sort path, it can consider > the limit as bound, but for GM its not possible. Right, good catch. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible regression with gather merge.
On Wed, Mar 22, 2017 at 1:09 PM, Rushabh Lathiawrote: > In the code, gather merge consider the limit for the sort into > create_ordered_paths, > which is wrong. Into create_ordered_paths(), GM should not consider the > limit > while doing costing for the sort node. > > Attached patch fix the bug. Thanks, Rushabh, Now I see non-parallel scan being picked up by default create table test as (select id, (random()*1)::int as v1, random() as v2 from generate_series(1,1000) id); postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN --- Limit (cost=370146.98..370147.00 rows=10 width=16) (actual time=1169.685..1169.686 rows=10 loops=1) -> Sort (cost=370146.98..395146.63 rows=860 width=16) (actual time=1169.683..1169.684 rows=10 loops=1) Sort Key: v1, v2 Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on test (cost=0.00..154053.60 rows=860 width=16) (actual time=0.016..590.176 rows=1000 loops=1) Planning time: 0.070 ms Execution time: 1169.706 ms (7 rows) Another find by accident. Setting higher max_parallel_workers_per_gather to a higher value than default results in more parallel workers, But query runs slower than the plan with lesser workers. postgres=# set max_parallel_workers_per_gather = default; SET postgres=# explain analyze select * from test order by v1, v2 limit 1000; QUERY PLAN Limit (cost=697263.86..1669540.28 rows=8333216 width=16) (actual time=2212.437..8207.161 rows=1000 loops=1) -> Gather Merge (cost=697263.86..1669540.28 rows=8333216 width=16) (actual time=2212.436..7600.478 rows=1000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=696263.84..706680.36 rows=4166608 width=16) (actual time=2173.756..3105.512 rows=333 loops=3) Sort Key: v1, v2 Sort Method: external merge Disk: 86648kB -> Parallel Seq Scan on test (cost=0.00..95721.08 rows=4166608 width=16) (actual time=0.030..240.486 rows=333 loops=3) Planning time: 0.096 ms Execution time: 8537.214 ms (10 rows) postgres=# set max_parallel_workers_per_gather = 10; SET postgres=# explain analyze select * from test order by v1, v2 limit 1000; QUERY PLAN Limit (cost=431168.44..1628498.09 rows=860 width=16) (actual time=1525.120..13273.805 rows=1000 loops=1) -> Gather Merge (cost=431168.44..1628498.09 rows=860 width=16) (actual time=1525.119..12650.621 rows=1000 loops=1) Workers Planned: 4 Workers Launched: 4 -> Sort (cost=430168.39..436418.30 rows=2499965 width=16) (actual time=1472.799..2133.571 rows=200 loops=5) Sort Key: v1, v2 Sort Method: external merge Disk: 50336kB -> Parallel Seq Scan on test (cost=0.00..79054.65 rows=2499965 width=16) (actual time=0.047..201.405 rows=200 loops=5) Planning time: 0.077 ms Execution time: 13622.319 ms (10 rows) -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible regression with gather merge.
Hi, postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=19576.71..19577.88 rows=10 width=16) (actual time=408.842..408.853 rows=10 loops=1) -> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) (actual time=408.841..408.850 rows=10 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=18576.69..19618.36 rows=416667 width=16) (actual time=393.602..394.080 rows=911 loops=3) Sort Key: v1, v2 Sort Method: external merge Disk: 8128kB -> Parallel Seq Scan on test (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.053..46.238 rows=33 loops=3) Planning time: 0.118 ms Execution time: 414.763 ms (10 rows) postgres=# set enable_gathermerge = off; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN - Limit (cost=37015.64..37015.67 rows=10 width=16) (actual time=268.859..268.861 rows=10 loops=1) -> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual time=268.858..268.859 rows=10 loops=1) Sort Key: v1, v2 Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on test (cost=0.00..15406.00 rows=100 width=16) (actual time=0.085..107.841 rows=100 loops=1) Planning time: 0.163 ms Execution time: 268.897 ms (7 rows) Looking at the explain analyze output of both the plan, its clear that GM taking longer as its using external merge dist for the sort, where as another plan perform top-N heapsort. For normal sort path, it can consider the limit as bound, but for GM its not possible. In the code, gather merge consider the limit for the sort into create_ordered_paths, which is wrong. Into create_ordered_paths(), GM should not consider the limit while doing costing for the sort node. Attached patch fix the bug. On Wed, Mar 22, 2017 at 12:05 PM, Rushabh Lathiawrote: > Thanks for reporting, I am looking into this. > > On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy > wrote: > >> Adding more rows to table make gather merge execution time very slow >> when compared to non-parallel plan we get after disabling gather >> merge. >> >> create table test as (select id, (random()*1)::int as v1, random() as >> v2 from generate_series(1,1) id); >> >> postgres=# set max_parallel_workers_per_gather = default; >> SET >> postgres=# explain analyze select * from test order by v1, v2 limit 10; >> >> QUERY PLAN >> >> >> >> Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual >> time=31103.880..31103.885 rows=10 loops=1) >>-> Gather Merge (cost=1858610.53..11581520.05 rows=8406 >> width=16) (actual time=31103.878..31103.882 rows=10 loops=1) >> Workers Planned: 2 >> Workers Launched: 2 >> -> Sort (cost=1857610.50..1961777.26 rows=41666703 >> width=16) (actual time=30560.865..30561.046 rows=911 loops=3) >>Sort Key: v1, v2 >>Sort Method: external merge Disk: 841584kB >>-> Parallel Seq Scan on test (cost=0.00..957208.03 >> rows=41666703 width=16) (actual time=0.050..2330.275 rows= >> loops=3) >> Planning time: 0.292 ms >> Execution time: 31502.896 ms >> (10 rows) >> >> postgres=# set max_parallel_workers_per_gather = 0; >> SET >> postgres=# explain analyze select * from test order by v1, v2 limit 10; >> QUERY >> PLAN >> >> >> Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual >> time=13231.264..13231.266 rows=10 loops=1) >>-> Sort (cost=3701507.83..3951508.05 rows=10088 width=16) >> (actual time=13231.261..13231.262 rows=10 loops=1) >> Sort Key: v1, v2 >> Sort Method: top-N heapsort Memory: 25kB >> -> Seq Scan on test (cost=0.00..1540541.88 rows=10088 >> width=16) (actual time=0.045..6759.363 rows=1 loops=1) >> Planning time: 0.131 ms >> Execution time: 13231.299 ms >> (7 rows) >> >> On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy >> wrote: >> > I accidently encountered a case where gather merge was picked as >> > default but disabling same by setting max_parallel_workers_per_gather >> > = 0; produced a non-parallel plan which was
Re: [HACKERS] Possible regression with gather merge.
Thanks for reporting, I am looking into this. On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cywrote: > Adding more rows to table make gather merge execution time very slow > when compared to non-parallel plan we get after disabling gather > merge. > > create table test as (select id, (random()*1)::int as v1, random() as > v2 from generate_series(1,1) id); > > postgres=# set max_parallel_workers_per_gather = default; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > > QUERY PLAN > > > > Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual > time=31103.880..31103.885 rows=10 loops=1) >-> Gather Merge (cost=1858610.53..11581520.05 rows=8406 > width=16) (actual time=31103.878..31103.882 rows=10 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Sort (cost=1857610.50..1961777.26 rows=41666703 > width=16) (actual time=30560.865..30561.046 rows=911 loops=3) >Sort Key: v1, v2 >Sort Method: external merge Disk: 841584kB >-> Parallel Seq Scan on test (cost=0.00..957208.03 > rows=41666703 width=16) (actual time=0.050..2330.275 rows= > loops=3) > Planning time: 0.292 ms > Execution time: 31502.896 ms > (10 rows) > > postgres=# set max_parallel_workers_per_gather = 0; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > QUERY > PLAN > > > Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual > time=13231.264..13231.266 rows=10 loops=1) >-> Sort (cost=3701507.83..3951508.05 rows=10088 width=16) > (actual time=13231.261..13231.262 rows=10 loops=1) > Sort Key: v1, v2 > Sort Method: top-N heapsort Memory: 25kB > -> Seq Scan on test (cost=0.00..1540541.88 rows=10088 > width=16) (actual time=0.045..6759.363 rows=1 loops=1) > Planning time: 0.131 ms > Execution time: 13231.299 ms > (7 rows) > > On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy > wrote: > > I accidently encountered a case where gather merge was picked as > > default but disabling same by setting max_parallel_workers_per_gather > > = 0; produced a non-parallel plan which was faster than gather merge, > > but its cost is marked too high when compared to gather merge. > > > > I guess we need some cost adjustment is planner code. > > > > Test setting > > = > > create table test as (select id, (random()*1)::int as v1, random() as > > v2 from generate_series(1,100) id); > > create index test_v1_idx on test (v1); > > > > > > Server setting is default. > > > > > > postgres=# explain analyze select * from test order by v1, v2 limit 10; > >QUERY > > PLAN > > > > > > Limit (cost=19576.71..19577.88 rows=10 width=16) (actual > > time=265.989..265.995 rows=10 loops=1) > >-> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) > > (actual time=265.987..265.992 rows=10 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Sort (cost=18576.69..19618.36 rows=416667 width=16) > > (actual time=250.202..250.424 rows=911 loops=3) > >Sort Key: v1, v2 > >Sort Method: external merge Disk: 9272kB > >-> Parallel Seq Scan on test (cost=0.00..9572.67 > > rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3) > > Planning time: 0.193 ms > > Execution time: 271.222 ms > > > > postgres=# set max_parallel_workers_per_gather = 0; > > SET > > postgres=# explain analyze select * from test order by v1, v2 limit 10; > > QUERY PLAN > > > - > > Limit (cost=37015.64..37015.67 rows=10 width=16) (actual > > time=211.582..211.584 rows=10 loops=1) > >-> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual > > time=211.581..211.582 rows=10 loops=1) > > Sort Key: v1, v2 > > Sort Method: top-N heapsort Memory: 25kB > > -> Seq Scan on test (cost=0.00..15406.00 rows=100 > > width=16) (actual time=0.085..107.522 rows=100 loops=1) > > Planning time: 0.093 ms > > Execution time: 211.608 ms > > (7 rows) > > > > > > > > -- > > Thanks and Regards > > Mithun C Y > > EnterpriseDB: http://www.enterprisedb.com > > > >
Re: [HACKERS] Possible regression with gather merge.
Adding more rows to table make gather merge execution time very slow when compared to non-parallel plan we get after disabling gather merge. create table test as (select id, (random()*1)::int as v1, random() as v2 from generate_series(1,1) id); postgres=# set max_parallel_workers_per_gather = default; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual time=31103.880..31103.885 rows=10 loops=1) -> Gather Merge (cost=1858610.53..11581520.05 rows=8406 width=16) (actual time=31103.878..31103.882 rows=10 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=1857610.50..1961777.26 rows=41666703 width=16) (actual time=30560.865..30561.046 rows=911 loops=3) Sort Key: v1, v2 Sort Method: external merge Disk: 841584kB -> Parallel Seq Scan on test (cost=0.00..957208.03 rows=41666703 width=16) (actual time=0.050..2330.275 rows= loops=3) Planning time: 0.292 ms Execution time: 31502.896 ms (10 rows) postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual time=13231.264..13231.266 rows=10 loops=1) -> Sort (cost=3701507.83..3951508.05 rows=10088 width=16) (actual time=13231.261..13231.262 rows=10 loops=1) Sort Key: v1, v2 Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on test (cost=0.00..1540541.88 rows=10088 width=16) (actual time=0.045..6759.363 rows=1 loops=1) Planning time: 0.131 ms Execution time: 13231.299 ms (7 rows) On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cywrote: > I accidently encountered a case where gather merge was picked as > default but disabling same by setting max_parallel_workers_per_gather > = 0; produced a non-parallel plan which was faster than gather merge, > but its cost is marked too high when compared to gather merge. > > I guess we need some cost adjustment is planner code. > > Test setting > = > create table test as (select id, (random()*1)::int as v1, random() as > v2 from generate_series(1,100) id); > create index test_v1_idx on test (v1); > > > Server setting is default. > > > postgres=# explain analyze select * from test order by v1, v2 limit 10; >QUERY > PLAN > > Limit (cost=19576.71..19577.88 rows=10 width=16) (actual > time=265.989..265.995 rows=10 loops=1) >-> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) > (actual time=265.987..265.992 rows=10 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Sort (cost=18576.69..19618.36 rows=416667 width=16) > (actual time=250.202..250.424 rows=911 loops=3) >Sort Key: v1, v2 >Sort Method: external merge Disk: 9272kB >-> Parallel Seq Scan on test (cost=0.00..9572.67 > rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3) > Planning time: 0.193 ms > Execution time: 271.222 ms > > postgres=# set max_parallel_workers_per_gather = 0; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > QUERY PLAN > - > Limit (cost=37015.64..37015.67 rows=10 width=16) (actual > time=211.582..211.584 rows=10 loops=1) >-> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual > time=211.581..211.582 rows=10 loops=1) > Sort Key: v1, v2 > Sort Method: top-N heapsort Memory: 25kB > -> Seq Scan on test (cost=0.00..15406.00 rows=100 > width=16) (actual time=0.085..107.522 rows=100 loops=1) > Planning time: 0.093 ms > Execution time: 211.608 ms > (7 rows) > > > > -- > Thanks and Regards > Mithun C Y > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Possible regression with gather merge.
I accidently encountered a case where gather merge was picked as default but disabling same by setting max_parallel_workers_per_gather = 0; produced a non-parallel plan which was faster than gather merge, but its cost is marked too high when compared to gather merge. I guess we need some cost adjustment is planner code. Test setting = create table test as (select id, (random()*1)::int as v1, random() as v2 from generate_series(1,100) id); create index test_v1_idx on test (v1); Server setting is default. postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=19576.71..19577.88 rows=10 width=16) (actual time=265.989..265.995 rows=10 loops=1) -> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) (actual time=265.987..265.992 rows=10 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=18576.69..19618.36 rows=416667 width=16) (actual time=250.202..250.424 rows=911 loops=3) Sort Key: v1, v2 Sort Method: external merge Disk: 9272kB -> Parallel Seq Scan on test (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3) Planning time: 0.193 ms Execution time: 271.222 ms postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN - Limit (cost=37015.64..37015.67 rows=10 width=16) (actual time=211.582..211.584 rows=10 loops=1) -> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual time=211.581..211.582 rows=10 loops=1) Sort Key: v1, v2 Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on test (cost=0.00..15406.00 rows=100 width=16) (actual time=0.085..107.522 rows=100 loops=1) Planning time: 0.093 ms Execution time: 211.608 ms (7 rows) -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers