Re: [PERFORM] unnecessary sort in the execution plan when doing group by

David Rowley Tue, 28 Oct 2014 00:08:18 -0700

On Tue, Oct 28, 2014 at 7:26 PM, Huang, Suya <[email protected]>
wrote:


>  Hi,
>
>
>
> This is the Greenplum database 4.3.1.0.
>
>
Likely this is the wrong place to ask for help. The plan output that you've
pasted below looks very different to PostgreSQL's EXPLAIN output.



                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)  (cost=31286842.08..31287447.81
rows=1683 width=536)

   Rows out:  15380160 rows at destination with 14860 ms to first row,
23856 ms to end, start offset by 104 ms.

   ->  HashAggregate  (cost=31286842.08..31287447.81 rows=1683 width=536)


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather Motion 24:1  (slice2; segments: 24)
(cost=152269717.33..157009763.41 rows=1196982 width=568)

   Rows out:  15380160 rows at destination with 35320 ms to first row,
70091 ms to end, start offset by 102 ms.

   ->  GroupAggregate  (cost=152269717.33..157009763.41 rows=1196982
width=568)


Most likely the reason you're getting the difference in plan is because the
planner is probably decided that there will be too many hash entries for a
hash table based on the 3 grouping columns... Look at the estimates, 1683 with
2 columns and 1196982 with the 3 columns. If those estimates turned out to
be true, then the hash table for 3 columns will be massively bigger than it
would be with 2 columns. With PostgreSQL you might see the plan changing if
you increased the work_mem setting. For greenplum, I've no idea if that's
the same.

Databases are often not very good at knowing with the number of distinct
values would be over more than 1 column. Certain databases have solved this
with multi column statistics, but PostgreSQL does not have these. Although
I just noticed last night that someone is working on them.

Regards

David Rowley

Re: [PERFORM] unnecessary sort in the execution plan when doing group by

Reply via email to