On Tue, Oct 28, 2014 at 7:26 PM, Huang, Suya <suya.hu...@au.experian.com> wrote:
> Hi, > > > > This is the Greenplum database 4.3.1.0. > > Likely this is the wrong place to ask for help. The plan output that you've pasted below looks very different to PostgreSQL's EXPLAIN output. QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 24:1 (slice2; segments: 24) (cost=31286842.08..31287447.81 rows=1683 width=536) Rows out: 15380160 rows at destination with 14860 ms to first row, 23856 ms to end, start offset by 104 ms. -> HashAggregate (cost=31286842.08..31287447.81 rows=1683 width=536) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 24:1 (slice2; segments: 24) (cost=152269717.33..157009763.41 rows=1196982 width=568) Rows out: 15380160 rows at destination with 35320 ms to first row, 70091 ms to end, start offset by 102 ms. -> GroupAggregate (cost=152269717.33..157009763.41 rows=1196982 width=568) Most likely the reason you're getting the difference in plan is because the planner is probably decided that there will be too many hash entries for a hash table based on the 3 grouping columns... Look at the estimates, 1683 with 2 columns and 1196982 with the 3 columns. If those estimates turned out to be true, then the hash table for 3 columns will be massively bigger than it would be with 2 columns. With PostgreSQL you might see the plan changing if you increased the work_mem setting. For greenplum, I've no idea if that's the same. Databases are often not very good at knowing with the number of distinct values would be over more than 1 column. Certain databases have solved this with multi column statistics, but PostgreSQL does not have these. Although I just noticed last night that someone is working on them. Regards David Rowley