Re: [PERFORM] big distinct clause vs. group by

Uwe Bartels Mon, 18 Apr 2011 15:01:05 -0700

Hi Robert,

thanks for your answer.
the aggregate function I was talking about is the function I need to use for
the non-group by columns like min() in my example.
There are of course several function to choose from, and I wanted to know
which causes as less as possible resources.


best regards,
Uwe


On 18 April 2011 18:19, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Mar 16, 2011 at 4:45 AM, Uwe Bartels <uwe.bart...@gmail.com>
> wrote:
> > I'm having trouble with some sql statements which use an expression with
> > many columns and distinct in the column list of the select.
> > select distinct col1,col2,.....col20,col21
> > from table1 left join table2 on <join condition>,...
> > where
> >  <other expressions>;
> >
> > The negative result is a big sort with teporary files.
> >               ->  Sort  (cost=5813649.93..5853067.63 rows=15767078
> width=80)
> > (actual time=79027.079..81556.059 rows=12076838 loops=1)
> >                     Sort Method:  external sort  Disk: 1086096kB
> > By the way - for this query I have a work_mem of 1 GB - so raising this
> > further is not generally possible - also not for one special command, due
> to
> > parallelism.
> >
> > How do I get around this?
>
> Hmm.  It seems to me that there's no way to work out the distinct
> values without either sorting or hashing the output, which will
> necessarily be slow if you have a lot of data.
>
> > I have one idea and like to know if there any other approaches or an even
> > known better solution to that problem. By using group by I don't need the
> > big sort for the distinct - I reduce it (theoreticly) to the key columns.
> >
> > select <list of key columns>,<non key column>
> > from tables1left join table2 on <join condition>,...
> > where
> >  <other conditions>
> > group by <list of key columns>
>
> You might try SELECT DISTINCT ON (key columns) <key columns> <non-key
> columns> FROM ...
>
> > Another question would be what's the aggregate function which needs as
> less
> > as possible resources (time).
>
> Not sure I follow this part.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: [PERFORM] big distinct clause vs. group by

Reply via email to