Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?

Kenneth Marshall Thu, 23 Dec 2010 06:53:32 -0800

On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
> ----- Original Message ----- 
> From: "Kenneth Marshall" <k...@rice.edu>
> To: "Li Jie" <jay23j...@gmail.com>
> Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
> Sent: Thursday, December 23, 2010 10:30 PM
> Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon 
> one column?
> 
> 
> > On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> >> Hi Ken,
> >> 
> >> Thanks for your tips! Yes it is the case, and I run another query sorting 
> >> on the second column whose values are random.
> >> 
> >> postgres=# explain analyze select * from big_wf order by id;
> >>                                                       QUERY PLAN           
> >>                                              
> >> -------------------------------------------------------------------------------------------------------------------------
> >> Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual 
> >> time=25681.875..36458.824 rows=4100000 loops=1)
> >>   Sort Key: id
> >>   Sort Method:  external merge  Disk: 72048kB
> >>   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) 
> >> (actual time=8.595..5569.500 rows=4100000 loops=1)
> >> 
> >> Now the sorting takes about 20 seconds, so it seems reasonable compared to 
> >> 30 seconds, right? But one thing I'm confused is that, why is additional 
> >> comparison really so expensive?  Does it incur additional I/O? From the 
> >> cost model, it seems not, all the "cost" are the same (575775.45).
> >> 
> >> Thanks,
> >> Li Jie
> > 
> > In the first query, the cost is basically the I/O cost to read the
> > table from disk. The actual sort does not do anything since the
> > sort values are the same. In the second query, the sort has to
> > swap things in memory/disk to get them in the correct order for
> > the result. This actually takes CPU and possibly additional I/O
> > which is why it is slower. In the case of sorting by just the "id"
> > column, the size of the sorted values is smaller which would need
> > fewer batches to complete the sort since the sort is bigger than
> > the work_mem.
> > 
> > Cheers,
> > Ken
> 
> Hi Ken,
> 
> Thanks for your analysis.
> 
> But in the last query that sorts on "id",  since the query selects all the 
> columns for output, the actual sorted size is the same, and the only 
> difference is the comparison cost. The query sorting on two columns needs to 
> do twice the comparison. Am I right?
> 
> Thanks,
> Li Jie


I think you are right. Sorry for the confusion.

Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?

Reply via email to