On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote: > ----- Original Message ----- > From: "Kenneth Marshall" <k...@rice.edu> > To: "Li Jie" <jay23j...@gmail.com> > Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org> > Sent: Thursday, December 23, 2010 10:30 PM > Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon > one column? > > > > On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote: > >> Hi Ken, > >> > >> Thanks for your tips! Yes it is the case, and I run another query sorting > >> on the second column whose values are random. > >> > >> postgres=# explain analyze select * from big_wf order by id; > >> QUERY PLAN > >> > >> ------------------------------------------------------------------------------------------------------------------------- > >> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual > >> time=25681.875..36458.824 rows=4100000 loops=1) > >> Sort Key: id > >> Sort Method: external merge Disk: 72048kB > >> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) > >> (actual time=8.595..5569.500 rows=4100000 loops=1) > >> > >> Now the sorting takes about 20 seconds, so it seems reasonable compared to > >> 30 seconds, right? But one thing I'm confused is that, why is additional > >> comparison really so expensive? Does it incur additional I/O? From the > >> cost model, it seems not, all the "cost" are the same (575775.45). > >> > >> Thanks, > >> Li Jie > > > > In the first query, the cost is basically the I/O cost to read the > > table from disk. The actual sort does not do anything since the > > sort values are the same. In the second query, the sort has to > > swap things in memory/disk to get them in the correct order for > > the result. This actually takes CPU and possibly additional I/O > > which is why it is slower. In the case of sorting by just the "id" > > column, the size of the sorted values is smaller which would need > > fewer batches to complete the sort since the sort is bigger than > > the work_mem. > > > > Cheers, > > Ken > > Hi Ken, > > Thanks for your analysis. > > But in the last query that sorts on "id", since the query selects all the > columns for output, the actual sorted size is the same, and the only > difference is the comparison cost. The query sorting on two columns needs to > do twice the comparison. Am I right? > > Thanks, > Li Jie
I think you are right. Sorry for the confusion. Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers