Re: [HACKERS] sortsupport for text

Kevin Grittner Tue, 19 Jun 2012 08:18:29 -0700

Peter Geoghegan <pe...@2ndquadrant.com> wrote:

> So, just to give a bit more weight to my argument that we should
> recognise that equivalent strings ought to be treated identically

Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some other
systems I've used, the "tie-breaker" comparison for equivalent
values comes after equivalence sorting on *all* sort keys, rather
than *each* sort key.  For example, this much makes sense with
lc_collate = 'en_US.UTF-8':

test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'),
('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
 last_name | first_name 
-----------+------------
 smith     | bob
 SMITH     | EDWARD
 smith     | peter
(3 rows)

This seems completely wrong:

test=# select * from c order by 1,2;
 last_name | first_name 
-----------+------------
 smith     | bob
 smith     | peter
 SMITH     | EDWARD
(3 rows)

I have seen other databases which get it in the order I would expect
-- where the C compare only matters within groups of equivalent
rows.  It seems that PostgreSQL effectively orders by:

  last_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'en_US.UTF-8'
  first_name using collation 'C'

while some other products order by:

  last_name using collation 'en_US.UTF-8'
  first_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'C'

I'm sure the latter is harder to do and slower to execute; but the
former just doesn't seem defensible as correct.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

Reply via email to