On Sat, Oct 18, 2014 at 12:35 PM, Tatsuo Ishii <is...@postgresql.org> wrote:
> > The query, > > select count(distinct j) from t group by t, i; > > > > runs for 35 seconds. However, if I change the query to, > > select count(distinct j) from t group by i, t; -- note switching the > > ordering > > select count(distinct j) from t group by decode(t, 'escape'), i; -- > convert > > t to bytea > > > > Run times are just about 5 and 6.5 seconds. The reason is clear, > compare a > > string with collation is slow, which is well understood by pg hackers. > > However, here, the sorting order is forced by the planner, not user. > > Planner can do the following optimizations, > > Interesting. I got following result: > > test=# explain analyze select count(distinct j) from t group by t, i; > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------- > GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual > time=1332.937..2431.238 rows=1000000 loops=1) > Group Key: t, i > -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual > time=1332.922..1507.413 rows=1000000 loops=1) > Sort Key: t, i > Sort Method: external merge Disk: 33232kB > -> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22) > (actual time=0.006..131.406 rows=1000000 loops=1) > Planning time: 0.031 ms > Execution time: 2484.271 ms > (8 rows) > > Time: 2484.520 ms > > test=# explain analyze select count(distinct j) from t group by i, t; > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------------------- > GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual > time=602.510..1632.087 rows=1000000 loops=1) > Group Key: i, t > -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual > time=602.493..703.274 rows=1000000 loops=1) > Sort Key: i, t > Sort Method: external sort Disk: 33240kB > -> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22) > (actual time=0.014..129.213 rows=1000000 loops=1) > Planning time: 0.176 ms > Execution time: 1685.575 ms > (8 rows) > > Time: 1687.641 ms > > Not so big difference here (maybe because I use SSD) but there is > still about 50% difference in execution time. Note that I disable > locale support. > > I think this is more likely your locale settings, as if I do: create table t(i int, j int, k int, t text collate "C"); The GROUP BY t,i runs about 25% faster. I've not looked at it yet, but Peter G's patch here https://commitfest.postgresql.org/action/patch_view?id=1462 will quite likely narrow the performance gap between the 2 queries. Regards David Rowley