On Sat, Oct 18, 2014 at 5:10 AM, Feng Tian <feng...@gmail.com> wrote:
> Hi, > > Consider the following queries. > > create table t(i int, j int, k int, t text); > insert into t select i, i % 100, i % 1000, 'AABBCCDD' || i from > generate_series(1, 1000000) i; > > ftian=# explain select count(distinct j) from t group by t, i; > QUERY PLAN > ------------------------------------------------------------------------ > GroupAggregate (cost=158029.84..178029.84 rows=1000000 width=22) > -> Sort (cost=158029.84..160529.84 rows=1000000 width=22) > Sort Key: t, i > -> Seq Scan on t (cost=0.00..17352.00 rows=1000000 width=22) > (4 rows) > > > The query, > select count(distinct j) from t group by t, i; > > runs for 35 seconds. However, if I change the query to, > select count(distinct j) from t group by i, t; -- note switching the > ordering > select count(distinct j) from t group by decode(t, 'escape'), i; -- > convert t to bytea > > Run times are just about 5 and 6.5 seconds. The reason is clear, compare > a string with collation is slow, which is well understood by pg hackers. > However, here, the sorting order is forced by the planner, not user. > Planner can do the following optimizations, > > 1. for the sort we generated for sort agg, planner can switch column > ordering, put int before string, > 2. for the sort we generated for sort agg, use bytea compare instead of > string compare. > > They will bring big improvement to this common query. Is this something > reasonable? > > It seems like it might be worth looking into, but I think it's likely more complex than sorting the group by order into narrowest column first. If the query was: select count(distinct j) from t group by t, i order by t, i; Then if that was rewritten to make the group by i,t then the planner would then need to perform an additional sort on t,i to get the correct order by for the final result, which may or may not be faster, it would depend on how many groups there were to sort. Though I guess you could possibly just skip the optimisation if the next node up didn't need any specific ordering. I also wonder if taking into account the column's width is not quite enough. For example if the 'i' column only had 1 distinct value, then performing the group by on 'i' first wouldn't help at all. Keep in mind that the columns could be much closer in width than in your example, e.g int and bigint. Perhaps you'd need to include some sort of heuristics to look at the number of distinct values and the width, and form some sort of weighted estimates about which column to put first. Regards David Rowley