Hello, I posted some observations to the performance of postgres some weeks ago. The problem with the poor performance of "select distinct" still exists, but I tried to worked out some reproducable results in a less complicated way than in my first postings.
'select distinct' preforms on Oracle about 30 times faster than in Postgres. Well, only 'select distinct' is slow. All other parts of our application using Postgres perform quite well on Postgres (with really big amounts of data). (All tests on Postgres 7.3b1 and Oracle 9.0.1.3.0. Hardware: 4 * 1800 MHz Xeon, 3 * 100GB IDE Software-Raid0) Suse-Linux, 2.4.18 SMP-kernel) Here the table: mc=# \d egal Table "public.egal" Column | Type | Modifiers --------+---------+----------- i | integer | mc=# select count(*) from egal; count --------- 7227744 (1 row) mc=# select count(distinct i) from egal; count ------- 67 (1 row) (The integers have values between 0 and 99 are an extract of our warehouse and nearly distributed randomly) The last query - count(distinct) - takes 1m30s while oracle needs 0m7s for the same query and data. Getting the distinct rows and measuring the time results to time echo "select distinct i from egal;"|psql >/dev/null real 4m52.108s user 0m0.000s sys 0m0.010s Here I don't understand the difference in performance between "select distinct i" and "select count(distinct i)" - Oracle takes constantly 7s for each query: time echo "select distinct i from egal;"|sqlplus xxxx/yyyy >/dev/null real 0m6.979s user 0m0.020s sys 0m0.030s In the first case (count(distinct)) postgres produces a temporary file of 86 MB in pgsql_tmp, in the second case (select distinct) the temp-file increases to 260 MB. (this is even larger than the table size of egal which is 232 MB) I think the planner has not many choices for this query so it results to mc=# explain select distinct (i) from egal; QUERY PLAN --------------------------------------------------------------------------- Unique (cost=1292118.29..1328257.01 rows=722774 width=4) -> Sort (cost=1292118.29..1310187.65 rows=7227744 width=4) Sort Key: i -> Seq Scan on egal (cost=0.00..104118.44 rows=7227744 width=4) (4 rows) Which looks similar to oracle's plan: QPLAN --------------- SORT UNIQUE TABLE ACCESS FULL EGAL Our problem is that things getting worse when the table size increases, as described in my first mails. (Especially when the temp-file is split into parts) Reorganizing/normalizing the data is no solultion, because our application is just for analyzing the data. I tried out the trigger-idea: inserting a value into a table with an unique key only if still does't exists, it works, but .. select distinct is a much faster solution. An other way I tried is an PL/TCL function which stores all different values into a assoziative array in memory, by far the fastest solution - but returning the different values makes problems to me (I shouldn't say, but I put it into a file and "copy" it back). And it's not very nice - not as nice as "select distinct" :) Kind regards, Michael Contzen
<<attachment: winmail.dat>>
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]