The builtin corr() aggregate doesn't produce the correct results in some circumstances. Per the SQL spec, corr(x, y) is defined as equivalent to covar_pop(x, y) / (stddev_pop(x) * stddev_pop(y)).
postgres=# create table t1 (x float8, y float8); CREATE TABLE postgres=# copy t1 from stdin with csv; 0.940839937888086,0.539830380585045 0.84795232815668,0.396385048050433 0.601479615084827,0.899995123967528 0.785623408854008,0.302559469360858 0.829138438683003,0.0211085784249008 0.926528611686081,0.315794581547379 0.25934984581545,0.609216409735382 0.976522764191031,0.877208305988461 \. postgres=# select corr(x, y) from t1; corr ------------------- 0.214150892978763 (1 row) postgres=# select covar_pop(x, y) / (stddev_pop(x) * stddev_pop(y)) from t1; ?column? -------------------- -0.214150892978763 (1 row) With the attached patch, we get the expected results: postgres=# select corr(x, y) from t1; corr -------------------- -0.214150892978763 (1 row) Credit: Jie Zhang at Greenplum and Gavin Sherry for reporting the issue. Barring any objections, I'll apply this to HEAD and 8.2 later tonight or tomorrow. -Neil
Index: src/backend/utils/adt/float.c =================================================================== RCS file: /home/neilc/postgres/cvs_root/pgsql/src/backend/utils/adt/float.c,v retrieving revision 1.150 diff -p -c -r1.150 float.c *** src/backend/utils/adt/float.c 5 Jun 2007 21:31:06 -0000 1.150 --- src/backend/utils/adt/float.c 19 Sep 2007 00:51:27 -0000 *************** float8_corr(PG_FUNCTION_ARGS) *** 2274,2281 **** if (numeratorX <= 0 || numeratorY <= 0) PG_RETURN_NULL(); ! PG_RETURN_FLOAT8(sqrt((numeratorXY * numeratorXY) / ! (numeratorX * numeratorY))); } Datum --- 2274,2280 ---- if (numeratorX <= 0 || numeratorY <= 0) PG_RETURN_NULL(); ! PG_RETURN_FLOAT8(numeratorXY / sqrt(numeratorX * numeratorY)); } Datum
---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate