We wrote a little contrib module, which we'd like to share. It can be used to generate random datasets as they have been used in [Borzsonyi2001] and related work. The code is based directly on the code of the authors, thanks to Donald Kossmann for sharing the code. Donald Kossmann agrees on sharing our implementation with the community.
We use it primary for profiling and regression testing of our implementation of the SKYLINE operator in PostgreSQL, which is work in progress, details will be announced soon. We use it in two ways: (1) create table a2d1000 as (select * from pg_rand_dataset(‘anti’, 2, 1000, 0) ds(id int, d1 float, d2 float)). And then work with this table, e.g. an additional index and query the data or (2) directly for profiling and regression testing as: select * from pg_rand_dataset(‘indep’, 3, 1000, 0) ds(id int, d1 float, d2 float, d3 float) skyline by d1 min, d2 min, d3 max; It might be useful for other purposes. We ask you for comments and suggestions. What's the best way to share this module? Declaration ----------- CREATE FUNCTION pg_catalog.pg_rand_dataset(disttype text, dim int, rows int, seed int) RETURNS setof record AS '$libdir/randdataset', 'pg_rand_dataset' LANGUAGE C IMMUTABLE STRICT; Usage example ------------- Hannes=# select * from pg_rand_dataset('indep',1,5,0) ds(id int, d1 float); id | d1 ----+-------------------- 1 | 0.170828036112165 2 | 0.749901980510867 3 | 0.0963716553972902 4 | 0.870465227342427 5 | 0.577303506702792 (5 rows)Hannes=# select * from pg_rand_dataset('corr',2,3,0) ds(id int, d1 float, d2 float);
id | d1 | d2 ----+-------------------+------------------- 1 | 0.489722997173791 | 0.431007019449241 2 | 0.385624871971487 | 0.645283734523713 3 | 0.754378154805565 | 0.82440492311745 (3 rows)Hannes=# select * from pg_rand_dataset('anti',3,2,0) ds(id int, d1 float, d2 float, d3 float);
id | d1 | d2 | d3 ----+-------------------+--------------------+------------------- 1 | 0.848072555389202 | 0.0868831583602555 | 0.656344708211334 2 | 0.447278565604571 | 0.641176215525044 | 0.523196923510816 (2 rows) References ---------- [Borzsonyi2001] Börzsönyi, S.; Kossmann, D. & Stocker, K.: The Skyline Operator, ICDE 2001, 421--432 Best, -Hannes
randdataset.tar.gz
Description: GNU Zip compressed data
---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster