Hi,

I have

create table x ( att bigint, val bigint, hash varchar(30) );

with 693 million rows. The query

create table y as select att, val, count(*) as cnt from x group by att, val;

ran for more than 2000 minutes and was using 14 GB of memory on a machine with 8 GB of physical RAM -- eventually I stopped it. Doing

create table y ( att bigint, val bigint, cnt int );

and then something like:

    seq 0 255 | xargs -P 6 -I{} psql -c "insert into y select att, val, count(*) from x where att%256={} group by att, val" test

runs 6 of the 256 chunks in about 10 minutes -- extrapolating (256/6 is about 43 rounds of 10 minutes each), the whole job should finish in roughly 7 hours, a small fraction of the 2000+ minutes the single query had already taken.
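Spelled out as a script, the same idea looks roughly like this (just a sketch: it assumes bash, a database named test as in the psql call above, and that y has already been created):

    #!/bin/bash
    # Rebuild the aggregate in 256 modulo buckets, at most 6 psql sessions at a time.
    # Each bucket aggregates only the rows with att % 256 = i (assumes att >= 0),
    # so each (att, val) group is handled by exactly one of the 256 inserts.
    for i in $(seq 0 255); do
        psql -d test -c "insert into y select att, val, count(*) from x where att % 256 = $i group by att, val" &
        if (( i % 6 == 5 )); then
            wait    # let the current batch of 6 finish before starting the next
        fi
    done
    wait            # wait for the final partial batch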

Question 1: do you see any reason why the second method would yield a different result from the first?

Question 2: is that method generalisable, so that it could be built into the base system without manual shell glue?

Thanks,

Oliver


