Craig, Russel, I appreciate your help.
Thanks. 2010/6/22 Russell Smith <mr-r...@pws.com.au> > On 22/06/10 00:42, Sergio Charpinel Jr. wrote: > > Hi, > > > [snip] > > > > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst, > > tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM > > "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND > > "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, > > port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0; > > > > QUERY PLAN > > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual > > time=276981.107..276981.133 rows=50 loops=1) > > -> Sort (cost=3998662.81..4001046.07 rows=953305 width=50) > > (actual time=276981.105..276981.107 rows=50 loops=1) > > Sort Key: sum(bytes) > > -> GroupAggregate (cost=3499863.27..3754872.33 rows=953305 > > width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1) > > -> Sort (cost=3499863.27..3523695.89 rows=9533049 > > width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1) > > Sort Key: ip_src, port_src, ip_dst, port_dst, > > tcp_flags, ip_proto > > You are having to sort and aggregate a large number of rows before you > can get the top 50. That's 9 million rows in this case, width 50 = > 400MB+ sort. That's going to be slow as you are going to have to sort > it on disk unless you bump up sort mem to 500Mb (bad idea). So unless > you have really fast storage for temporary tables it's going to take a > while. About 2.5 minutes you are experiencing at the moment is probably > not too bad. > > I'm sure improvements have been made in the area since 8.1 and if you > are able to upgrade to 8.4 which is also offered by Centos5 now, you > might get benefit there. I can't remember the specific benefits, but I > believe sorting speed has improved, your explain analyze will also give > you more information about what's going on with disk/memory sorting. > > > -> Seq Scan on acct_2010_25 > > (cost=0.00..352648.10 rows=9533049 width=50) (actual > > time=0.038..50860.391 rows=9494165 loops=1) > > Filter: ((stamp_inserted >= '2010-06-20 > > 10:10:00'::timestamp without time zone) AND (stamp_inserted < > > '2010-06-21 10:10:00'::timestamp without time zone)) > > Total runtime: 278791.661 ms > > (9 registros) > > > > Another one just summing bytes (still low): > > > > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst, > > tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE > > "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 > > 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, > > ip_proto LIMIT 50 OFFSET 0; > > > > QUERY PLAN > > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual > > time=106261.359..106261.451 rows=50 loops=1) > > -> GroupAggregate (cost=3395202.50..3602225.48 rows=974226 > > width=42) (actual time=106261.357..106261.435 rows=50 loops=1) > > -> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42) > > (actual time=106261.107..106261.169 rows=176 loops=1) > > Sort Key: ip_src, port_src, ip_dst, port_dst, > > tcp_flags, ip_proto > > -> Seq Scan on acct_2010_25 (cost=0.00..367529.72 > > rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1) > > Filter: ((stamp_inserted >= '2010-06-20 > > 10:10:00'::timestamp without time zone) AND (stamp_inserted < > > '2010-06-21 10:10:00'::timestamp without time zone)) > > Total runtime: 109911.882 ms > > (7 registros) > > > > > > The server has 2 Intel(R) Xeon(R) CPU E5430 @ 2.66GHz and 16GB RAM. > > I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just > > increased checkpoint_segments to 50). > > Checkpoint segments won't help you as the number of segments is about > writing to the database and how fast that can happen. > > > > > What can I change to increase performance? > > Increasing sort-memory (work_mem) will give you speed benefits even > though you are going to disk. I don't know how much spare memory you > have, but trying other values between 8MB and 128MB may be useful just > for the specific query runs. If you can afford 512Mb for each of the > two sorts, go for that, but it's dangerous as mentioned due to the risk > of using more RAM than you have. work_mem allocates that amount of > memory per sort. > > If you are running these queries all the time, a summary table the > produces there reports on a regular basis, maybe daily or even hourly > would be useful. Basically the large amount of information that needs > to be processed and sorted is what's taking all the time here. > > Regards > > Russell > -- Sergio Roberto Charpinel Jr.