Hi,

One more question about two specifics query behavior: If I add "AND (ip_dst
= x.x.x.x)", it uses another plan and take a much more time. In both of
them, I'm using WHERE clause. Why this behavior?

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto, bytes, packets, flows FROM "acct_2010_26" WHERE
"stamp_inserted">='2010-06-28 09:07' AND "stamp_inserted"<'2010-06-29 08:07'
AND (ip_dst = '8.8.8.8') ORDER BY bytes DESC LIMIT 50 OFFSET 0;

                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=496332.56..496332.69 rows=50 width=50) (actual
time=125390.523..125390.540 rows=50 loops=1)
   ->  Sort  (cost=496332.56..496351.35 rows=7517 width=50) (actual
time=125390.520..125390.525 rows=50 loops=1)
         Sort Key: bytes
         ->  Index Scan using acct_2010_26_pk on acct_2010_26
 (cost=0.00..495848.62 rows=7517 width=50) (actual time=0.589..125385.680
rows=1011 loops=1)
               Index Cond: ((stamp_inserted >= '2010-06-28
09:07:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-29
08:07:00'::timestamp without time zone) AND (ip_dst = '8.8.8.8'::inet))
 Total runtime: 125390.711 ms
(6 registros)


=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
ip_proto, bytes, packets, flows FROM "acct_2010_26" WHERE
"stamp_inserted">='2010-06-28 09:07' AND "stamp_inserted"<'2010-06-29 08:07'
ORDER BY bytes DESC LIMIT 50 OFFSET 0;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..268.25 rows=50 width=50) (actual time=0.150..70.780
rows=50 loops=1)
   ->  Index Scan Backward using ibytes_acct_2010_26 on acct_2010_26
 (cost=0.00..133240575.70 rows=24835384 width=50) (actual time=0.149..70.762
rows=50 loops=1)
         Filter: ((stamp_inserted >= '2010-06-28 09:07:00'::timestamp
without time zone) AND (stamp_inserted < '2010-06-29 08:07:00'::timestamp
without time zone))
 Total runtime: 70.830 ms
(4 registros)


Thanks in advance.

2010/6/23 Sergio Charpinel Jr. <sergiocharpi...@gmail.com>

> Craig, Russel,
>
> I appreciate your help.
>
> Thanks.
>
> 2010/6/22 Russell Smith <mr-r...@pws.com.au>
>
> On 22/06/10 00:42, Sergio Charpinel Jr. wrote:
>> > Hi,
>> >
>> [snip]
>> >
>> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
>> > tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM
>> > "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND
>> > "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst,
>> > port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET
>> 0;
>> >
>> >                QUERY PLAN
>> >
>> >
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >  Limit  (cost=3998662.81..3998662.94 rows=50 width=50) (actual
>> > time=276981.107..276981.133 rows=50 loops=1)
>> >    ->  Sort  (cost=3998662.81..4001046.07 rows=953305 width=50)
>> > (actual time=276981.105..276981.107 rows=50 loops=1)
>> >          Sort Key: sum(bytes)
>> >          ->  GroupAggregate  (cost=3499863.27..3754872.33 rows=953305
>> > width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
>> >                ->  Sort  (cost=3499863.27..3523695.89 rows=9533049
>> > width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
>> >                      Sort Key: ip_src, port_src, ip_dst, port_dst,
>> > tcp_flags, ip_proto
>>
>> You are having to sort and aggregate a large number of rows before you
>> can get the top 50.  That's 9 million rows in this case, width 50 =
>> 400MB+ sort.  That's going to be slow as you are going to have to sort
>> it on disk unless you bump up sort mem to 500Mb (bad idea).  So unless
>> you have really fast storage for temporary tables it's going to take a
>> while.  About 2.5 minutes you are experiencing at the moment is probably
>> not too bad.
>>
>> I'm sure improvements have been made in the area since 8.1 and if you
>> are able to upgrade to 8.4 which is also offered by Centos5 now, you
>> might get benefit there.  I can't remember the specific benefits, but I
>> believe sorting speed has improved, your explain analyze will also give
>> you more information about what's going on with disk/memory sorting.
>>
>> >                      ->  Seq Scan on acct_2010_25
>> >  (cost=0.00..352648.10 rows=9533049 width=50) (actual
>> > time=0.038..50860.391 rows=9494165 loops=1)
>> >                            Filter: ((stamp_inserted >= '2010-06-20
>> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
>> > '2010-06-21 10:10:00'::timestamp without time zone))
>> >  Total runtime: 278791.661 ms
>> > (9 registros)
>> >
>> > Another one just summing bytes (still low):
>> >
>> > => explain analyze SELECT ip_src, port_src, ip_dst, port_dst,
>> > tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
>> > "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
>> > 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags,
>> > ip_proto LIMIT 50 OFFSET 0;
>> >
>> >             QUERY PLAN
>> >
>> >
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >  Limit  (cost=3395202.50..3395213.12 rows=50 width=42) (actual
>> > time=106261.359..106261.451 rows=50 loops=1)
>> >    ->  GroupAggregate  (cost=3395202.50..3602225.48 rows=974226
>> > width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
>> >          ->  Sort  (cost=3395202.50..3419558.14 rows=9742258 width=42)
>> > (actual time=106261.107..106261.169 rows=176 loops=1)
>> >                Sort Key: ip_src, port_src, ip_dst, port_dst,
>> > tcp_flags, ip_proto
>> >                ->  Seq Scan on acct_2010_25  (cost=0.00..367529.72
>> > rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165
>> loops=1)
>> >                      Filter: ((stamp_inserted >= '2010-06-20
>> > 10:10:00'::timestamp without time zone) AND (stamp_inserted <
>> > '2010-06-21 10:10:00'::timestamp without time zone))
>> >  Total runtime: 109911.882 ms
>> > (7 registros)
>> >
>> >
>> > The server has 2 Intel(R) Xeon(R) CPU  E5430 @ 2.66GHz and 16GB RAM.
>> > I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just
>> > increased checkpoint_segments to 50).
>>
>> Checkpoint segments won't help you as the number of segments is about
>> writing to the database and how fast that can happen.
>>
>> >
>> > What can I change to increase performance?
>>
>> Increasing sort-memory (work_mem) will give you speed benefits even
>> though you are going to disk.  I don't know how much spare memory you
>> have, but trying other values between 8MB and 128MB may be useful just
>> for the specific query runs.  If you can afford 512Mb for each of the
>> two sorts, go for that, but it's dangerous as mentioned due to the risk
>> of using more RAM than you have.  work_mem allocates that amount of
>> memory per sort.
>>
>> If you are running these queries all the time, a summary table the
>> produces there reports on a regular basis, maybe daily or even hourly
>> would be useful.  Basically the large amount of information that needs
>> to be processed and sorted is what's taking all the time here.
>>
>> Regards
>>
>> Russell
>>
>
>
>
> --
> Sergio Roberto Charpinel Jr.
>



-- 
Sergio Roberto Charpinel Jr.

Reply via email to