On Mon, 25 Aug 2003, Stephan Szabo wrote:
> Date: Mon, 25 Aug 2003 00:43:56 -0700 (PDT)
> From: Stephan Szabo <[EMAIL PROTECTED]>
> To: mike <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [BUGS] index not used afer VACUUM ANALYZE
>
> On Thu, 21 Aug 2003, mike wrote:
>
> > Hi,
> > I hav a db as specified in nit.sql
> > flows has 763488 entries.
> >
> > After dropping/creating/loading the db and running auswert.sh I get
> > the attached result from query1.txt.
> > After 'VACUUM ANALYZE' I get the results from query2.txt
> >
> > As you can see, the indexes are not used any longer.
> > Why?
>
> It looks like the row estimates changed to say that a large % of the rows
> match the condition. Is that true? In any case, what does EXPLAIN
Partially.
I have statistical records (763488) - various IP-Traffic - collected for one
month.
After collection I try to condense the data for dayly statistics.
The EXPLAIN ANALYZE output is attached:
a1.txt is before, a2.txt after VACUUM ANALYZE run.
Bye/2
---
Michael Reifenberger, Business Unit Manager SAP-Basis, Plaut Consulting
Comp: [EMAIL PROTECTED] | Priv: [EMAIL PROTECTED]
http://www.plaut.de | http://www.Reifenberger.com
#
# Anzahl S�tze in der DB, von Datum, bis Datum, Anzahl Tage
#
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=41037.49..41037.50 rows=1 width=4) (actual time=6307.81..6307.82
rows=1 loops=1)
-> Seq Scan on flows (cost=0.00..31493.88 rows=763488 width=4) (actual
time=0.02..3134.26 rows=763488 loops=1)
Total runtime: 6323.81 msec
(3 rows)
#
# Tagesstatistik je Protokoll
#
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=14991.34..14991.66 rows=131 width=52) (actual time=16473.23..16473.77
rows=159 loops=1)
Sort Key: f.date, (sum(f.sdbytes) + sum(f.dsbytes))
-> HashAggregate (cost=14984.76..14986.73 rows=131 width=52) (actual
time=16471.51..16472.41 rows=159 loops=1)
-> Hash Join (cost=2.62..14947.53 rows=2482 width=52) (actual
time=1.53..11924.14 rows=753800 loops=1)
Hash Cond: ("outer"."type" = "inner".pnum)
-> Index Scan using flows_2 on flows f (cost=0.00..14786.46 rows=3818
width=24) (actual time=0.15..4180.55 rows=753800 loops=1)
Index Cond: ((date >= '2003-07-01'::date) AND (date <=
'2003-08-01'::date))
-> Hash (cost=2.30..2.30 rows=130 width=36) (actual time=1.17..1.17
rows=0 loops=1)
-> Seq Scan on protos p (cost=0.00..2.30 rows=130 width=36)
(actual time=0.06..0.62 rows=130 loops=1)
Total runtime: 16499.55 msec
(10 rows)
#
# Tagesstatistik gesamt
#
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=14834.22..14834.23 rows=2 width=20) (actual time=8519.01..8519.12 rows=31
loops=1)
Sort Key: date, (sum(sdbytes) + sum(dsbytes))
-> GroupAggregate (cost=0.00..14834.21 rows=2 width=20) (actual
time=309.70..8518.71 rows=31 loops=1)
-> Index Scan using flows_2 on flows f (cost=0.00..14786.46 rows=3818
width=20) (actual time=0.19..3989.96 rows=753800 loops=1)
Index Cond: ((date >= '2003-07-01'::date) AND (date <=
'2003-08-01'::date))
Total runtime: 8519.59 msec
(6 rows)
#
# Top 50: max( Bytes )
#
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=14884.07..14884.20 rows=50 width=72) (actual time=15852.17..15852.70
rows=50 loops=1)
-> Sort (cost=14884.07..14884.58 rows=201 width=72) (actual
time=15852.16..15852.34 rows=50 loops=1)
Sort Key: (sum(sdbytes) + sum(dsbytes))
-> HashAggregate (cost=14872.36..14876.38 rows=201 width=72) (actual
time=15406.83..15551.54 rows=23410 loops=1)
-> Index Scan using flows_2 on flows f (cost=0.00..14824.64 rows=3818
width=72) (actual time=0.21..6820.67 rows=753800 loops=1)
Index Cond: ((date >= '2003-07-01'::date) AND (date <=
'2003-08-01'::date))
Total runtime: 15907.26 msec
(7 rows)
#
# Anzahl S�tze in der DB, von Datum, bis Datum, Anzahl Tage
#
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=41037.49..41037.50 rows=1 width=4) (actual time=6226.68..6226.69
rows=1 loops=1)
-> Seq Scan on flows (cost=0.00..31493.88 rows=763488 width=4) (actual
time=0.02..3071.08 rows=763488 loops=1)
Total runtime: 6227.08 msec
(3 rows)
#
# Tagesstatistik je Protokoll
#
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=175003.67..175013.77 rows=4038 width=28) (actual time=30314.13..30314.68
rows=159 loops=1)
Sort Key: f.date, (sum(f.sdbytes) + sum(f.dsbytes))
-> GroupAggregate (cost=161732.39..174761.81 rows=4038 width=28) (actual
time=23452.70..30313.26 rows=159 loops=1)
-> Sort (cost=161732.39..163585.08 rows=741077 width=28) (actual
time=23452.30..26414.32 rows=753800 loops=1)
Sort Key: f.date, p.pname
-> Hash Join (cost=2.62..46430.11 rows=741077 width=28) (actual
time=1.31..11960.64 rows=753800 loops=1)
Hash Cond: ("outer"."type" = "inner".pnum)
-> Seq Scan on flows f (cost=0.00..35311.32 rows=741077
width=24) (actual time=0.07..3909.10 rows=753800 loops=1)
Filter: ((date >= '2003-07-01'::date) AND (date <=
'2003-08-01'::date))
-> Hash (cost=2.30..2.30 rows=130 width=12) (actual
time=1.13..1.13 rows=0 loops=1)
-> Seq Scan on protos p (cost=0.00..2.30 rows=130
width=12) (actual time=0.05..0.58 rows=130 loops=1)
Total runtime: 30319.64 msec
(12 rows)
#
# Tagesstatistik gesamt
#
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort (cost=44576.06..44576.14 rows=32 width=20) (actual time=7703.97..7704.07
rows=31 loops=1)
Sort Key: date, (sum(sdbytes) + sum(dsbytes))
-> HashAggregate (cost=44574.78..44575.26 rows=32 width=20) (actual
time=7703.55..7703.72 rows=31 loops=1)
-> Seq Scan on flows f (cost=0.00..35311.32 rows=741077 width=20) (actual
time=0.07..3450.81 rows=753800 loops=1)
Filter: ((date >= '2003-07-01'::date) AND (date <= '2003-08-01'::date))
Total runtime: 7704.81 msec
(6 rows)
#
# Top 50: max( Bytes )
#
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=306484.35..306484.47 rows=50 width=30) (actual time=39336.01..39336.64
rows=50 loops=1)
-> Sort (cost=306484.35..308337.04 rows=741077 width=30) (actual
time=39336.00..39336.18 rows=50 loops=1)
Sort Key: (sum(sdbytes) + sum(dsbytes))
-> GroupAggregate (cost=161634.37..187572.07 rows=741077 width=30) (actual
time=30577.11..39023.60 rows=23410 loops=1)
-> Sort (cost=161634.37..163487.06 rows=741077 width=30) (actual
time=30576.95..34875.71 rows=753800 loops=1)
Sort Key: network(set_masklen(sip, 16)), network(set_masklen(dip,
16))
-> Seq Scan on flows f (cost=0.00..42722.09 rows=741077
width=30) (actual time=0.12..6544.05 rows=753800 loops=1)
Filter: ((date >= '2003-07-01'::date) AND (date <=
'2003-08-01'::date))
Total runtime: 39399.83 msec
(9 rows)
vdat='2003-07-01'
bdat='2003-08-01'
# Auswertung tabelleninhalt
echo "#"
echo "# Anzahl S�tze in der DB, von Datum, bis Datum, Anzahl Tage"
echo "#"
psql -U pgsql nitpicker << EOF1
EXPLAIN ANALYZE select count(*),
min(date) as Von, max(date) as Bis, max(date)-min(date) as Tage
from flows;
EOF1
# Auswertung Tage, Protokoll, Bytes, Pakete aufsummiert
echo "#"
echo "# Tagesstatistik je Protokoll"
echo "#"
psql -U pgsql nitpicker << EOF2
EXPLAIN ANALYZE select f.date as datum, p.pname as Protokoll,
sum(f.sdbytes) + sum(f.dsbytes) as Bytes,
sum(f.sdpkgs) + sum(f.dspkgs) as Pakete
from protos p, flows f
where
f.date >= '$vdat' and f.date <= '$bdat' and
f.type = p.pnum
group by datum, protokoll
order by datum, Bytes desc
;
EOF2
echo "#"
echo "# Tagesstatistik gesamt"
echo "#"
psql -U pgsql nitpicker << EOF3
EXPLAIN ANALYZE select f.date as datum,
sum(f.sdbytes) + sum(f.dsbytes) as Bytes,
sum(f.sdpkgs) + sum(f.dspkgs) as Pakete
from flows f
where
f.date >= '$vdat' and f.date <= '$bdat'
group by datum
order by datum, Bytes desc
;
EOF3
echo "#"
echo "# Top 50: max( Bytes )"
echo "#"
psql -U pgsql nitpicker << EOF4
EXPLAIN ANALYZE select count(*),
network(set_masklen(sip,16)), network(set_masklen(dip,16)),
sum(sdbytes) + sum(dsbytes) as bytes
from flows f
where
f.date >= '$vdat' and f.date <= '$bdat'
group by network(set_masklen(sip,16)), network(set_masklen(dip,16))
order by bytes desc
limit 50;
EOF4
#psql -U pgsql nitpicker << EOF
# select
# -- n1.nname as src, n2.nname as dst
# count(*)
# from flows f, networks n1, networks n2, protos p
# where
# f.sip << n1.nnet and
# f.dip << n2.nnet
#EOF
#psql -U pgsql nitpicker -c "\
#select \
# f.sip, f.dip, p.pname, \
# f.sdpkgs, f.dspkgs \
# from flows f, protos p where \
# not f.sip << '194.39.177.0/24' and \
# not f.sip << '194.99.75.0/24' and \
# not f.sip = '195.212.179.2' and \
# not f.dip << '194.39.177.0/24' and \
# not f.dip << '194.99.75.0/24' and \
# not f.dip = '195.212.179.2' and \
# f.type = p.pnum \
# order by sip, dip;"
#psql -U pgsql nitpicker << EOFX
# drop table volumen;
# create table volumen (
# year int4,
# week int4,
# snet inet,
# dnet inet
# );
# drop table n_temp;
# create table n_temp (
# year int4,
# week int4,
# snet inet,
# dnet inet,
# sdbytes int8,
# dsbytes int8,
# sdpkgs int8,
# dspkgs int8
# );
#EOFX
#date
#psql -U pgsql nitpicker << EOF
# insert into n_temp
# select
# date_part( 'year', f.date ) ,
# date_part( 'week', f.date ),
# n.nnet, network( set_masklen( f.dip, 16 ) ),
# sum( f.sdbytes ), sum( f.dsbytes ),
# sum( f.sdpkgs ), sum( f.dspkgs )
# from flows f, networks n
# where
# f.sip <<= n.nnet and
# not f.dip <<= n.nnet
# group by f.date, n.nnet, network( set_masklen( f.dip, 16 ) )
# union
# select
# date_part( 'year', f.date ) ,
# date_part( 'week', f.date ),
# n.nnet, network( set_masklen( f.sip, 16 ) ),
# sum( f.dsbytes ), sum( f.sdbytes ),
# sum( f.dspkgs ), sum( f.sdpkgs )
# from flows f, networks n
# where
# f.dip <<= n.nnet and
# not f.sip <<= n.nnet
# group by f.date, n.nnet, network( set_masklen( f.sip, 16 ) )
# union
# select
# date_part( 'year', f.date ) ,
# date_part( 'week', f.date ),
# network( set_masklen( f.sip, 16 ) ), network( set_masklen( f.dip, 16 ) ),
# sum( f.sdbytes ), sum( f.dsbytes ),
# sum( f.sdpkgs ), sum( f.dspkgs )
# from flows f, networks n
# where
# not f.sip <<= n.nnet and
# not f.dip <<= n.nnet
# group by f.date, network( set_masklen( f.sip, 16 ) ), network( set_masklen( f.dip,
16 ) )
#EOF
#date
# f.date >= '2003-03-28' and f.date <= '2003-04-01'
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly