Re: [PERFORM] osdl-dbt3 run results - puzzled by the execution plans

2003-09-19 Thread Manfred Koizar
On Thu, 18 Sep 2003 15:36:50 -0700, Jenny Zhang [EMAIL PROTECTED]
wrote:
> We thought the large effective_cache_size should lead us to better
> plans. But we found the opposite.

The common structure of your query plans is:

 Sort
   Sort Key: sum((partsupp.ps_supplycost * partsupp.ps_availqty))
   InitPlan
     ->  Aggregate
           ->  SubPlan
   ->  Aggregate
         Filter: (sum((ps_supplycost * ps_availqty)) > $0)
         ->  Group
               ->  Sort
                     Sort Key: partsupp.ps_partkey
                     ->  SubPlan (same as above)

where the SubPlan is

 ->  Merge Join  (cost=519.60..99880.05 rows=32068 width=65)
                 (actual time=114.78..17435.28 rows=30400 loops=1)
                 ctr=5.73
       Merge Cond: (outer.ps_suppkey = inner.s_suppkey)
       ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..96953.31 rows=801712 width=34)
                 (actual time=0.42..14008.92 rows=799361 loops=1)
                 ctr=6.92
       ->  Sort  (cost=519.60..520.60 rows=400 width=31)
                 (actual time=106.88..143.49 rows=30321 loops=1)
                 ctr=3.63
             Sort Key: supplier.s_suppkey
             ->  SubSubPlan

for large effective_cache_size and

 ->  Nested Loop  (cost=0.00..130168.30 rows=32068 width=65)
                  (actual time=0.56..1374.41 rows=30400 loops=1)
                  ctr=94.71
       ->  SubSubPlan
       ->  Index Scan using i_ps_suppkey on partsupp
                 (cost=0.00..323.16 rows=80 width=34)
                 (actual time=0.16..2.98 rows=80 loops=380)
                 ctr=108.44
             Index Cond: (partsupp.ps_suppkey = outer.s_suppkey)

for small effective_cache_size.  Both subplans have an almost
identical subsubplan:

 ->  Nested Loop  (cost=0.00..502.31 rows=400 width=31)
                  (actual time=0.23..110.51 rows=380 loops=1)
                  ctr=4.55
       Join Filter: (inner.s_nationkey = outer.n_nationkey)
       ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=10)
                               (actual time=0.08..0.14 rows=1 loops=1)
                               ctr=9.36
             Filter: (n_name = 'ETHIOPIA'::bpchar)
       ->  Seq Scan on supplier  (cost=0.00..376.00 rows=1 width=21)
                                 (actual time=0.10..70.72 rows=1 loops=1)
                                 ctr=5.32

I have added the ctr (cost:time ratio) for each plan node.  These
values are mostly between 5 and 10 with two notable exceptions:
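
The ctr values can be reproduced from the plan output.  A minimal
sketch, assuming ctr is the upper bound of the estimated cost divided
by the upper bound of the actual time in ms (this assumption matches
every figure quoted above; the dictionary labels are only illustrative
names, not planner terminology):

```python
# (estimated total cost, actual total time in ms) copied from the plans above.
nodes = {
    "Merge Join": (99880.05, 17435.28),
    "Index Scan partsupp (MJ)": (96953.31, 14008.92),
    "Sort": (520.60, 143.49),
    "Nested Loop": (130168.30, 1374.41),
    "Index Scan partsupp (NLJ)": (323.16, 2.98),
    "Nested Loop (subsubplan)": (502.31, 110.51),
    "Seq Scan nation": (1.31, 0.14),
    "Seq Scan supplier": (376.00, 70.72),
}


def ctr(cost, time_ms):
    """Cost:time ratio, i.e. cost units charged per millisecond measured."""
    return cost / time_ms


for name, (cost, time_ms) in nodes.items():
    print(f"{name:28s} ctr={ctr(cost, time_ms):.2f}")
```

Nodes whose ratio falls far outside the common range are the ones
where the cost model and reality disagree most.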

1) ->  Sort  (cost=519.60..520.60 rows=400 width=31)
             (actual time=106.88..143.49 rows=30321 loops=1)
             ctr=3.63

It has already been noticed by Matt Clark that this is the only plan
node where the row count estimation looks wrong.  However, I don't
believe that this has great influence on the total cost of the plan,
because the ctr is not far from the usual range and if it were a bit
higher, it would only add a few hundred cost units to a branch costing
almost 100000 units.  BTW I vaguely remember that there is something
strange with the way actual rows are counted inside a merge join.
Look at the branch below this plan node:  It shows an actual row count
of 380.

2) ->  Index Scan using i_ps_suppkey on partsupp
             (cost=0.00..323.16 rows=80 width=34)
             (actual time=0.16..2.98 rows=80 loops=380)
             ctr=108.44

Here we have the only plan node where loops > 1, and it is the only
one where the ctr is far off.  The planner computes the cost for one
loop and multiplies it by the number of loops (which it estimates
quite accurately to be 400), thus getting a total cost of ca. 130000.
We have no reason to believe that the single loop cost is very far
from reality (for a *single* index scan), but the planner does not
account for additional index scans hitting pages in the cache that
have been brought in by preceding scans.  This is a known problem, Tom
has mentioned it several times, IIRC.
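
The multiplication described above can be checked with the numbers
from the plan.  This is only a sketch of the arithmetic, not of the
planner's code; the "usual ctr of 5 to 10" used for the last figure is
an assumption taken from the other plan nodes:

```python
# Figures copied from the inner index scan of the nested loop plan.
per_loop_cost = 323.16    # planner's upper cost estimate for ONE scan
per_loop_ms = 2.98        # measured average time for one scan
estimated_loops = 400     # planner's estimate of outer rows
actual_loops = 380        # measured number of loops

# What the planner charges: one-scan cost times the number of loops.
estimated_total_cost = per_loop_cost * estimated_loops
# What was actually spent in this node across all loops.
actual_total_ms = per_loop_ms * actual_loops

# Assumption for illustration: a "usual" ctr of 5 to 10.
usual_ctr_range = (5, 10)
implied_cost_range = tuple(per_loop_ms * c for c in usual_ctr_range)

print(f"estimated total cost: {estimated_total_cost:.0f}")
print(f"actual total time:    {actual_total_ms:.1f} ms")
print(f"per-loop cost at usual ctr: "
      f"{implied_cost_range[0]:.1f} to {implied_cost_range[1]:.1f}")
```

Under that assumption the measured per-loop time corresponds to a cost
far below 323.16, which is consistent with later scans finding their
pages already cached.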

Now I'm very interested in getting a better understanding of this
problem, so could you please report the results of

. \d i_ps_suppkey

. VACUUM VERBOSE ANALYSE partsupp;
  VACUUM VERBOSE ANALYSE supplier;

. SELECT attname, null_frac, avg_width, n_distinct, correlation
    FROM pg_stats
   WHERE tablename = 'partsupp' AND attname IN ('ps_suppkey', ...);

  Please insert other interesting column names for ..., especially
  those contained in i_ps_suppkey, if any.

. SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname IN ('partsupp', 'supplier', ...);
                                             ^^^
                            Add relevant index names here.

. EXPLAIN ANALYSE
  SELECT ps_partkey, ps_supplycost, ps_availqty
    FROM partsupp, supplier
   WHERE ps_suppkey = s_suppkey AND s_nationkey = 'youknowit';

  The idea is to eliminate parts of the plan that are always the same.
  Omitting nation is possibly too much of a simplification.  In this
  case please re-add it.
  Do this 

[PERFORM] osdl-dbt3 run results - puzzled by the execution plans

2003-09-18 Thread Jenny Zhang
Our hardware/software configuration:
kernel: 2.5.74
distro: RH7.2
pgsql:  7.3.3
CPUS:   8
MHz:    700.217
model:  Pentium III (Cascades)
memory: 829 kB
shmmax: 3705032704

We did several sets of runs (repeating runs with the same database
parameters) and have the following observations:

1. With everything else the same, we did two run sets, one with a
small effective_cache_size (default=1000) and one with a large one
(655360, i.e. 5GB or about 60% of the 8GB system memory).  It seems
to me that a small effective_cache_size favors nested loop joins
(NLJ), while a large effective_cache_size favors merge joins (MJ).
We thought the large effective_cache_size should lead us to better
plans. But we found the opposite.

Three plans out of 22 are different.  Two of those plans are worse
in execution time, by factors of 2 and 8.  For example, one plan
that included an NLJ ran in 4 seconds, but the other, switching to
an MJ, ran in 32 seconds.  Please refer to the links at the end of
this mail for the query and plans.  Did we miss something, or does
the optimizer need improvement?

2. Thanks to all the responses we got from this mailing list, we
decided to use SETSEED(0) and default_statistics_target=1000 to
reduce the variation.  We now get exactly the same execution plans
and costs with repeated runs, and that reduced the variation a lot.
However, within the same run set consisting of 6 runs, we see 2-3%
standard deviation for the run metrics associated with the multiple
stream part of the test (as opposed to the single stream part).

We would like to reduce the variation to be less than 1% so that a 
2% change between two different kernels would be significant. 
Is there anything else we can do?
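
For reference, the variation figure discussed here can be expressed
as a relative standard deviation over the runs of one run set.  A
minimal sketch, in which the function name and the six metric values
are hypothetical and not taken from our measurements:

```python
import statistics


def relative_stddev_percent(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# HYPOTHETICAL metrics for a run set of 6 runs (not measured data).
runs = [100.0, 103.0, 98.0, 101.0, 97.0, 102.0]
print(f"relative stddev: {relative_stddev_percent(runs):.2f}%")
```

A difference between two kernels is only convincing when it is
clearly larger than this figure for both run sets.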

query: http://developer.osdl.org/~jenny/11.sql
plan with small effective_cache_size: 
http://developer.osdl.org/~jenny/small_effective_cache_size_plan
plan with large effective_cache_size: 
http://developer.osdl.org/~jenny/large_effective_cache_size_plan

Thanks,
Jenny

