> The cost is now only 1141741215.35 compared to 2777810917708.17
> before;  this is an improvement factor of more than 2000.  So what's
> your problem? ;-)
>
> Servus
>  Manfred
>
In fact planner is estimating incredibly badly, it took only 833msecs now
runs perfectly

I'm going to keep on asking about another query:

SELECT
 customer.name,
 customer.custkey,
 orders.orderkey,
 orders.orderdate,
 orders.totalprice,
 sum(lineitem.quantity)
FROM
 customer,
 orders,
 lineitem
WHERE
 exists(
  SELECT
   lineitem.orderkey
  FROM
   lineitem
  WHERE
   lineitem.orderkey=orders.orderkey
  GROUP BY
   lineitem.orderkey HAVING
   sum(lineitem.quantity)>300
  )
 AND customer.custkey=orders.custkey
 AND orders.orderkey=lineitem.orderkey
GROUP BY
 customer.name,
 customer.custkey,
 orders.orderkey,
 orders.orderdate,
 orders.totalprice

ORDER BY
 orders.totalprice DESC,
 orders.orderdate;

NOTICE:  QUERY PLAN:

Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
  ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
        ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
              ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
width=66)
                    ->  Hash Join  (cost=26107574.81..26457309.10
rows=3000612 width=66)
                          ->  Seq Scan on lineitem  (cost=0.00..222208.25
rows=6001225 width=8)
                          ->  Hash  (cost=26105699.81..26105699.81
rows=750000 width=58)
                                ->  Hash Join  (cost=7431.00..26105699.81
rows=750000 width=58)
                                      ->  Seq Scan on orders
(cost=0.00..26083268.81 rows=750000 width=25)
                                            SubPlan
                                              ->  Aggregate
(cost=0.00..17.35 rows=1 width=8)
                                                    ->  Group
(cost=0.00..17.34 rows=5 width=8)
                                                          ->  Index Scan
using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
                                      ->  Hash  (cost=7056.00..7056.00
rows=150000 width=33)
                                            ->  Seq Scan on customer
(cost=0.00..7056.00 rows=150000 width=33)

again:
orders 1500000 tuples
lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
Customer 150000 tuples

select attname,n_distinct,correlation from pg_stats where
tablename='lineitem';
    attname    | n_distinct | correlation
---------------+------------+-------------
 orderkey      |  -0.199847 |           1
 partkey       |     196448 |   0.0223377
 suppkey       |       9658 | -0.00822751
 linenumber    |          7 |     0.17274
 quantity      |         50 |   0.0150153
 extendedprice |      25651 | -0.00790245
 discount      |         11 |    0.103761
 tax           |          9 |   0.0993771
 returnflag    |          3 |    0.391434
 linestatus    |          2 |    0.509791
 shipdate      |       2440 |   0.0072777
 commitdate    |       2497 |  0.00698162
 receiptdate   |       2416 |  0.00726686
 shipinstruct  |          4 |    0.241511
 shipmode      |          7 |    0.138432
 comment       |     275488 |   0.0188006
(16 rows)

select attname,n_distinct,correlation from pg_stats where
tablename='orders';
    attname    | n_distinct | correlation
---------------+------------+-------------
 orderkey      |         -1 |   -0.999925
 custkey       |      76309 |  0.00590596
 orderstatus   |          3 |    0.451991
 totalprice    |         -1 | -0.00768806
 orderdate     |       2431 |  -0.0211354
 orderpriority |          5 |    0.182489
 clerk         |       1009 |  0.00546939
 shippriority  |          1 |           1
 comment       |  -0.750125 |  -0.0123887

Customer
  attname   | n_distinct | correlation
------------+------------+-------------
 custkey    |         -1 |           1
 name       |         -1 |           1
 address    |         -1 | -0.00510274
 nationkey  |         25 |   0.0170533
 phone      |         -1 |  -0.0227816
 acctbal    |   -0.83444 | -0.00220958
 mktsegment |          5 |    0.205013
 comment    |         -1 |   0.0327827

This query takes 12 minutes to run and returns about 50 customers.
lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
restrictive

May someone help on improving performance?
Again thanks in advance
Regards



---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Reply via email to