hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS
   SELECT
    lineitem.orderkey
   FROM
    lineitem
   WHERE
    lineitem.orderkey=orders.orderkey
   GROUP BY
    lineitem.orderkey HAVING
    sum(lineitem.quantity)>300;

CREATE INDEX tmp_idx ON tmp (orderkey);

 SELECT
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice,
  sum(lineitem.quantity)
 FROM
  customer,
  orders,
  lineitem,
  tmp
 WHERE
  orders.orderkey=tmp.orderkey
  AND customer.custkey=orders.custkey
  AND orders.orderkey=lineitem.orderkey
 GROUP BY
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice
 ORDER BY
  orders.totalprice DESC,
  orders.orderdate;


may be the index is not necessary...

kuba


> I'm going to keep on asking about another query:
>
> SELECT
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice,
>  sum(lineitem.quantity)
> FROM
>  customer,
>  orders,
>  lineitem
> WHERE
>  exists(
>   SELECT
>    lineitem.orderkey
>   FROM
>    lineitem
>   WHERE
>    lineitem.orderkey=orders.orderkey
>   GROUP BY
>    lineitem.orderkey HAVING
>    sum(lineitem.quantity)>300
>   )
>  AND customer.custkey=orders.custkey
>  AND orders.orderkey=lineitem.orderkey
> GROUP BY
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice
>
> ORDER BY
>  orders.totalprice DESC,
>  orders.orderdate;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
>   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
>         ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
>               ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
> width=66)
>                     ->  Hash Join  (cost=26107574.81..26457309.10
> rows=3000612 width=66)
>                           ->  Seq Scan on lineitem  (cost=0.00..222208.25
> rows=6001225 width=8)
>                           ->  Hash  (cost=26105699.81..26105699.81
> rows=750000 width=58)
>                                 ->  Hash Join  (cost=7431.00..26105699.81
> rows=750000 width=58)
>                                       ->  Seq Scan on orders
> (cost=0.00..26083268.81 rows=750000 width=25)
>                                             SubPlan
>                                               ->  Aggregate
> (cost=0.00..17.35 rows=1 width=8)
>                                                     ->  Group
> (cost=0.00..17.34 rows=5 width=8)
>                                                           ->  Index Scan
> using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
>                                       ->  Hash  (cost=7056.00..7056.00
> rows=150000 width=33)
>                                             ->  Seq Scan on customer
> (cost=0.00..7056.00 rows=150000 width=33)
>
> again:
> orders 1500000 tuples
> lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
> Customer 150000 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |  -0.199847 |           1
>  partkey       |     196448 |   0.0223377
>  suppkey       |       9658 | -0.00822751
>  linenumber    |          7 |     0.17274
>  quantity      |         50 |   0.0150153
>  extendedprice |      25651 | -0.00790245
>  discount      |         11 |    0.103761
>  tax           |          9 |   0.0993771
>  returnflag    |          3 |    0.391434
>  linestatus    |          2 |    0.509791
>  shipdate      |       2440 |   0.0072777
>  commitdate    |       2497 |  0.00698162
>  receiptdate   |       2416 |  0.00726686
>  shipinstruct  |          4 |    0.241511
>  shipmode      |          7 |    0.138432
>  comment       |     275488 |   0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
>     attname    | n_distinct | correlation
> ---------------+------------+-------------
>  orderkey      |         -1 |   -0.999925
>  custkey       |      76309 |  0.00590596
>  orderstatus   |          3 |    0.451991
>  totalprice    |         -1 | -0.00768806
>  orderdate     |       2431 |  -0.0211354
>  orderpriority |          5 |    0.182489
>  clerk         |       1009 |  0.00546939
>  shippriority  |          1 |           1
>  comment       |  -0.750125 |  -0.0123887
>
> Customer
>   attname   | n_distinct | correlation
> ------------+------------+-------------
>  custkey    |         -1 |           1
>  name       |         -1 |           1
>  address    |         -1 | -0.00510274
>  nationkey  |         25 |   0.0170533
>  phone      |         -1 |  -0.0227816
>  acctbal    |   -0.83444 | -0.00220958
>  mktsegment |          5 |    0.205013
>  comment    |         -1 |   0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> restrictive
>
> May someone help on improving performance?
> Again thanks in advance
> Regards
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to