Re: [SQL] [HACKERS] please help on query

2002-07-12 Thread Jakub Ouhrabka

hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS
   SELECT
lineitem.orderkey
   FROM
lineitem
   WHERE
lineitem.orderkey=orders.orderkey
   GROUP BY
lineitem.orderkey HAVING
sum(lineitem.quantity)>300;

CREATE INDEX tmp_idx ON tmp (orderkey);

 SELECT
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice,
  sum(lineitem.quantity)
 FROM
  customer,
  orders,
  lineitem,
  tmp
 WHERE
  orders.orderkey=tmp.orderkey
  AND customer.custkey=orders.custkey
  AND orders.orderkey=lineitem.orderkey
 GROUP BY
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice
 ORDER BY
  orders.totalprice DESC,
  orders.orderdate;


may be the index is not necessary...

kuba


> I'm going to keep on asking about another query:
>
> SELECT
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice,
>  sum(lineitem.quantity)
> FROM
>  customer,
>  orders,
>  lineitem
> WHERE
>  exists(
>   SELECT
>lineitem.orderkey
>   FROM
>lineitem
>   WHERE
>lineitem.orderkey=orders.orderkey
>   GROUP BY
>lineitem.orderkey HAVING
>sum(lineitem.quantity)>300
>   )
>  AND customer.custkey=orders.custkey
>  AND orders.orderkey=lineitem.orderkey
> GROUP BY
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice
>
> ORDER BY
>  orders.totalprice DESC,
>  orders.orderdate;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
>   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
> ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
>   ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
> width=66)
> ->  Hash Join  (cost=26107574.81..26457309.10
> rows=3000612 width=66)
>   ->  Seq Scan on lineitem  (cost=0.00..08.25
> rows=6001225 width=8)
>   ->  Hash  (cost=26105699.81..26105699.81
> rows=75 width=58)
> ->  Hash Join  (cost=7431.00..26105699.81
> rows=75 width=58)
>   ->  Seq Scan on orders
> (cost=0.00..26083268.81 rows=75 width=25)
> SubPlan
>   ->  Aggregate
> (cost=0.00..17.35 rows=1 width=8)
> ->  Group
> (cost=0.00..17.34 rows=5 width=8)
>   ->  Index Scan
> using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
>   ->  Hash  (cost=7056.00..7056.00
> rows=15 width=33)
> ->  Seq Scan on customer
> (cost=0.00..7056.00 rows=15 width=33)
>
> again:
> orders 150 tuples
> lineitem 600 tuples there are 1 to 7 lineitems per orderkey
> Customer 15 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
> attname| n_distinct | correlation
> ---++-
>  orderkey  |  -0.199847 |   1
>  partkey   | 196448 |   0.0223377
>  suppkey   |   9658 | -0.00822751
>  linenumber|  7 | 0.17274
>  quantity  | 50 |   0.0150153
>  extendedprice |  25651 | -0.00790245
>  discount  | 11 |0.103761
>  tax   |  9 |   0.0993771
>  returnflag|  3 |0.391434
>  linestatus|  2 |0.509791
>  shipdate  |   2440 |   0.0072777
>  commitdate|   2497 |  0.00698162
>  receiptdate   |   2416 |  0.00726686
>  shipinstruct  |  4 |0.241511
>  shipmode  |  7 |0.138432
>  comment   | 275488 |   0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
> attname| n_distinct | correlation
> ---++-
>  orderkey  | -1 |   -0.25
>  custkey   |  76309 |  0.00590596
>  orderstatus   |  3 |0.451991
>  totalprice| -1 | -0.00768806
>  orderdate |   2431 |  -0.0211354
>  orderpriority |  5 |0.182489
>  clerk |   1009 |  0.00546939
>  shippriority  |  1 |   1
>  comment   |  -0.750125 |  -0.0123887
>
> Customer
>   attname   | n_distinct | correlation
> ++-
>  custkey| -1 |   1
>  name   | -1 |   1
>  address| -1 | -0.00510274
>  nationkey  | 25 |   0.0170533
>  phone  | -1 |  -0.0227816
>  acctbal|   -0.83444 | -0.00220958
>  mktsegment |  5 |0.205013
>  comment| -1 |   0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> l

Re: [SQL] [HACKERS] please help on query

2002-07-12 Thread Jakub Ouhrabka

hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS
   SELECT
lineitem.orderkey
   FROM
lineitem
   WHERE
lineitem.orderkey=orders.orderkey
   GROUP BY
lineitem.orderkey HAVING
sum(lineitem.quantity)>300;

CREATE INDEX tmp_idx ON tmp (orderkey);

 SELECT
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice,
  sum(lineitem.quantity)
 FROM
  customer,
  orders,
  lineitem,
  tmp
 WHERE
  orders.orderkey=tmp.orderkey
  AND customer.custkey=orders.custkey
  AND orders.orderkey=lineitem.orderkey
 GROUP BY
  customer.name,
  customer.custkey,
  orders.orderkey,
  orders.orderdate,
  orders.totalprice
 ORDER BY
  orders.totalprice DESC,
  orders.orderdate;


may be the index is not necessary...

kuba


> I'm going to keep on asking about another query:
>
> SELECT
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice,
>  sum(lineitem.quantity)
> FROM
>  customer,
>  orders,
>  lineitem
> WHERE
>  exists(
>   SELECT
>lineitem.orderkey
>   FROM
>lineitem
>   WHERE
>lineitem.orderkey=orders.orderkey
>   GROUP BY
>lineitem.orderkey HAVING
>sum(lineitem.quantity)>300
>   )
>  AND customer.custkey=orders.custkey
>  AND orders.orderkey=lineitem.orderkey
> GROUP BY
>  customer.name,
>  customer.custkey,
>  orders.orderkey,
>  orders.orderdate,
>  orders.totalprice
>
> ORDER BY
>  orders.totalprice DESC,
>  orders.orderdate;
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=26923941.97..26923941.97 rows=300061 width=66)
>   ->  Aggregate  (cost=26851634.86..26896644.05 rows=300061 width=66)
> ->  Group  (cost=26851634.86..26889142.52 rows=3000612 width=66)
>   ->  Sort  (cost=26851634.86..26851634.86 rows=3000612
> width=66)
> ->  Hash Join  (cost=26107574.81..26457309.10
> rows=3000612 width=66)
>   ->  Seq Scan on lineitem  (cost=0.00..08.25
> rows=6001225 width=8)
>   ->  Hash  (cost=26105699.81..26105699.81
> rows=75 width=58)
> ->  Hash Join  (cost=7431.00..26105699.81
> rows=75 width=58)
>   ->  Seq Scan on orders
> (cost=0.00..26083268.81 rows=75 width=25)
> SubPlan
>   ->  Aggregate
> (cost=0.00..17.35 rows=1 width=8)
> ->  Group
> (cost=0.00..17.34 rows=5 width=8)
>   ->  Index Scan
> using lineitem_pkey on lineitem  (cost=0.00..17.33 rows=5 width=8)
>   ->  Hash  (cost=7056.00..7056.00
> rows=15 width=33)
> ->  Seq Scan on customer
> (cost=0.00..7056.00 rows=15 width=33)
>
> again:
> orders 150 tuples
> lineitem 600 tuples there are 1 to 7 lineitems per orderkey
> Customer 15 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
> attname| n_distinct | correlation
> ---++-
>  orderkey  |  -0.199847 |   1
>  partkey   | 196448 |   0.0223377
>  suppkey   |   9658 | -0.00822751
>  linenumber|  7 | 0.17274
>  quantity  | 50 |   0.0150153
>  extendedprice |  25651 | -0.00790245
>  discount  | 11 |0.103761
>  tax   |  9 |   0.0993771
>  returnflag|  3 |0.391434
>  linestatus|  2 |0.509791
>  shipdate  |   2440 |   0.0072777
>  commitdate|   2497 |  0.00698162
>  receiptdate   |   2416 |  0.00726686
>  shipinstruct  |  4 |0.241511
>  shipmode  |  7 |0.138432
>  comment   | 275488 |   0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
> attname| n_distinct | correlation
> ---++-
>  orderkey  | -1 |   -0.25
>  custkey   |  76309 |  0.00590596
>  orderstatus   |  3 |0.451991
>  totalprice| -1 | -0.00768806
>  orderdate |   2431 |  -0.0211354
>  orderpriority |  5 |0.182489
>  clerk |   1009 |  0.00546939
>  shippriority  |  1 |   1
>  comment   |  -0.750125 |  -0.0123887
>
> Customer
>   attname   | n_distinct | correlation
> ++-
>  custkey| -1 |   1
>  name   | -1 |   1
>  address| -1 | -0.00510274
>  nationkey  | 25 |   0.0170533
>  phone  | -1 |  -0.0227816
>  acctbal|   -0.83444 | -0.00220958
>  mktsegment |  5 |0.205013
>  comment| -1 |   0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> l

Re: [SQL] why the difference?

2002-11-21 Thread Jakub Ouhrabka
hi,

the where clause is evaluated before the distinct clause, so your queries
aren't equivalent because you switched the order by splitting the query
into two queries... so to obtain same results do create table as select
... where category_id=781 and then select distinct on () ...

hth,

kuba



> tradein_clients=# select distinct on  (amount,co_name,city)  
>category_id,amount,co_name,city from eyp_listing
> where keywordidx ## 'vegetable'  and  category_id=781 ;
>
>  category_id | amount |  co_name  |city
> -++---+
>  781 |  0 | ANURADHA EXPORTS  | CHENNAI
>  781 |  0 | R.K.INTERNATIONAL | CHENNAI
>  781 |  0 | SAI IMPEX | MUMBAI
>  781 |  0 | TRIMA ENTERPRISES | CHENNAI
>  781 |  0 | UNIQUE DEHYDRATES LIMITED | AHMEDABAD
>  781 |   5000 | RSV EXPORT| COIMBATORE
> (6 rows)
>
>
> lets remove the contraint "category_id=781" and store the output in a table "t_a".
>
> tradein_clients=# CREATE TABLE t_a AS select distinct on  (amount,co_name,city)  
>category_id,amount,co_name,city from
> eyp_listing  where keywordidx ## 'vegetable'  ;
>
> then when i select from t_a with category_id=781 i have less secords
>
> tradein_clients=# SELECT * from t_a where category_id=781;
>  category_id | amount |  co_name  |city
> -++---+
>  781 |  0 | R.K.INTERNATIONAL | CHENNAI
>  781 |  0 | SAI IMPEX | MUMBAI
>  781 |  0 | UNIQUE DEHYDRATES LIMITED | AHMEDABAD
>  781 |   5000 | RSV EXPORT| COIMBATORE
> (4 rows)
>
>
> Can anyone please explain the difference?
>
>
> Regds
> Mallah.
>
>
>
>
>
>
>
>
> --
> Rajesh Kumar Mallah,
> Project Manager (Development)
> Infocom Network Limited, New Delhi
> phone: +91(11)6152172 (221) (L) ,9811255597 (M)
>
> Visit http://www.trade-india.com ,
> India's Leading B2B eMarketplace.
>
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [SQL] order by and limit with multi-column index, bug?

2002-12-13 Thread Jakub Ouhrabka
hi,

could it be that "order by col1, col2 desc" is different from "order by
col1 desc, col2 desc" ? these are different and it's correct.

i'm not sure if this is your problem since i haven't digged into it...
sorry... but check it to be sure...

hth,

kuba



On Fri, 13 Dec 2002, Harald Krake wrote:

> as a workaround for min()/max() on indexed columns forcing an index-scan
> I tried "order by" with "limit 1". Works fine for the first record
> but fails for the last. I don't know why.
>
> Here's the setup:
>
> A table "journal" containing several million records with an index
>
> "CREATE INDEX journal_kblattsoll ON journal
>  (sollkontoid,waehrungid,periode,belegdatum,journalnr)"
>
> with periode being an INT, belegdatum DATE, the rest is INT8.
>
> As a replacement for
>
> "select min(periode) from journal
> where sollkontoid=266122::int8 and waehrungid=17::int8"
>
> which for some reason in the design of postgres scans the whole index,
> I tried:
>
> "select   sollkontoid,waehrungid,periode,belegdatum,journalnr from journal
>  where sollkontoid=266122::int8 and waehrungid=17::int8
>  order by sollkontoid,waehrungid,periode,belegdatum,journalnr asc limit 1;"
>
> this yields:
>  sollkontoid | waehrungid | periode | belegdatum | journalnr
> -++-++---
>   266122 | 17 |   0 | 2002-01-01 |   411
>
> which is correct and works in a fraction of a second as expected.
>
> now, doing the same with "desc" instead of "asc" should return
> "periode = 12" (see below) for the last record, but it doesn't!
>
> After a fairly long time I get:
>
>  sollkontoid | waehrungid | periode | belegdatum | journalnr
> -++-++---
>   266122 | 17 |   0 | 2002-01-01 |  2783
>
> ooops???! periode = 0???
>
> Query plan:
>  Limit  (cost=491999.72..491999.73 rows=1 width=32)
>->  Sort  (cost=491999.72..492309.30 rows=123828 width=32)
>  Sort Key: sollkontoid, waehrungid, periode, belegdatum, journalnr
>  ->  Index Scan using journal_kblattsoll on journal
> (cost=0.00..481525.10 rows=123828 width=32)
>Index Cond: ((sollkontoid = 266122::bigint) AND (waehrungid =
> 17::bigint))
>
>
> Surprisingly enough, reducing the fields in the order-by clause
> returns the correct value for "periode":
>
> select   sollkontoid,waehrungid,periode,belegdatum,journalnr from journal
> where sollkontoid=266122::int8 and waehrungid=17::int8
> order by sollkontoid,waehrungid,periode desc limit 1;
>
>  sollkontoid | waehrungid | periode | belegdatum | journalnr
> -++-++---
>   266122 | 17 |  12 | 2002-12-09 |303609
>
>
>
> min/max-checks:
>
> select max(periode)from journal where sollkontoid=266122::int8 and
>  waehrungid=17::int8;
>
> returns 12.
>
> select max(belegdatum) from journal where sollkontoid=266122::int8 and
>  waehrungid=17::int8 and periode=12;
>
> returns "2002-12-10"
>
>  select max(journalnr) from journal where sollkontoid=266122::int8 and
>  waehrungid=17::int8 and periode=12 and belegdatum='2002-12-10';
>
> returns 305098.
>
> Consequently, the last record according to the order by clause should be:
>
>  sollkontoid | waehrungid | periode | belegdatum | journalnr
> -++-++---
>   266122 | 17 |  12 | 2002-12-10 |305098
>
>
>
>
> questions:
>
> - what's this???
> - why does it take that long?
> - is "ORDER BY ... DESC" broken? (even after dropping the index I get
>   the same results)
> - am I missing something (friday, 13th, ...) ?
>
> thanx for any answer,
> harald.
>
> (postgres 7.3 on redhat 8.0)
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [SQL] [GENERAL] working around setQuerySnapshot limitations in functions

2002-12-16 Thread Jakub Ouhrabka
Hi,

I'm no expert on this but I think that some cases can be work around using
foo updates instead of selects. See threads "LOCK TABLE oddness in PLpgSQL
function called via JDBC" and "SetQuerySnapshot, once again" in the
archives.
But I think there is no general work around using only plpgsql. I'd also
love to hear about one...

HTH,

kuba

> I have a function called move_tree() which moves a node from 1 parent to
> another (in a nested set tree) by computing some necessary values, then
> updating the nodes as needed.  The problem I have is that when the
> function is called multiple times simultaneously, each function does a
> setQuerySnapshot at the start of the function, and then cannot see the
> changes made by the other function calls. I had thought to do a lock on
> the table at the start of my function, but due to the fact
> setQuerySnapshot is called at the function start (before I can acquire a
> lock) I can't guarantee no changes have been made before the function
> tries to make changes. ISTM the only way around this is to do a lock on
> the table before calling the function. The major downside to that it
> requires me to put that logic inside any apps that use the function,
> rather than being able to keep it inside the database.  Does anyone see
> another work-around?
>
> Robert Treat
>
>
>
>
>
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster