[SQL] Performance of a view

2005-11-16 Thread John McCawley

Hello all,

I have a fairly complex query, and I've isolated its performance problem 
to a small subset of it.  The pertinent parts of the table structure 
are as follows:


//-

tbl_claim
claim_id integer SERIAL PRIMARY KEY;
claimnum varchar(32);

//-

tbl_invoice
invoice_id integer SERIAL PRIMARY KEY;
claim_id integer;
invoicedate timestamp;
active integer;

//-

There is an index on claimnum, and claim_id in tbl_invoice is a 
foreign key which references tbl_claim.
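
(For reference, a sketch of the DDL this implies; the index names are taken 
from the EXPLAIN output below, the foreign key syntax is assumed:)

CREATE INDEX idx_claim_claimnum ON tbl_claim (claimnum);
CREATE INDEX idx_tbl_invoice_claim_id ON tbl_invoice (claim_id);
ALTER TABLE tbl_invoice
  ADD FOREIGN KEY (claim_id) REFERENCES tbl_claim (claim_id);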


I have a view which is defined as follows:

//-
CREATE VIEW vw_claiminvoicecount AS
SELECT tbl_claim.claim_id,
       count(tbl_invoice.invoice_id) AS count,
       min(tbl_invoice.invoicedate) AS invoicedate
  FROM tbl_claim
  LEFT JOIN tbl_invoice
    ON tbl_claim.claim_id = tbl_invoice.claim_id
   AND tbl_invoice.active = 1
 GROUP BY tbl_claim.claim_id;
//-


If I run the following:

EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim WHERE claimnum 
= 'L1J8823';


I get:

Index Scan using idx_claim_claimnum on tbl_claim  (cost=0.00..10.01 rows=2 width=4) (actual time=0.079..0.088 rows=2 loops=1)
   Index Cond: ((claimnum)::text = 'L1J8823'::text)
 Total runtime: 0.123 ms


If I run:

EXPLAIN ANALYZE
SELECT tbl_claim.claim_id
  FROM tbl_claim
 INNER JOIN vw_claiminvoicecount
    ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id
 WHERE tbl_claim.claim_id = 217778;

I get:
Nested Loop  (cost=17.21..25.50 rows=4 width=4) (actual time=0.069..0.076 rows=1 loops=1)
   ->  Index Scan using tbl_claim_pkey on tbl_claim  (cost=0.00..8.21 rows=2 width=4) (actual time=0.020..0.021 rows=1 loops=1)
         Index Cond: (claim_id = 217778)
   ->  Materialize  (cost=17.21..17.23 rows=2 width=4) (actual time=0.044..0.047 rows=1 loops=1)
         ->  Subquery Scan vw_claiminvoicecount  (cost=0.00..17.21 rows=2 width=4) (actual time=0.041..0.043 rows=1 loops=1)
               ->  GroupAggregate  (cost=0.00..17.19 rows=2 width=16) (actual time=0.039..0.040 rows=1 loops=1)
                     ->  Nested Loop Left Join  (cost=0.00..17.17 rows=2 width=16) (actual time=0.024..0.030 rows=1 loops=1)
                           ->  Index Scan using tbl_claim_pkey on tbl_claim  (cost=0.00..8.21 rows=2 width=4) (actual time=0.005..0.007 rows=1 loops=1)
                                 Index Cond: (217778 = claim_id)
                           ->  Index Scan using idx_tbl_invoice_claim_id on tbl_invoice  (cost=0.00..4.39 rows=7 width=16) (actual time=0.014..0.018 rows=1 loops=1)
                                 Index Cond: ("outer".claim_id = tbl_invoice.claim_id)
                                 Filter: (active = 1)
 Total runtime: 0.232 ms


However, if I run:

EXPLAIN ANALYZE
SELECT tbl_claim.claim_id
  FROM tbl_claim
 INNER JOIN vw_claiminvoicecount
    ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id
 WHERE tbl_claim.claimnum = 'L1J8823';


I get:

Merge Join  (cost=60015.93..69488.39 rows=3 width=4) (actual time=4605.711..4605.762 rows=2 loops=1)
   Merge Cond: ("outer".claim_id = "inner".claim_id)
   ->  Subquery Scan vw_claiminvoicecount  (cost=60005.91..68940.54 rows=215119 width=4) (actual time=3074.520..4491.423 rows=157215 loops=1)
         ->  GroupAggregate  (cost=60005.91..66789.35 rows=215119 width=16) (actual time=3074.515..4265.315 rows=157215 loops=1)
               ->  Merge Left Join  (cost=60005.91..64100.37 rows=215119 width=16) (actual time=3074.493..3845.516 rows=162280 loops=1)
                     Merge Cond: ("outer".claim_id = "inner".claim_id)
                     ->  Sort  (cost=29403.35..29941.15 rows=215119 width=4) (actual time=1253.372..1392.089 rows=157216 loops=1)
                           Sort Key: tbl_claim.claim_id
                           ->  Seq Scan on tbl_claim  (cost=0.00..7775.19 rows=215119 width=4) (actual time=0.031..336.606 rows=215119 loops=1)
                     ->  Sort  (cost=30602.56..31146.52 rows=217582 width=16) (actual time=1821.075..1967.639 rows=151988 loops=1)
                           Sort Key: tbl_invoice.claim_id
                           ->  Seq Scan on tbl_invoice  (cost=0.00..6967.61 rows=217582 width=16) (actual time=0.066..507.189 rows=219530 loops=1)
                                 Filter: (active = 1)
   ->  Sort  (cost=10.02..10.03 rows=2 width=4) (actual time=0.144..0.145 rows=2 loops=1)
         Sort Key: tbl_claim.claim_id
         ->  Index Scan using idx_claim_claimnum on tbl_claim  (cost=0.00..10.01 rows=2 width=4) (actual time=0.120..0.127 rows=2 loops=1)
               Index Cond: ((claimnum)::text = 'L1J8823'::text)
 Total runtime: 4620.653 ms


I roughly understand what is happening: in the first query, the dataset 
is knocked down to one row, and the view is then built from only that 
subset of the claim table.  In the second query, the view is built from 
the entire dataset, which is hundreds of thousands of rows, and is 
therefore much slower.


My question is: how would I go about obtaining the performance of the 
first query when I filter by claimnum instead of claim_id?
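
(A sketch of one possible workaround, not from the original thread: the 
second plan above shows that a constant claim_id qualifier does get pushed 
down into the view's grouped subquery, so resolving claimnum to claim_id 
first keeps the fast plan.)

-- Step 1: cheap index lookup on claimnum (first plan above)
SELECT claim_id FROM tbl_claim WHERE claimnum = 'L1J8823';

-- Step 2: query the view with each constant claim_id returned;
-- the planner pushes the constant into the GROUP BY subquery
-- (second plan above, ~0.2 ms)
SELECT v.claim_id, v.count, v.invoicedate
  FROM vw_claiminvoicecount v
 WHERE v.claim_id = 217778;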

Re: [SQL] Foreign key to 2 tables problem

2005-11-22 Thread John McCawley
I've never seen anything like that.  I'm sure it's conceivable that you 
could write a weird trigger for it, but you have to consider 
maintainability, and what your queries are going to look like.  I 
haven't seen your data model, but it would seem that you could accomplish 
what you're looking for by having two separate foreign key columns in 
the bankaccount table, one for the customer and one for the supplier.  
While your queries may end up somewhat funky, I can't imagine they'd be 
any worse than what would result from what you're suggesting.
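
A minimal sketch of that layout (table and column names assumed, since I 
haven't seen your schema):

CREATE TABLE bankaccount (
    account_id   serial PRIMARY KEY,
    customer_id  integer REFERENCES customer (customer_id),
    supplier_id  integer REFERENCES supplier (supplier_id),
    -- enforce exactly one owner: one column set, the other NULL
    CHECK ((customer_id IS NULL) <> (supplier_id IS NULL))
);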


However, if customers or suppliers can have multiple accounts, you are 
going to need an intermediate table, as suggested by Neil.



Joost Kraaijeveld wrote:


Hi,

Is there a way to create a foreign key to 2 tables: e.g. a bankaccount
table that has a column "owner", that must point to a record in either
the customer or the supplier table?


TIA

 





Re: [SQL] Foreign key to 2 tables problem

2005-11-22 Thread John McCawley


However, if customers or suppliers can have multiple accounts, you are 
going to need an intermediate table, as suggested by Neil. 


Scratch that.  If accounts can have multiple owners you'll need an 
intermediate table.
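
A minimal sketch of such an intermediate table (names assumed, continuing 
the earlier sketch):

CREATE TABLE account_owner (
    id          serial PRIMARY KEY,
    account_id  integer NOT NULL REFERENCES bankaccount (account_id),
    customer_id integer REFERENCES customer (customer_id),
    supplier_id integer REFERENCES supplier (supplier_id),
    -- each ownership row points at exactly one customer or supplier
    CHECK ((customer_id IS NULL) <> (supplier_id IS NULL))
);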





Re: [SQL] deferrable on unique

2005-11-22 Thread John McCawley

It's a low-tech solution but you could:

 BEGIN;
 UPDATE t1 SET seq = -1 WHERE id = 5;   -- park row 5 on a sentinel value
 UPDATE t1 SET seq = 5  WHERE id = 4;
 UPDATE t1 SET seq = 4  WHERE seq = -1; -- un-park row 5
 COMMIT;

This assumes that -1 doesn't naturally occur as a valid value of 
that column.
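
For completeness, a runnable version (table definition and values taken 
from the quoted message below):

CREATE TABLE t1 (
    id  integer PRIMARY KEY,
    seq integer NOT NULL UNIQUE
);
INSERT INTO t1 VALUES (4, 4);
INSERT INTO t1 VALUES (5, 5);

-- swap seq between rows 4 and 5 without tripping the unique constraint
BEGIN;
UPDATE t1 SET seq = -1 WHERE id = 5;
UPDATE t1 SET seq = 5 WHERE id = 4;
UPDATE t1 SET seq = 4 WHERE seq = -1;
COMMIT;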


chester c young wrote:


table t1:
 id integer primary key,
 seq integer not null unique

The seq is for ordering the rows as the user likes.  However, if the
rows are moved around, e.g.
 BEGIN;
 UPDATE t1 SET seq = 4 WHERE id = 5;
 UPDATE t1 SET seq = 5 WHERE id = 4;
 COMMIT;
it will bomb, because after the first update two rows have seq=4
(although the data would be correct at the end of the transaction).

I thought "deferrable initially deferred" would fix this, but the phrase
is not allowed on unique.

Is this correct?  Any ideas?









Re: [SQL] DB design and foreign keys

2005-12-13 Thread John McCawley
Table orders defines the column order_code as a serial, which simply 
attaches a sequence that supplies a new value for the column on insert.  
Note that there is NO guarantee that this column will be unique: you can 
manually update the value to whatever you want.  If you want this column 
to be unique, you must specify it at creation time, or later add the 
constraint with ALTER TABLE.


A foreign key requires that the referenced column be unique (DB 
enforced, not just coincidentally unique), and that's why your table 
creation is failing.
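
A sketch of the fix (constraint name assumed): give orders.order_code its 
own unique constraint, so that it is a legal foreign key target for 
order_items.

ALTER TABLE orders
  ADD CONSTRAINT orders_order_code_key UNIQUE (order_code);

-- Note: "price numeric REFERENCES products (price)" in order_items will
-- fail the same way, since products.price has no unique constraint either.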


Gianluca Riccardi wrote:


Hello all,
I'm using PostgreSQL 7.4.7 on Debian 3.1.

Following is the SQL schema of my (very) small DB for a (very small) web 
business application:


-- SQL schema for business-test-db

CREATE TABLE customers (
  customer_code serial,
  alfa_customer_code varchar(6),
  customer_name character varying(250) NOT NULL,
  address character varying(250) NOT NULL,
  city character varying(250) NOT NULL,
  zip_code character varying(8) NOT NULL,
  prov character varying(30) NOT NULL,
  security character varying(15) NOT NULL,
  tel character varying(30),
  tel2 character varying(20) NOT NULL,
  fax character varying(250),
  url character varying(250),
  email1 character varying(250) NOT NULL,
  email2 character varying(250) NOT NULL,
  discount1 integer,
  discount2 integer,
  PRIMARY KEY (customer_code)
);

CREATE TABLE users  (
  id smallint NOT NULL,
  login varchar(20) NOT NULL,
  pwd varchar(20) NOT NULL,
  name varchar(20) NOT NULL,
  customer_code int REFERENCES customers (customer_code),
  valid date,
  primary key (id)
);

CREATE TABLE products   (
  id serial,
  code varchar(60) UNIQUE NOT NULL,
  description varchar(250) NOT NULL,
  dimensions varchar(250) NOT NULL,
  price numeric NOT NULL,
  state boolean,
  PRIMARY KEY (id)
);

CREATE TABLE orders  (
  id serial,
  order_code serial,
  customer_code integer REFERENCES customers (customer_code) NOT NULL,
  order_date time without time zone NOT NULL,
  remote_ip inet NOT NULL,
  order_time timestamp with time zone NOT NULL,
  order_type varchar(10) NOT NULL,
  state varchar(10) NOT NULL,
  PRIMARY KEY (id, order_code)
);

CREATE TABLE order_items (
  id serial,
  order_code integer REFERENCES orders (order_code) NOT NULL,
  customer_code integer REFERENCES customers (customer_code) NOT NULL,
  product_code varchar(60) REFERENCES products (code) NOT NULL,
  qty int NOT NULL,
  price numeric REFERENCES products (price) NOT NULL,
  row_price numeric,
  PRIMARY KEY (id, order_code)
);


--
-- END OF FILE

The tables customers, users, products, and orders are created just as 
the SQL states.


When I try to create the table order_items, PostgreSQL gives the 
following error:


business-test-db=# CREATE TABLE order_items (
business-test-db(#   id serial,
business-test-db(#   order_code integer REFERENCES orders (order_code) NOT NULL,
business-test-db(#   customer_code integer REFERENCES customers (customer_code) NOT NULL,
business-test-db(#   product_code varchar(60) REFERENCES products (code) NOT NULL,
business-test-db(#   qty int NOT NULL,
business-test-db(#   price numeric REFERENCES products (price) NOT NULL,
business-test-db(#   row_price numeric,
business-test-db(#   PRIMARY KEY (id, order_code)
business-test-db(# );
NOTICE:  CREATE TABLE will create implicit sequence "order_items_id_seq" for "serial" column "order_items.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "order_items_pkey" for table "order_items"
ERROR:  there is no unique constraint matching given keys for referenced table "orders"

business-test-db=#


I'm an RTFM man, but I'm obviously missing something in the 
documentation, because I don't understand why the referenced column 
isn't considered unique.
More doubts come into play when I see that the customers(customer_code) 
key referenced by the orders table gives no errors.
I'm not a native English speaker, so that probably adds to the 
difficulty.


Thanks in advance to all who contribute help.

Best regards from a proud-to-be PostgreSQL user :-),
Gianluca Riccardi






Re: [SQL] "large" IN/NOT IN subqueries result in query returning

2005-12-27 Thread John McCawley
At a glance I would guess that NULL values in one or both of your tables 
are the culprit.  A comparison against NULL never evaluates to true, so 
rows involving NULLs silently drop out of [NOT] IN results.


Example:

A quick test on my database:

select count(*) FROM tbl_employee;

count
---
 2689

select count(*) FROM tbl_employee WHERE username IS NULL;
count
---
   35

So I have 35 null values.

create table tbl_foo (username varchar(32));

insert into tbl_foo (username) values ('poop');

select count(*) FROM tbl_employee WHERE tbl_employee.username NOT IN 
(select tbl_foo.username FROM tbl_foo);

count
---
 2654

So I only get 2654 values that are NOT IN the set ('poop'); i.e., the NULL 
values are not counted when I use NOT IN in my query.  NULLs can be 
confusing.


Hope this helps.
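
Two common rewrites (a sketch, not part of the original message) for when 
NULLs are the problem.  Note that they treat NULL usernames differently: 
the first drops them, the second keeps them.

-- exclude NULLs from both sides explicitly
SELECT count(*)
  FROM tbl_employee
 WHERE username IS NOT NULL
   AND username NOT IN (SELECT username FROM tbl_foo
                         WHERE username IS NOT NULL);

-- or use NOT EXISTS, which never produces the NOT IN / NULL surprise
SELECT count(*)
  FROM tbl_employee e
 WHERE NOT EXISTS (SELECT 1 FROM tbl_foo f
                    WHERE f.username = e.username);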

George Pavlov wrote:


The following looks like a bug to me, but please someone review and tell
me what I might be missing. Seems that past a certain result set size a
"[NOT] IN (subquery)" stops behaving as expected and returns 0 matches
even when there should be matches. No errors are returned, just faulty
data. The exact thresholds seem to depend on the data, the subquery, and
possibly the indexes in place. Nothing in the documentation mentions a
limit to what "IN subquery" can take so I am quite dismayed (if there is
a limit I would expect an ERROR/WARNING/something, but not wrong data).
Read below for details. This has been tried on PG 8.0.4 and 8.1.1 on
WinXP, and 8.0.4 and 8.0.5 on Linux.

I have two tables:
 t1 (id, name, a_type)
 t2 (id, name)

The t1 table is "big" (483211 rows), the name column contains
duplicates, the a_type has only two values. The t2 table is "small" (40
rows), the name values are unique. Some, but not all, t2.name values are
present in t1.name. To be precise, there are 10 t2.name values that do
not occur in t1.name (this is based on extraneous knowledge). I want to
find out which of the t2.name values are not ever used in t1.name:

select count(*) from t2 where t2.name not in (
 select t1.name from t1);
--> 0

This should return 10, instead it returns 0!!! Adding a LIMIT to the
subquery and doing some trial and error produces very interesting
results:

select count(*) from t2 where t2.name not in (
 select t1.name from t1 limit 261683) 
--> 13

select count(*) from t2 where t2.name not in (
 select t1.name from t1 limit 261684) 
--> 0


What is so magical about 261683? The JOIN alternative produces the
correct results no matter what:

select count(*) 
from t2 left join t1 using (name)
where t1.name is null 
--> 10


This pretty much summarizes the issue. Any thoughts greatly appreciated.
A few variations follow to show how the threshold varies.

-- restrict to only one a_type
 


