[SQL] Performance of a view
Hello all, I have a fairly complex query whose performance problem I have isolated to a fairly small subset. The pertinent parts of the table structure are as follows:

//- tbl_claim
claim_id serial PRIMARY KEY;
claimnum varchar(32);

//- tbl_invoice
invoice_id serial PRIMARY KEY;
claim_id integer;
invoicedate timestamp;
active integer;

//- there is an index on claimnum, and claim_id in tbl_invoice is a foreign key which references tbl_claim

I have a view which is defined as follows:

//-
SELECT tbl_claim.claim_id,
       count(tbl_invoice.invoice_id) AS count,
       min(tbl_invoice.invoicedate) AS invoicedate
FROM tbl_claim
LEFT JOIN tbl_invoice
       ON tbl_claim.claim_id = tbl_invoice.claim_id
      AND tbl_invoice.active = 1
GROUP BY tbl_claim.claim_id;
//-

If I run the following:

EXPLAIN ANALYZE SELECT tbl_claim.claim_id FROM tbl_claim WHERE claimnum = 'L1J8823';

I get:

Index Scan using idx_claim_claimnum on tbl_claim  (cost=0.00..10.01 rows=2 width=4) (actual time=0.079..0.088 rows=2 loops=1)
  Index Cond: ((claimnum)::text = 'L1J8823'::text)
Total runtime: 0.123 ms

If I run:

EXPLAIN ANALYZE SELECT tbl_claim.claim_id
FROM tbl_claim
INNER JOIN vw_claiminvoicecount ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id
WHERE tbl_claim.claim_id = 217778;

I get:

Nested Loop  (cost=17.21..25.50 rows=4 width=4) (actual time=0.069..0.076 rows=1 loops=1)
  ->  Index Scan using tbl_claim_pkey on tbl_claim  (cost=0.00..8.21 rows=2 width=4) (actual time=0.020..0.021 rows=1 loops=1)
        Index Cond: (claim_id = 217778)
  ->  Materialize  (cost=17.21..17.23 rows=2 width=4) (actual time=0.044..0.047 rows=1 loops=1)
        ->  Subquery Scan vw_claiminvoicecount  (cost=0.00..17.21 rows=2 width=4) (actual time=0.041..0.043 rows=1 loops=1)
              ->  GroupAggregate  (cost=0.00..17.19 rows=2 width=16) (actual time=0.039..0.040 rows=1 loops=1)
                    ->  Nested Loop Left Join  (cost=0.00..17.17 rows=2 width=16) (actual time=0.024..0.030 rows=1 loops=1)
                          ->  Index Scan using tbl_claim_pkey on tbl_claim  (cost=0.00..8.21 rows=2 width=4) (actual time=0.005..0.007 rows=1 loops=1)
                                Index Cond: (217778 = claim_id)
                          ->  Index Scan using idx_tbl_invoice_claim_id on tbl_invoice  (cost=0.00..4.39 rows=7 width=16) (actual time=0.014..0.018 rows=1 loops=1)
                                Index Cond: ("outer".claim_id = tbl_invoice.claim_id)
                                Filter: (active = 1)
Total runtime: 0.232 ms

However, if I run:

EXPLAIN ANALYZE SELECT tbl_claim.claim_id
FROM tbl_claim
INNER JOIN vw_claiminvoicecount ON tbl_claim.claim_id = vw_claiminvoicecount.claim_id
WHERE tbl_claim.claimnum = 'L1J8823';

I get:

Merge Join  (cost=60015.93..69488.39 rows=3 width=4) (actual time=4605.711..4605.762 rows=2 loops=1)
  Merge Cond: ("outer".claim_id = "inner".claim_id)
  ->  Subquery Scan vw_claiminvoicecount  (cost=60005.91..68940.54 rows=215119 width=4) (actual time=3074.520..4491.423 rows=157215 loops=1)
        ->  GroupAggregate  (cost=60005.91..66789.35 rows=215119 width=16) (actual time=3074.515..4265.315 rows=157215 loops=1)
              ->  Merge Left Join  (cost=60005.91..64100.37 rows=215119 width=16) (actual time=3074.493..3845.516 rows=162280 loops=1)
                    Merge Cond: ("outer".claim_id = "inner".claim_id)
                    ->  Sort  (cost=29403.35..29941.15 rows=215119 width=4) (actual time=1253.372..1392.089 rows=157216 loops=1)
                          Sort Key: tbl_claim.claim_id
                          ->  Seq Scan on tbl_claim  (cost=0.00..7775.19 rows=215119 width=4) (actual time=0.031..336.606 rows=215119 loops=1)
                    ->  Sort  (cost=30602.56..31146.52 rows=217582 width=16) (actual time=1821.075..1967.639 rows=151988 loops=1)
                          Sort Key: tbl_invoice.claim_id
                          ->  Seq Scan on tbl_invoice  (cost=0.00..6967.61 rows=217582 width=16) (actual time=0.066..507.189 rows=219530 loops=1)
                                Filter: (active = 1)
  ->  Sort  (cost=10.02..10.03 rows=2 width=4) (actual time=0.144..0.145 rows=2 loops=1)
        Sort Key: tbl_claim.claim_id
        ->  Index Scan using idx_claim_claimnum on tbl_claim  (cost=0.00..10.01 rows=2 width=4) (actual time=0.120..0.127 rows=2 loops=1)
              Index Cond: ((claimnum)::text = 'L1J8823'::text)
Total runtime: 4620.653 ms

I roughly understand what is happening: in the first query, the dataset is knocked down to one row by the claim_id condition, and the view is then constructed using only that subset of the claim table. In the second query, the view is constructed from the entire dataset, which is hundreds of thousands of rows, and is thus much slower. My question is how I would go about obtaining the first behavior when the filter is on claimnum.
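One workaround (a sketch, not from the original thread; table and column names are taken from the post above): the planner can only push the restriction into the view's grouped subquery when it is a constant on the grouping column, as with claim_id = 217778. When filtering on claimnum, you can skip the view and compute the aggregates in correlated subqueries, which are then evaluated only for the rows that survive the claimnum index scan:

SELECT c.claim_id,
       (SELECT count(i.invoice_id)
          FROM tbl_invoice i
         WHERE i.claim_id = c.claim_id
           AND i.active = 1) AS count,
       (SELECT min(i.invoicedate)
          FROM tbl_invoice i
         WHERE i.claim_id = c.claim_id
           AND i.active = 1) AS invoicedate
FROM tbl_claim c
WHERE c.claimnum = 'L1J8823';

This returns the same per-claim results as joining against vw_claiminvoicecount (a count of 0 and a NULL invoicedate for claims with no active invoices), but the aggregates are computed for at most the two matching claims instead of all 215,119.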
Re: [SQL] Foreign key to 2 tables problem
I've never seen anything like that. I'm sure it's conceivable that you could write a weird trigger for it, but you have to consider maintainability, and what your queries are going to look like. I haven't seen your datamodel, but it would seem that you could accomplish what you're looking for by having two separate foreign key columns in the bankaccount table, one for the customer and one for the supplier. While your queries may end up somewhat funky, I can't imagine they'd be any worse than what would occur with what you're suggesting. However, if customers or suppliers can have multiple accounts, you are going to need an intermediate table, as suggested by Neil.

Joost Kraaijeveld wrote:
Hi, Is there a way to create a foreign key to 2 tables: e.g. a bankaccount table that has a column "owner", that must point to a record in either the customer or the supplier table? TIA
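To make the two-column approach concrete, here is a minimal sketch (table and column names are guesses, since the original datamodel wasn't posted). The CHECK constraint forces exactly one of the two owner columns to be set:

CREATE TABLE bankaccount (
    account_id   serial PRIMARY KEY,
    customer_id  integer REFERENCES customer (customer_id),
    supplier_id  integer REFERENCES supplier (supplier_id),
    -- exactly one owner: one column NULL, the other NOT NULL
    CHECK ((customer_id IS NULL) <> (supplier_id IS NULL))
);

Queries then resolve the owner with a pair of LEFT JOINs against customer and supplier, picking whichever side is non-NULL.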
Re: [SQL] Foreign key to 2 tables problem
However, if customers or suppliers can have multiple accounts, you are going to need an intermediate table, as suggested by Neil.

Scratch that. If accounts can have multiple owners you'll need an intermediate table.
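For completeness, a sketch of such an intermediate table (names are again guesses), reusing the one-of-two-columns trick from the previous message:

CREATE TABLE account_owner (
    account_id  integer NOT NULL REFERENCES bankaccount (account_id),
    customer_id integer REFERENCES customer (customer_id),
    supplier_id integer REFERENCES supplier (supplier_id),
    -- each ownership row names exactly one owner
    CHECK ((customer_id IS NULL) <> (supplier_id IS NULL))
);

Each row links one account to one owner, so an account with three owners simply gets three rows.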
Re: [SQL] deferrable on unique
It's a low-tech solution but you could:

begin;
update t1 set seq = -1 where id = 5;
update t1 set seq = 5 where id = 4;
update t1 set seq = 4 where seq = -1;
commit;

This is assuming that you don't naturally have -1 as a valid value of that column.

chester c young wrote:
table t1:
  id integer primary key,
  seq integer not null unique

the seq is for ordering the rows as the user likes. however, if the rows are moved around, e.g.

begin;
update t1 set seq = 4 where id = 5;
update t1 set seq = 5 where id = 4;
commit;

it will bomb, because the first update leaves two rows with seq = 4 (even though the data would be correct after the transaction). I thought "deferrable initially deferred" would fix this, but the phrase is not allowed on unique constraints. Is this correct? Any ideas?
Re: [SQL] DB design and foreign keys
Table orders defines the column order_code as a serial, which simply attaches a default drawn from a sequence, supplying a new value for the column on insert. Note that there is NO guarantee that this column will be unique: you can manually update or insert whatever value you want. If you wish this column to be unique, you must specify it on creation, or later add the constraint with an ALTER TABLE ... ADD CONSTRAINT. A foreign key requires that the referenced column be unique (DB enforced, not just coincidentally unique), and that's why your table creation is failing. The reference to customers(customer_code) works because that column is the table's PRIMARY KEY, which implies uniqueness.

Gianluca Riccardi wrote:
hello all, I'm using PostgreSQL 7.4.7 on Debian 3.1. Following is the SQL schema of my (very) small DB for a (very small) web business application:

-- SQL schema for business-test-db
CREATE TABLE customers (
    customer_code serial,
    alfa_customer_code varchar(6),
    customer_name character varying(250) NOT NULL,
    address character varying(250) NOT NULL,
    city character varying(250) NOT NULL,
    zip_code character varying(8) NOT NULL,
    prov character varying(30) NOT NULL,
    security character varying(15) NOT NULL,
    tel character varying(30),
    tel2 character varying(20) NOT NULL,
    fax character varying(250),
    url character varying(250),
    email1 character varying(250) NOT NULL,
    email2 character varying(250) NOT NULL,
    discount1 integer,
    discount2 integer,
    PRIMARY KEY (customer_code)
);

CREATE TABLE users (
    id smallint NOT NULL,
    login varchar(20) NOT NULL,
    pwd varchar(20) NOT NULL,
    name varchar(20) NOT NULL,
    customer_code int REFERENCES customers (customer_code),
    valid date,
    PRIMARY KEY (id)
);

CREATE TABLE products (
    id serial,
    code varchar(60) UNIQUE NOT NULL,
    description varchar(250) NOT NULL,
    dimensions varchar(250) NOT NULL,
    price numeric NOT NULL,
    state boolean,
    PRIMARY KEY (id)
);

CREATE TABLE orders (
    id serial,
    order_code serial,
    customer_code integer REFERENCES customers (customer_code) NOT NULL,
    order_date time without time zone NOT NULL,
    remote_ip inet NOT NULL,
    order_time timestamp with time zone NOT NULL,
    order_type varchar(10) NOT NULL,
    state varchar(10) NOT NULL,
    PRIMARY KEY (id, order_code)
);

CREATE TABLE order_items (
    id serial,
    order_code integer REFERENCES orders (order_code) NOT NULL,
    customer_code integer REFERENCES customers (customer_code) NOT NULL,
    product_code varchar(60) REFERENCES products (code) NOT NULL,
    qty int NOT NULL,
    price numeric REFERENCES products (price) NOT NULL,
    row_price numeric,
    PRIMARY KEY (id, order_code)
);
-- END OF FILE

The tables customers, users, products and orders are created as the SQL states. When I try to create the table order_items, postgresql gives the following error:

business-test-db=# CREATE TABLE order_items (
business-test-db(#     id serial,
business-test-db(#     order_code integer REFERENCES orders (order_code) NOT NULL,
business-test-db(#     customer_code integer REFERENCES customers (customer_code) NOT NULL,
business-test-db(#     product_code varchar(60) REFERENCES products (code) NOT NULL,
business-test-db(#     qty int NOT NULL,
business-test-db(#     price numeric REFERENCES products (price) NOT NULL,
business-test-db(#     row_price numeric,
business-test-db(#     PRIMARY KEY (id, order_code)
business-test-db(# );
NOTICE: CREATE TABLE will create implicit sequence "order_items_id_seq" for "serial" column "order_items.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "order_items_pkey" for table "order_items"
ERROR: there is no unique constraint matching given keys for referenced table "orders"
business-test-db=#

I'm an RTFM man, but I obviously miss the point in the documentation, because what I don't understand is why the referenced column isn't considered to be unique. More doubts come into play when I see that the key customers(customer_code), referenced by the orders table, gives no errors. I'm not a native English speaker, so that probably adds some difficulty. Thanks in advance to all who contribute. Best regards from a proud-to-be postgresql user :-),
Gianluca Riccardi
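The immediate fix, as a sketch (the constraint name below is illustrative, not from the original thread), is to give orders.order_code the unique constraint the foreign key needs:

ALTER TABLE orders
    ADD CONSTRAINT orders_order_code_key UNIQUE (order_code);

Note that the REFERENCES products (price) clause in order_items will then fail for the same reason, since products.price has no unique constraint either; a price is a value to copy into the order row, not a key to reference, so that column is probably better declared as plain "price numeric NOT NULL".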
Re: [SQL] "large" IN/NOT IN subqueries result in query returning
At a glance I would guess that NULL values in one or both of your tables are the culprit. A comparison against NULL never evaluates to true: it yields NULL (unknown), which a WHERE clause treats as false, and a single NULL in a NOT IN subquery's result means the predicate can never be true for any row. Example: a quick test on my database:

select count(*) FROM tbl_employee;
 count
-------
  2689

select count(*) FROM tbl_employee WHERE username IS NULL;
 count
-------
    35

So I have 35 null values.

create table tbl_foo (username varchar(32));
insert into tbl_foo (username) values ('poop');

select count(*) FROM tbl_employee
WHERE tbl_employee.username NOT IN (select tbl_foo.username FROM tbl_foo);
 count
-------
  2654

So I only get 2654 values that are NOT IN the set 'poop', i.e. the rows with NULL usernames are not counted when I use NOT IN in my query. NULLs can be confusing. Hope this helps.

George Pavlov wrote:
The following looks like a bug to me, but please someone review and tell me what I might be missing. It seems that past a certain result-set size, a "[NOT] IN (subquery)" stops behaving as expected and returns 0 matches even when there should be matches. No errors are returned, just faulty data. The exact thresholds seem to depend on the data, the subquery and possibly the indexes in place. Nothing in the documentation mentions a limit to what "IN (subquery)" can take, so I am quite dismayed (if there is a limit I would expect an ERROR/WARNING/something, but not wrong data). Read below for details. This has been tried on PG 8.0.4 and 8.1.1 on WinXP, and 8.0.4 and 8.0.5 on Linux.

I have two tables:
t1 (id, name, a_type)
t2 (id, name)

The t1 table is "big" (483211 rows), its name column contains duplicates, and a_type has only two values. The t2 table is "small" (40 rows) and its name values are unique. Some, but not all, t2.name values are present in t1.name. To be precise, there are 10 t2.name values that do not occur in t1.name (this is based on extraneous knowledge). I want to find out which of the t2.name values are never used in t1.name:

select count(*) from t2 where t2.name not in (select t1.name from t1);
--> 0

This should return 10; instead it returns 0!!! Adding a LIMIT to the subquery and doing some trial and error produces very interesting results:

select count(*) from t2 where t2.name not in (select t1.name from t1 limit 261683);
--> 13

select count(*) from t2 where t2.name not in (select t1.name from t1 limit 261684);
--> 0

What is so magical about 261683? The JOIN alternative produces the correct results no matter what:

select count(*) from t2 left join t1 using (name) where t1.name is null;
--> 10

This pretty much summarizes the issue. Any thoughts greatly appreciated. A few variations follow to show how the threshold varies.

-- restrict to only one a_type
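For what it's worth, here is a sketch of the two standard fixes, using the t1/t2 names from the post above: either strip the NULLs out of the subquery, or switch to NOT EXISTS, which handles NULLs the way most people expect:

select count(*) from t2
where t2.name not in (select t1.name from t1 where t1.name is not null);

select count(*) from t2
where not exists (select 1 from t1 where t1.name = t2.name);

Both should return 10 here, matching the LEFT JOIN ... IS NULL query shown in the original message.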