Re: [GENERAL] eliminating records not in (select id ... so SLOW?
On Fri, 01 Aug 2008 10:33:59 -0400 Tom Lane [EMAIL PROTECTED] wrote:

> Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
>> Well I reached 3Gb of work_mem and still I got:
>>
>>  Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13 rows=475532 width=6)
>>    Filter: (NOT (subplan))
>>    SubPlan
>>      ->  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
>>            ->  Seq Scan on catalog_items  (cost=0.00..31071.67 rows=676167 width=8)
>
> Huh.  The only way I can see for that to happen is if the datatypes
> involved aren't hashable.  What's the datatypes of the two columns
> being compared, anyway?

I changed both columns to bigint. I added 2 indexes on the ItemID
column of both tables and increased work_mem to 3Gb [sic].

The query got executed in ~1300ms... but explain gave the same output
as the one above.

The problem is solved... but curious minds want to know.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
On Thu, 31 Jul 2008 21:37:39 -0400 Tom Lane [EMAIL PROTECTED] wrote:

> Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
>> I'm doing something like:
>> delete from table1 where id not in (select id from table2).
>> table1 contains ~1M records
>> table2 contains ~600K records and id is unique.
>
> That's going to pretty much suck unless you've got work_mem set high
> enough to allow a hashed subplan plan --- which is likely to require
> tens of MB for this case, I don't recall exactly what the per-row
> overhead is.  Experiment until EXPLAIN tells you it'll use a hashed
> subplan.

Thanks.

explain delete from catalog_categoryitem
  where ItemID not in (select ItemID from catalog_items);

Well, I reached 3Gb of work_mem and still I got:

 Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13 rows=475532 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
           ->  Seq Scan on catalog_items  (cost=0.00..31071.67 rows=676167 width=8)

I've got this too:

alter table catalog_items cluster on catalog_items_pkey;

Should I drop it?

This is just a dev box. I loaded the 2 tables with 2 non-coherent
sets of data just to play with, before adding all the pk/fk I need.
I could just truncate the tables and reload them from coherent
sources.

But what if I *really* had to execute that query? Any other magic I
could play to speed it up?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
Ivan Sergio Borgonovo wrote:

> But what if I *really* had to execute that query? Any other magic I
> could play to speed it up?

A trick that is sometimes spectacularly efficient is to rewrite the
query to use an outer join instead of NOT IN. Try:

DELETE FROM table1 WHERE id IN
  (SELECT table1.id FROM table1
     LEFT JOIN table2 ON table1.id=table2.id
    WHERE table2.id IS NULL);

Best regards,

--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org
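[A small self-contained sketch of the outer-join rewrite, using SQLite
through Python so it can be run anywhere; the table and column names
follow the example above. When id is NOT NULL on both sides, the
LEFT JOIN ... IS NULL form deletes exactly the rows NOT IN would.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(10)])
cur.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in range(0, 10, 2)])

# Outer-join rewrite: rows of table1 with no match in table2 come back
# from the LEFT JOIN with table2.id IS NULL, and those are deleted.
cur.execute("""
    DELETE FROM table1 WHERE id IN
      (SELECT table1.id FROM table1
         LEFT JOIN table2 ON table1.id = table2.id
        WHERE table2.id IS NULL)
""")
remaining = [r[0] for r in cur.execute("SELECT id FROM table1 ORDER BY id")]
print(remaining)  # only the ids that exist in table2: [0, 2, 4, 6, 8]
```

The planner can execute the outer join as a hash or merge anti-join
instead of re-evaluating a subplan per row, which is where the speedup
comes from.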
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
ok, try this:

delete from catalog_categoryitem
 where not exists
   (select ItemID from catalog_items
     where catalog_items.ItemID = catalog_categoryitem.ItemID);

--- On Thu, 7/31/08, Ivan Sergio Borgonovo [EMAIL PROTECTED] wrote:

> From: Ivan Sergio Borgonovo [EMAIL PROTECTED]
> Subject: Re: [GENERAL] eliminating records not in (select id ... so SLOW?
> To:
> Cc: PostgreSQL pgsql-general@postgresql.org
> Date: Thursday, July 31, 2008, 11:01 PM
>
> On Thu, 31 Jul 2008 14:59:29 -0700 (PDT) Lennin Caro
> [EMAIL PROTECTED] wrote:
>
>>> The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram.
>>> Default debian etch setup.
>
>> Did you recently run vacuum?
>
> The tables are pretty stable. I think no more than 20 records were
> modified (update/insert/delete) during the whole history of the 2
> tables. autovacuum is running regularly.
>
> The actual query running is:
>
> begin;
> create index catalog_categoryitem_ItemsID_index
>   on catalog_categoryitem using btree (ItemID);
> delete from catalog_categoryitem
>   where ItemID not in (select ItemID from catalog_items);
> commit;
>
> This is what came back:
>
> Timing is on.
> BEGIN
> Time: 0.198 ms
> CREATE INDEX
> Time: 3987.991 ms
>
> The query is still running...
>
> As a reminder: catalog_categoryitem should contain less than 1M
> records. catalog_items should contain a bit more than 600K records,
> where ItemID is unique (a pk actually).
>
> PostgreSQL comes from the default install from Debian etch (8.1.X).
> Its configuration hasn't been modified.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
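[The NOT EXISTS form can be tried in miniature with SQLite from
Python; the table and column names follow the thread. Unlike NOT IN,
NOT EXISTS is also immune to NULLs appearing in the subquery's
output, which makes it the safer rewrite in general.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE catalog_items (ItemID INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE catalog_categoryitem (ItemID INTEGER)")
cur.executemany("INSERT INTO catalog_items VALUES (?)",
                [(i,) for i in (1, 2, 3)])
cur.executemany("INSERT INTO catalog_categoryitem VALUES (?)",
                [(i,) for i in (1, 2, 3, 4, 5)])

# Correlated NOT EXISTS: delete category rows whose ItemID has no
# matching row in catalog_items (here: 4 and 5).
cur.execute("""
    DELETE FROM catalog_categoryitem
     WHERE NOT EXISTS
       (SELECT 1 FROM catalog_items
         WHERE catalog_items.ItemID = catalog_categoryitem.ItemID)
""")
left = [r[0] for r in
        cur.execute("SELECT ItemID FROM catalog_categoryitem ORDER BY ItemID")]
print(left)  # only ItemIDs present in catalog_items: [1, 2, 3]
```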
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:

> Well I reached 3Gb of work_mem and still I got:
>
>  Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13 rows=475532 width=6)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
>            ->  Seq Scan on catalog_items  (cost=0.00..31071.67 rows=676167 width=8)

Huh.  The only way I can see for that to happen is if the datatypes
involved aren't hashable.  What's the datatypes of the two columns
being compared, anyway?

			regards, tom lane
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
On Fri, 01 Aug 2008 10:33:59 -0400 Tom Lane [EMAIL PROTECTED] wrote:

> Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
>> Well I reached 3Gb of work_mem and still I got:
>>
>>  Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13 rows=475532 width=6)
>>    Filter: (NOT (subplan))
>>    SubPlan
>>      ->  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
>>            ->  Seq Scan on catalog_items  (cost=0.00..31071.67 rows=676167 width=8)
>
> Huh.  The only way I can see for that to happen is if the datatypes
> involved aren't hashable.  What's the datatypes of the two columns
> being compared, anyway?

That S in CS should mean sober! Thanks to svn I'd say you're right...
one column was int, the other bigint. Among other things I was just
fixing that kind of mistake.

If that was the reason, I'll report whether things get better once I
finish normalising the DB.

BTW, does pg 8.3 save you from this kind of mistake by being stricter
with auto casts?

Tom, sorry for sending this just to your personal email.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:

> Tom Lane [EMAIL PROTECTED] wrote:
>> Huh.  The only way I can see for that to happen is if the datatypes
>> involved aren't hashable.  What's the datatypes of the two columns
>> being compared, anyway?
>
> Thanks to svn I'd say you're right... one column was int, the other
> bigint.

Ah.  8.3 can hash certain cross-type comparisons (including that one)
but prior versions won't.

			regards, tom lane
[GENERAL] eliminating records not in (select id ... so SLOW?
I'm doing something like:

delete from table1 where id not in (select id from table2);

Both ids are indexed.

table1 contains ~1M records
table2 contains ~600K records and id is unique.

The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram. Default
debian etch setup.

It has been working for over 2h now. Is it normal?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
--- On Thu, 7/31/08, Ivan Sergio Borgonovo [EMAIL PROTECTED] wrote:

> From: Ivan Sergio Borgonovo [EMAIL PROTECTED]
> Subject: [GENERAL] eliminating records not in (select id ... so SLOW?
> To: PostgreSQL pgsql-general@postgresql.org
> Date: Thursday, July 31, 2008, 9:45 PM
>
> I'm doing something like:
>
> delete from table1 where id not in (select id from table2);
>
> Both ids are indexed.
>
> table1 contains ~1M records
> table2 contains ~600K records and id is unique.
>
> The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram. Default
> debian etch setup.
>
> It has been working for over 2h now. Is it normal?
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it

Did you recently run vacuum?
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
On Thu, 31 Jul 2008 14:59:29 -0700 (PDT) Lennin Caro
[EMAIL PROTECTED] wrote:

>> The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram. Default
>> debian etch setup.

> Did you recently run vacuum?

The tables are pretty stable. I think no more than 20 records were
modified (update/insert/delete) during the whole history of the 2
tables. autovacuum is running regularly.

The actual query running is:

begin;
create index catalog_categoryitem_ItemsID_index
  on catalog_categoryitem using btree (ItemID);
delete from catalog_categoryitem
  where ItemID not in (select ItemID from catalog_items);
commit;

This is what came back:

Timing is on.
BEGIN
Time: 0.198 ms
CREATE INDEX
Time: 3987.991 ms

The query is still running...

As a reminder: catalog_categoryitem should contain less than 1M
records. catalog_items should contain a bit more than 600K records,
where ItemID is unique (a pk actually).

PostgreSQL comes from the default install from Debian etch (8.1.X).
Its configuration hasn't been modified.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] eliminating records not in (select id ... so SLOW?
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:

> I'm doing something like:
> delete from table1 where id not in (select id from table2).
> table1 contains ~1M records
> table2 contains ~600K records and id is unique.

That's going to pretty much suck unless you've got work_mem set high
enough to allow a hashed subplan plan --- which is likely to require
tens of MB for this case, I don't recall exactly what the per-row
overhead is.  Experiment until EXPLAIN tells you it'll use a hashed
subplan.

BTW, don't bother with creating the index, it doesn't help for this.

			regards, tom lane
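[What the hashed subplan buys can be sketched outside the database:
instead of re-scanning table2's ids for every row of table1 (O(N*M)),
build an in-memory hash of the subquery's ids once and probe it per
row (O(N+M)), at the cost of memory proportional to table2 -- which
is what work_mem has to cover. A toy Python illustration with the
thread's row counts:]

```python
# table1: ~1M ids; table2: half of them, unique.
table1_ids = list(range(1_000_000))
table2_ids = list(range(0, 1_000_000, 2))

# The "hashed subplan": hash table2's ids once...
hashed = set(table2_ids)

# ...then a single pass over table1 with O(1) membership tests,
# keeping rows whose id IS in table2 (NOT IN would delete the rest).
survivors = [i for i in table1_ids if i in hashed]
deleted = len(table1_ids) - len(survivors)
print(deleted)  # rows the DELETE would remove
```

Without the hash, each of the ~1M outer rows rescans the materialized
~600K-row subquery output, which is exactly the quadratic plan the
EXPLAIN output above shows.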