Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-04 Thread Ivan Sergio Borgonovo
On Fri, 01 Aug 2008 10:33:59 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
  Well I reached 3Gb of work_mem and still I got:
 
  Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13
  rows=475532 width=6)
Filter: (NOT (subplan))
SubPlan
  -  Materialize  (cost=31747.84..38509.51 rows=676167
  width=8)   -  Seq Scan on catalog_items
  (cost=0.00..31071.67 rows=676167 width=8)
 
 Huh.  The only way I can see for that to happen is if the datatypes
 involved aren't hashable.  What's the datatypes of the two columns
 being compared, anyway?

I changed both columns to bigint.
I added 2 indexes on the ItemID column of both tables and increased
work_mem to 3Gb [sic].
The query got executed in ~1300ms... but explain gave the same
output as the one above.

The problem is solved... but curious mind want to know.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Ivan Sergio Borgonovo
On Thu, 31 Jul 2008 21:37:39 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
  I'm doing something like:
  delete from table1 where id not in (select id from table2).
  table1 contains ~1M record table2 contains ~ 600K record and id
  is unique.

 That's going to pretty much suck unless you've got work_mem set
 high enough to allow a hashed subplan plan --- which is likely
 to require tens of MB for this case, I don't recall exactly what

Thanks.

 the per-row overhead is.  Experiment until EXPLAIN tells you it'll
 use a hashed subplan.

explain delete from catalog_categoryitem where ItemID not in (select
ItemID from catalog_items);

Well I reached 3Gb of work_mem and still I got:

Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13
rows=475532 width=6)
  Filter: (NOT (subplan))
  SubPlan
-  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
  -  Seq Scan on catalog_items  (cost=0.00..31071.67
rows=676167 width=8)

I've this too:
alter table catalog_items cluster on catalog_items_pkey;
should I drop it?

This is just a dev box. I loaded the 2 tables with 2 not coherent
set of data just to play with, before adding all the pk/fk I need.
I could just truncate the tables and reload them from coherent
sources.

But what if I *really* had to execute that query?
Any other magic I could play to speed it up?

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Daniel Verite

Ivan Sergio Borgonovo wrote:


But what if I *really* had to execute that query?
Any other magic I could play to speed it up?


A trick that is sometimes spectacularly efficient is to rewrite the 
query to use an outer join instead of NOT IN.


Try:

DELETE FROM table1 WHERE id IN 
(SELECT table1.id FROM table1 LEFT JOIN table2 ON table1.id=table2.id 
WHERE table2.id IS NULL)



Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: 
http://www.manitou-mail.org


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Lennin Caro
ok try this


delete from catalog_categoryitem where not exists (select id from catalog_items 
where catalog_items.ItemID = catalog_categoryitem.ItemID);


--- On Thu, 7/31/08, Ivan Sergio Borgonovo [EMAIL PROTECTED] wrote:

 From: Ivan Sergio Borgonovo [EMAIL PROTECTED]
 Subject: Re: [GENERAL] eliminating records not in (select id ... so SLOW?
 To: 
 Cc: PostgreSQL pgsql-general@postgresql.org
 Date: Thursday, July 31, 2008, 11:01 PM
 On Thu, 31 Jul 2008 14:59:29 -0700 (PDT)
 Lennin Caro [EMAIL PROTECTED] wrote:
 
   The box is a 2x dual core Xeon (below 2GHz) with
 4Gb ram.
   Default debian etch setup.
 
  you recently run vacuum ? 
 
 The tables are pretty stable. I think no more than 20
 records were
 modified (update/insert/delete) during the whole history of
 the 2
 tables.
 
 autovacuum is running regularly.
 
 The actual query running is:
 
 begin;
  create index catalog_categoryitem_ItemsID_index on
catalog_categoryitem using btree (ItemID);
  delete from catalog_categoryitem
where ItemID not in (select ItemID from catalog_items);
 commit;
 
 That's what came back
 Timing is on.
 BEGIN
 Time: 0.198 ms
 CREATE INDEX
 Time: 3987.991 ms
 
 The query is still running...
 
 As a reminder catalog_categoryitem should contain less than
 1M
 record.
 catalog_items should contain a bit more than 600K record
 where
 ItemID is unique (a pk actually).
 PostgreSQL comes from the default install from Debian etch
 (8.1.X).
 It's configuration hasn't been modified.
 
 -- 
 Ivan Sergio Borgonovo
 http://www.webthatworks.it
 
 
 -- 
 Sent via pgsql-general mailing list
 (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


  


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Tom Lane
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
 Well I reached 3Gb of work_mem and still I got:

 Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13
 rows=475532 width=6)
   Filter: (NOT (subplan))
   SubPlan
 -  Materialize  (cost=31747.84..38509.51 rows=676167 width=8)
   -  Seq Scan on catalog_items  (cost=0.00..31071.67
 rows=676167 width=8)

Huh.  The only way I can see for that to happen is if the datatypes
involved aren't hashable.  What's the datatypes of the two columns
being compared, anyway?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Ivan Sergio Borgonovo
On Fri, 01 Aug 2008 10:33:59 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
  Well I reached 3Gb of work_mem and still I got:
 
  Seq Scan on catalog_categoryitem  (cost=31747.84..4019284477.13
  rows=475532 width=6)
Filter: (NOT (subplan))
SubPlan
  -  Materialize  (cost=31747.84..38509.51 rows=676167
  width=8)   -  Seq Scan on catalog_items
  (cost=0.00..31071.67 rows=676167 width=8)
 
 Huh.  The only way I can see for that to happen is if the datatypes
 involved aren't hashable.  What's the datatypes of the two columns
 being compared, anyway?

That S in CS should mean sober!

thanks to svn I'd say you're right... one column was int the other
bigint.
Among other things I was just fixing that kind of mistakes.

If that could be the reason I'll report if things got better once I
finish to normalise the DB.

BTW does pg 8.3 save you from such kind of mistake being stricter
with auto cast?

Tom sorry for sending this just to your personal email.


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-08-01 Thread Tom Lane
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 Huh.  The only way I can see for that to happen is if the datatypes
 involved aren't hashable.  What's the datatypes of the two columns
 being compared, anyway?

 thanks to svn I'd say you're right... one column was int the other
 bigint.

Ah.  8.3 can hash certain cross-type comparisons (including that one)
but prior versions won't.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] eliminating records not in (select id ... so SLOW?

2008-07-31 Thread Ivan Sergio Borgonovo
I'm doing something like:

delete from table1 where id not in (select id from table2).

both id are indexed.

table1 contains ~1M record table2 contains ~ 600K record and id is
unique.

The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram.
Default debian etch setup.

It has been working for over 2h now.

Is it normal?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-07-31 Thread Lennin Caro



--- On Thu, 7/31/08, Ivan Sergio Borgonovo [EMAIL PROTECTED] wrote:

 From: Ivan Sergio Borgonovo [EMAIL PROTECTED]
 Subject: [GENERAL] eliminating records not in (select id ... so SLOW?
 To: PostgreSQL pgsql-general@postgresql.org
 Date: Thursday, July 31, 2008, 9:45 PM
 I'm doing something like:
 
 delete from table1 where id not in (select id from table2).
 
 both id are indexed.
 
 table1 contains ~1M record table2 contains ~ 600K record
 and id is
 unique.
 
 The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram.
 Default debian etch setup.
 
 It has been working for over 2h now.
 
 Is it normal?
 
 -- 
 Ivan Sergio Borgonovo
 http://www.webthatworks.it
 
you recently run vacuum ? 

 -- 
 Sent via pgsql-general mailing list
 (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


  


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-07-31 Thread Ivan Sergio Borgonovo
On Thu, 31 Jul 2008 14:59:29 -0700 (PDT)
Lennin Caro [EMAIL PROTECTED] wrote:

  The box is a 2x dual core Xeon (below 2GHz) with 4Gb ram.
  Default debian etch setup.

 you recently run vacuum ? 

The tables are pretty stable. I think no more than 20 records were
modified (update/insert/delete) during the whole history of the 2
tables.

autovacuum is running regularly.

The actual query running is:

begin;
 create index catalog_categoryitem_ItemsID_index on
   catalog_categoryitem using btree (ItemID);
 delete from catalog_categoryitem
   where ItemID not in (select ItemID from catalog_items);
commit;

That's what came back
Timing is on.
BEGIN
Time: 0.198 ms
CREATE INDEX
Time: 3987.991 ms

The query is still running...

As a reminder catalog_categoryitem should contain less than 1M
record.
catalog_items should contain a bit more than 600K record where
ItemID is unique (a pk actually).
PostgreSQL comes from the default install from Debian etch (8.1.X).
It's configuration hasn't been modified.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] eliminating records not in (select id ... so SLOW?

2008-07-31 Thread Tom Lane
Ivan Sergio Borgonovo [EMAIL PROTECTED] writes:
 I'm doing something like:
 delete from table1 where id not in (select id from table2).
 table1 contains ~1M record table2 contains ~ 600K record and id is
 unique.

That's going to pretty much suck unless you've got work_mem set high
enough to allow a hashed subplan plan --- which is likely to require
tens of MB for this case, I don't recall exactly what the per-row
overhead is.  Experiment until EXPLAIN tells you it'll use a hashed
subplan.

BTW, don't bother with creating the index, it doesn't help for this.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general