Re: [GENERAL] Deleting 173000 records takes forever
Alexander Farber wrote:
> delete from pref_users where id not in (select distinct id from pref_money);

Hi,

try instead:

    delete from pref_users
    where id in (select id from pref_users except select id from pref_money);

Regards
Wolfgang Hamann

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
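To see why the EXCEPT rewrite can behave differently from NOT IN, it helps to compare the two plans side by side. A minimal sketch (EXPLAIN without ANALYZE only plans the statement, it deletes nothing):

```sql
-- Plan for the suggested EXCEPT form:
EXPLAIN DELETE FROM pref_users
WHERE id IN (SELECT id FROM pref_users EXCEPT SELECT id FROM pref_money);

-- Plan for the original NOT IN form, for comparison:
EXPLAIN DELETE FROM pref_users
WHERE id NOT IN (SELECT DISTINCT id FROM pref_money);
```

If the NOT IN plan shows a per-row subplan instead of a hashed one (for instance when the subquery result exceeds work_mem), the EXCEPT form will usually come out ahead.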
Re: [GENERAL] Deleting 173000 records takes forever, blocks async queries for unrelated records
Alexander Farber wrote:
> in a Facebook game running on PostgreSQL 8.4.13 and having this many players:
>
>   # select count(*) from pref_users;
>    count
>   --------
>    223964
>
> I am trying to get rid of inactive users, who just visited the canvas
> page but never played (I'm sure Facebook has a clever-sounding name for
> them):
>
>   # select count(*) from pref_users where id not in (select distinct id from pref_money);
>    count
>   --------
>    173936
>   (1 row)
>
> So I call:
>
>   # delete from pref_users where id not in (select distinct id from pref_money);
>
> but that query lasts forever, and what troubles me more is that it
> blocks the async queries of my game daemon (the Perl function pg_ready
> starts returning false all the time and my game accumulates thousands
> of yet-to-be-executed SQL queries).
>
> The good news is that my quad server doesn't hang - I just see one
> postmaster process at 90-100% CPU, but total load is 20%.

I hope it's a backend and not the postmaster.

> Also my game daemon in Perl recovers and executes the thousands of
> queued-up async queries when I interrupt the above DELETE query with
> CTRL-C at the psql prompt - i.e. my game is not buggy.
>
> My question is how to handle this. Why does deleting take so long - is
> it because of the CASCADEs? And why does it make the pg_ready calls of
> my game daemon return false? The users I'm deleting aren't active; they
> shouldn't intersect with the async queries of my game daemon.
>
> Below are both SQL tables involved. Thank you for any insights.

It would help if you sent EXPLAIN output for the DELETE statement.

> # \d pref_money
>            Table "public.pref_money"
>  Column |         Type          |                Modifiers
> --------+-----------------------+------------------------------------------
>  id     | character varying(32) |
>  money  | integer               | not null
>  yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
> Indexes:
>     "pref_money_money_index" btree (money DESC)
>     "pref_money_yw_index" btree (yw)
> Foreign-key constraints:
>     "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE

The problem is very likely that you have no index on the id column.
That means that for each id deleted from pref_users, the cascading
delete has to perform a full table scan on pref_money to find the
corresponding rows.

Indeed, this table does not have a primary key. That is usually a bad
idea. How about PRIMARY KEY (id, yw)? That should speed up the query.

> # \d pref_users
>           Table "public.pref_users"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  id     | character varying(32) | not null
> [...]
> Indexes:
>     "pref_users_pkey" PRIMARY KEY, btree (id)
> Referenced by:
>     [many tables]

While you are at it, check all the other tables referencing pref_users
and make sure that they have an index on the referencing column.

Yours,
Laurenz Albe
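Either of the fixes above is a one-statement change. A sketch (the index name is my own choice; you would pick one option, not both, since the primary key's underlying unique index already serves lookups on its leading column id):

```sql
-- Option 1: a plain index on the referencing column, so each cascaded
-- delete becomes an index scan on pref_money instead of a seq scan:
CREATE INDEX pref_money_id_index ON pref_money (id);

-- Option 2: the suggested composite primary key. Note that this
-- implicitly adds NOT NULL to id and yw, and fails if the table
-- already contains NULLs or duplicate (id, yw) pairs:
ALTER TABLE pref_money ADD PRIMARY KEY (id, yw);
```

Option 2 additionally guarantees one money row per user per week, which may or may not match the game's intent.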
Re: [GENERAL] Deleting 173000 records takes forever, blocks async queries for unrelated records
On Fri, 1 Feb 2013 10:38:26 +0100, Alexander Farber wrote:
> Hello,
>
> in a Facebook game running on PostgreSQL 8.4.13 [...]
>
> So I call:
>
>   # delete from pref_users where id not in (select distinct id from pref_money);
>
> but that query lasts forever [...]
>
> # \d pref_money
> [...]
> Foreign-key constraints:
>     "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE

I believe an index on pref_money.id could help. Without it Postgres is,
if I understand correctly, making 173936 table scans on pref_money to
try (and fail) to find, for each deleted row, the referencing rows in
that table.

> # \d pref_users
> [...]

--
Michał Politowski
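Since pref_users is referenced by many tables, it is worth listing every referencing column in one go and checking each for an index. A catalog-query sketch along these lines should work (written from the pg_constraint documentation, not tested on 8.4 specifically):

```sql
-- List the referencing table and column of every foreign key that
-- points at pref_users; each of these columns should carry an index
-- so that cascaded deletes stay cheap.
SELECT c.conrelid::regclass AS referencing_table,
       a.attname            AS referencing_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum = ANY (c.conkey)
WHERE c.contype = 'f'
  AND c.confrelid = 'pref_users'::regclass;
```

For each row returned, `\di` (or a query against pg_index) shows whether a matching index exists.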
Re: [GENERAL] Deleting 173000 records takes forever, blocks async queries for unrelated records
Hi,

what Albe said about PKs. I'm also not very fond of people using text in
keys (even if it's a varchar or a char field). Test the same thing with
a numeric key and you are likely to see a difference.

Apart from making sure your design is ok, you might want to keep this
stuff well in the background, firing it as a frequent (and small) job -
sort of your own private vacuum job. Something like:

    delete from pref_users
    where id not in (select distinct id from pref_money)
    limit 100;

where 100 may be any number of records you find to be compatible with
smooth performance. Keep calling this at a suitable interval (X seconds
between each call), and it will silently do the cleaning without
creating giant transactions. You definitely want to get the design
right BEFORE doing this, though.

Cheers
Bèrto

On 1 February 2013 09:38, Alexander Farber <alexander.far...@gmail.com> wrote:
> Hello,
>
> in a Facebook game running on PostgreSQL 8.4.13 [...]
>
> Below are both SQL tables involved, thank you for any insights.
>
> Regards
> Alex
>
> # \d pref_money
> [...]
>
> # \d pref_users
>             Table "public.pref_users"
>    Column   |            Type             |     Modifiers
> ------------+-----------------------------+--------------------
>  id         | character varying(32)       | not null
>  first_name | character varying(64)       |
>  last_name  | character varying(64)       |
>  female     | boolean                     |
>  avatar     | character varying(128)      |
>  city       | character varying(64)       |
>  login      | timestamp without time zone | default now()
>  last_ip    | inet                        |
>  logout     | timestamp without time zone |
>  vip        | timestamp without time zone |
>  mail       | character varying(254)      |
>  medals     | integer                     | not null default 0
> Indexes:
>     "pref_users_pkey" PRIMARY KEY, btree (id)
> Referenced by:
>     TABLE "pref_cards" CONSTRAINT "pref_cards_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_catch" CONSTRAINT "pref_catch_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_chat" CONSTRAINT "pref_chat_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_discuss" CONSTRAINT "pref_discuss_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_game" CONSTRAINT "pref_game_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_luck" CONSTRAINT "pref_luck_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_match" CONSTRAINT "pref_match_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_misere" CONSTRAINT "pref_misere_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_money" CONSTRAINT "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_pass" CONSTRAINT "pref_pass_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_payment" CONSTRAINT "pref_payment_id_fkey" FOREIGN KEY [...]
Re: [GENERAL] Deleting 173000 records takes forever, blocks async queries for unrelated records
Righto. You need a CTE to do that:

    create table deletable (id bigint not null primary key);
    create table condition (id bigint not null primary key);

    insert into deletable select generate_series(1,500);
    insert into condition select generate_series(1,50);

    WITH target AS (
        select id from deletable d
        where d.id not in (select id from condition)
        limit 10
    )
    delete from deletable where id in (select id from target);

This will open a hole from 51 to 60, next time from 61 to 70, etc.

Cheers
Bèrto

On 1 February 2013 11:30, Alexander Farber <alexander.far...@gmail.com> wrote:
> Thanks, I will add an index (id, yw) and check.
>
> The limit suggestion (yes, I already have a vacuum cronjob to purge
> spam users from drupal_users etc.) doesn't work:
>
>   # delete from pref_users
>   where id not in (select distinct id from pref_money) limit 10;
>   ERROR:  syntax error at or near "limit"
>   LINE 2: ...ere id not in (select distinct id from pref_money) limit 10;

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.
Re: [GENERAL] Deleting 173000 records takes forever, blocks async queries for unrelated records
>   # delete from pref_users
>   where id not in (select distinct id from pref_money) limit 10;
>   ERROR:  syntax error at or near "limit"
>   LINE 2: ...ere id not in (select distinct id from pref_money) limit 10;

Or this?

    DELETE FROM pref_users
    WHERE id IN (
        SELECT id FROM pref_users
        WHERE id NOT IN (
            SELECT DISTINCT id FROM pref_money
        )
        LIMIT 10
    );

Saw elsewhere another suggestion with EXCEPT, which is also sexy.
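The IN-with-LIMIT form above works on 8.4 (unlike a WITH clause attached to DELETE, which needs 9.1+), and it can be packaged so a cron job calls one small batch per transaction. A sketch as a PL/pgSQL function; the function name and batch size are my own inventions:

```sql
-- One small batch per call; returns the number of rows removed, so the
-- caller (e.g. a cron job hitting this every few seconds) can stop
-- once it gets 0 back.
CREATE OR REPLACE FUNCTION purge_inactive_users(batch integer)
RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    DELETE FROM pref_users
    WHERE id IN (
        SELECT id FROM pref_users
        WHERE id NOT IN (SELECT id FROM pref_money)
        LIMIT batch
    );
    GET DIAGNOSTICS n = ROW_COUNT;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- Each invocation is its own short transaction:
-- SELECT purge_inactive_users(100);
```

Keeping each call in a separate transaction (rather than looping inside one function call) is what keeps the row locks short-lived and the game daemon's async queries unblocked.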