Re: [SQL] DELETE FROM takes forever

2011-02-12 Thread Jasen Betts
On 2011-02-10, Josh  wrote:
> Hi
>
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:
>
> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);
>
> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.

I find that scripting deletes of smaller batches of records can help
here; long-running queries do bad things to the efficiency of postgres.

one strategy that could work for your task would be to create a temp
table first:


 create temp table planned_deletions as select id from records except
 SELECT id FROM unique_records;

 create index badids on planned_deletions(id);

then repeatedly

 delete from records where id in
   ( select id from planned_deletions order by id limit 1 );
 delete from planned_deletions where id in
   ( select id from planned_deletions order by id limit 1 );

until there are none left.

possibly pausing a few seconds between each batch if there is a heavy
load on the server (the fact that you were able to run the query for 2
weeks suggests that there may not be).
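
scripted from the shell, that loop might look something like this (a
rough sketch, untested; the batch size, the pause and the database name
are placeholders, and it assumes planned_deletions was created as an
ordinary table rather than a temp table, since each psql call below
opens a new session):

 #!/bin/sh
 # delete in small batches until planned_deletions is empty, pausing
 # between batches; planned_deletions must be a regular (non-temp)
 # table because every psql invocation here is a separate session
 while :; do
   more=$(psql -qAt mydb -c "select exists (select 1 from planned_deletions);")
   [ "$more" = "f" ] && break
   psql -q mydb -c "
     delete from records where id in
       ( select id from planned_deletions order by id limit 1000 );
     delete from planned_deletions where id in
       ( select id from planned_deletions order by id limit 1000 );"
   sleep 5   # give other traffic a chance between batches
 done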

> Is this the best way to approach the problem? Is there a better way?
>
> Some background: The server is version 8.3, running nothing but Pg.
> The 'records' table has 'id' as its primary key, and one other index
> on another column. The table is referenced by just about every other
> table in my DB (about 15 other tables) via foreign key constraints,
> which I don't want to break (which is why I'm not just recreating the
> table rather than deleting rows). Most of the dependent tables have ON
> DELETE CASCADE. The 'unique_records' table is a temp table I got via
> something like: SELECT DISTINCT (other_column) id INTO unique_records
> FROM records

if you can accept the down-time I would drop the constraints

(if you don't have them on file, do a pg_dump --schema-only and grep it
for the ADD CONSTRAINT commands, then use sed or similar to create
matching DROP CONSTRAINT commands and run them)

then rebuild the table

then reintroduce the constraints (keep a copy of the grep output above
for this step)
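
for what it's worth, the dump-and-grep step could look roughly like
this (a sketch only: the file and database names are placeholders, awk
stands in for sed, and it assumes pg_dump writes each constraint as the
usual two-line ALTER TABLE ... / ADD CONSTRAINT ... statement; review
the generated SQL before running it):

 pg_dump --schema-only mydb > schema.sql

 # keep the original ADD CONSTRAINT statements (plus the ALTER TABLE
 # line above each one) so they can be replayed later; the "--" lines
 # grep puts between groups are harmless SQL comments
 grep -B1 'ADD CONSTRAINT' schema.sql > add_constraints.sql

 # build matching DROP CONSTRAINT statements from those two-line pairs
 awk '/^ALTER TABLE/   { tbl = $0 }
      /ADD CONSTRAINT/ { print tbl " DROP CONSTRAINT " $3 ";" }' \
     add_constraints.sql > drop_constraints.sql

 psql mydb -f drop_constraints.sql
 # ... rebuild the records table here ...
 psql mydb -f add_constraints.sql

if only the foreign keys need to get out of the way, grepping for
FOREIGN KEY instead of ADD CONSTRAINT keeps the primary keys in place
and avoids any drop-order trouble between them and the foreign keys
that reference them.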

-- 
⚂⚃ 100% natural



Re: [SQL] UTF characters compromising data import.

2011-02-12 Thread Jasen Betts
On 2011-02-08, Gavin Beau Baumanis  wrote:

> I understand the error message - but what I don't know is what I
> need to set the encoding to - in order to import  / use the data.

if you run it through  

  iconv --from-code=ASCII --to-code=UTF-8 -c

it'll strip out all the non-ascii symbols; without knowing the
encoding it's impossible to assign any useful meaning to them.
This step may render your data useless, so it would be much better to
find out what the encoding should be.

perhaps you can figure it out by observation?
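
if it helps, one way to do that observation from the shell (the file
name and the WINDOWS-1252 guess below are just examples):

 # dump the raw bytes of a sample of the data; lone bytes in the
 # 0x80-0xFF range point at a single-byte encoding such as LATIN1 or
 # WINDOWS-1252, while 0xC2/0xC3 lead bytes would already be valid
 # UTF-8 sequences
 head -c 1000 suspect_data.txt | hexdump -C

 # file(1) will also hazard a guess at the encoding
 file suspect_data.txt

 # once you have a plausible guess, convert properly instead of stripping:
 iconv --from-code=WINDOWS-1252 --to-code=UTF-8 suspect_data.txt > converted.txt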

-- 
⚂⚃ 100% natural
