Re: [PERFORM] Suggestions for a data-warehouse migration routine

2005-04-28 Thread Mischa Sandberg
Quoting Richard Rowell [EMAIL PROTECTED]:

 I've ported enough of my company's database to Postgres to make
 warehousing on PG a real possibility.  I thought I would toss my data
 migration architecture ideas out for the list to shoot apart..
 
[...]
Not much feedback required.

Yes, dropping the entire database is faster and simpler.
If your database is small enough that you can rebuild it from scratch
every time, go for it.

Yes, vacuum analyze is required; creating indexes alone does not create
planner statistics.
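
Concretely, step 12 is a single statement, run once the load and the
index builds finish:

    VACUUM ANALYZE;   -- plain ANALYZE also works; a fresh load leaves nothing to vacuum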

Going forward, I'd dump an extract of pg_stat[io_]user_(tables|indexes)
to see how index usage and table load change over time.
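
Something along these lines, run after each refresh and appended to a
log, would do (these are the standard statistics views):

    -- per-table activity: sequential vs. index scans, rows loaded
    SELECT relname, seq_scan, idx_scan, n_tup_ins
      FROM pg_stat_user_tables;

    -- per-index usage
    SELECT relname, indexrelname, idx_scan
      FROM pg_stat_user_indexes;
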
-- 
Dreams come true, not free. -- S.Sondheim, ITW 




[PERFORM] Suggestions for a data-warehouse migration routine

2005-04-27 Thread Richard Rowell
I've ported enough of my company's database to Postgres to make
warehousing on PG a real possibility.  I thought I would toss my data
migration architecture ideas out for the list to shoot apart..

1.  Script on production server dumps the production database (MSSQL) to
a set of delimited text files.  
2.  Script on production server moves files via FTP to a Postgres
database server.  
3.  File Alteration Monitor trigger on PG server executes script when
last file is transferred.
4.  Script on PG server drops the target database (by issuing a dropdb
command).
5.  Script on PG server re-creates the target database (createdb command).
6.  Script on PG server re-creates the tables.
7.  Script on PG server issues COPY commands to import data.
8.  Script on PG server indexes tables.
9.  Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates needed reporting functions.
12. Vacuum analyze?
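
In outline, the PG-side script would be something like the following
(database, file, and table names here are placeholders):

    #!/bin/sh
    # steps 4-5: start from scratch
    dropdb warehouse
    createdb warehouse
    # step 6: recreate the tables from a saved DDL script
    psql -d warehouse -f create_tables.sql
    # step 7: bulk-load each delimited extract
    psql -d warehouse -c "COPY fact_sales FROM '/data/fact_sales.txt' WITH DELIMITER '|'"
    # step 8: index after loading; building an index on a full table is
    # cheaper than maintaining it row-by-row during the COPY
    psql -d warehouse -f create_indexes.sql
    # step 12: build planner statistics
    psql -d warehouse -c "VACUUM ANALYZE"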

My question revolves around the drop/create for the database.  Is there
a significant downside to this approach?  I'm taking this approach because
it is simpler from a scripting point of view to simply start from
scratch on each warehouse update.  If I do not drop the database I would
need to delete the contents of each table and drop all indexes prior to
the COPY/data import.  My assumption is that all the table deletes and
index drops would be more expensive than just dropping/re-creating the
entire database.

Also, is the Vacuum analyze step needed on a freshly minted database
where the indexes have all been newly created?

Thanks in advance for all feedback.





Re: [PERFORM] Suggestions for a data-warehouse migration routine

2005-04-27 Thread John A Meinel
Richard Rowell wrote:
> I've ported enough of my company's database to Postgres to make
> warehousing on PG a real possibility.  I thought I would toss my data
> migration architecture ideas out for the list to shoot apart..
> [...]
> My question revolves around the drop/create for the database.  Is there
> a significant downside to this approach?  I'm taking this approach because
> it is simpler from a scripting point of view to simply start from
> scratch on each warehouse update.  If I do not drop the database I would
> need to delete the contents of each table and drop all indexes prior to
> the COPY/data import.  My assumption is that all the table deletes and
> index drops would be more expensive than just dropping/re-creating the
> entire database.
I believe you are correct. If you are going to completely wipe the
database, just drop it and re-create. Deleting is much slower than
dropping. (One of the uses of partitioning is so that you can just drop
one of the tables, rather than deleting the entries). Dropping the whole
db skips any Foreign Key checks, etc.
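
For comparison, the in-place route would look roughly like this per table
(names made up); TRUNCATE is the cheap way to empty a table, since it
throws the files away rather than removing rows one at a time:

    TRUNCATE TABLE fact_sales;        -- far cheaper than DELETE FROM fact_sales
    DROP INDEX fact_sales_date_idx;   -- drop indexes before the bulk load
    COPY fact_sales FROM '/data/fact_sales.txt' WITH DELIMITER '|';
    CREATE INDEX fact_sales_date_idx ON fact_sales (sale_date);
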
> Also, is the Vacuum analyze step needed on a freshly minted database
> where the indexes have all been newly created?
> Thanks in advance for all feedback.
ANALYZE is needed, since you haven't updated any of your statistics yet.
So the planner doesn't really know how many rows there are.
VACUUM probably isn't, since everything should be pretty well aligned.
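
You can see this directly: pg_stats has no rows for a freshly loaded
table until ANALYZE runs (table name made up):

    SELECT tablename, attname, n_distinct
      FROM pg_stats
     WHERE tablename = 'fact_sales';   -- empty before ANALYZE

    ANALYZE fact_sales;                -- populates the column statistics
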
John
=:-

