CLASSIFICATION: UNCLASSIFIED
SECURITY CONTROL MARKING: FOR OFFICIAL USE ONLY

Perhaps Oracle will do a better job for you in this area.

-----Original Message-----
From: Tom Cunningham [mailto:[EMAIL PROTECTED]
Sent: Friday, January 14, 2005 3:16 AM
To: mysql@lists.mysql.com
Subject: data-warehouse ETL with MySQL


Hi everybody.

This is just a note to tell people what I'm doing, hoping I might get
some comments.

In short: I have been using MySQL SQL scripts to do data-warehouse ETL
for about 6 months. It's gone well, but perhaps getting a little
creaky now.

The general format is this:
 1. Add any new records to my dimensions, if not already there. Done
with a "LEFT JOIN dimension ON (..dimension attributes..) WHERE
dimension.pk IS NULL"

 2. Insert the facts, using the dimensional keys.

 3. Update the fact records' other columns.

The predictable problem I'm having is the lack of abstractness of SQL,
so I have to repeat things which I don't want to repeat.

I'm looking at an alternative, which may cut down repetition, and I'm
curious if anyone else has thought or written about this sort of
stuff.

The alternative:
 1. Put *all* the ETL data into a flat table first.

 2. Have a unique index on *all* the rows in my slowly-changing
dimension. So new dimension records are simply added with an "INSERT
IGNORE" from the flat table.

 3. Insert the facts from the flat table using NATURAL JOINs with the
dimension tables.

This second approach has advantages & disadvantages:

GOOD: Less mentioning of column names, so the script is shorter & more
robust to change.

GOOD: The last advantage particularly good for what I'd call
"attribute-combination" columns than the other method, where the
dimension has no natural key.

BAD: All dimension columns must be considered to be slowly-changing
(becuase of the unique  key and natural join) - though doesn't seem
serious problem now.


I have some sample scripts fleshing these ideas out more, if anyone's
interested (tom, at, videoisland, dot, com).

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


Classification: UNCLASSIFIED
Security Control Marking: FOR OFFICIAL USE ONLY


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to