CLASSIFICATION: UNCLASSIFIED SECURITY CONTROL MARKING: FOR OFFICIAL USE ONLY
Perhaps Oracle will do a better job for you in this area. -----Original Message----- From: Tom Cunningham [mailto:[EMAIL PROTECTED] Sent: Friday, January 14, 2005 3:16 AM To: mysql@lists.mysql.com Subject: data-warehouse ETL with MySQL Hi everybody. This is just a note to tell people what I'm doing, hoping I might get some comments. In short: I have been using MySQL SQL scripts to do data-warehouse ETL for about 6 months. It's gone well, but perhaps getting a little creaky now. The general format is this: 1. Add any new records to my dimensions, if not already there. Done with a "LEFT JOIN dimension ON (..dimension attributes..) WHERE dimension.pk IS NULL" 2. Insert the facts, using the dimensional keys. 3. Update the fact records' other columns. The predictable problem I'm having is the lack of abstractness of SQL, so I have to repeat things which I don't want to repeat. I'm looking at an alternative, which may cut down repetition, and I'm curious if anyone else has thought or written about this sort of stuff. The alternative: 1. Put *all* the ETL data into a flat table first. 2. Have a unique index on *all* the rows in my slowly-changing dimension. So new dimension records are simply added with an "INSERT IGNORE" from the flat table. 3. Insert the facts from the flat table using NATURAL JOINs with the dimension tables. This second approach has advantages & disadvantages: GOOD: Less mentioning of column names, so the script is shorter & more robust to change. GOOD: The last advantage particularly good for what I'd call "attribute-combination" columns than the other method, where the dimension has no natural key. BAD: All dimension columns must be considered to be slowly-changing (becuase of the unique key and natural join) - though doesn't seem serious problem now. I have some sample scripts fleshing these ideas out more, if anyone's interested (tom, at, videoisland, dot, com). -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] Classification: UNCLASSIFIED Security Control Marking: FOR OFFICIAL USE ONLY -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]