On Thu, Apr 3, 2008 at 2:28 PM, Dre <[EMAIL PROTECTED]> wrote:
>  1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,
> this makes me a little nervous.

MyISAM has the advantage of very fast loading.  It's much faster to
load than InnoDB.  However, InnoDB had better query performance on the
warehouse data I used.

> How do you handle data inconsistency
> problems when ETL jobs fail?

Usually, with a warehouse system the answer is that you rebuild it.  I
have built an incrementally loading ETL system, but I was under the
impression that is not the norm.

> I
> can see doing something like manually cleaning out the necessary tables
> before you re-run, but that seems a bit messy to me.  Anyone figure out a
> better approach?

I think it's pretty common to use a Y-loading approach, with one empty
schema and then do an atomic RENAME at the end to swap the new tables
into place.  When I used MyISAM tables, I did it that way.

>  2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in
> the fact tables will be low cardinality columns; queries that didn't use
> date would be very slow on large fact tables (MS SQL had this problem).  Has
> anyone run into this with MySQL?

You can bundle up low-cardinality columns by using a "junk dimension."
 It made a big difference for me.  You make a table with all of the
combinations of the low-cardinality fields and assign a key to each
combination.

There are some good presentation on data warehousing available from
the website for the MySQL Users Conference.

- Perrin

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to