Re: Data Warehouse on MySQL questions
Hey Baron,

Your blog post was quite informative. Your suggestion to use a combination of merged MyISAM tables and InnoDB for the live partition made a lot of sense, and it sounds like the path I'll need to follow. I appreciate the information!

Baron Schwartz wrote:
> One approach you might consider is using MyISAM tables and MERGE tables
> together. [...]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: Data Warehouse on MySQL questions
Hi,

I'll just address the things others didn't answer.

On Thu, Apr 3, 2008 at 2:28 PM, Dre <[EMAIL PROTECTED]> wrote:
> 1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,

One approach you might consider is using MyISAM tables and MERGE tables together. This makes it much easier to remove old data, if that is a concern, and keeps each table a manageable size. For example, put each day's data in a new MyISAM table and then create a MERGE table over them.

You might also look into MySQL 5.1's partitioning. Brian Miezejevski gave a great session on this at last year's MySQL Conference. The slides don't seem to be available online, but I wrote up my notes on the session:
http://www.xaprb.com/blog/2007/04/27/mysql-conference-and-expo-2007-day-4/
You can probably email him and get the slides as well.

Regards
Baron

--
Baron Schwartz, Senior Consultant, Percona Inc.
Tel: +1 888 401 3401 ext 507
24/7 Emergency Line +1 888 401 3401 ext 911
Our Services: http://www.percona.com/services.html
Our Blog: http://www.mysqlperformanceblog.com/
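A minimal sketch of the daily-table approach Baron describes (table and column names are illustrative, not from the thread):

```sql
-- One MyISAM table per day, all with identical structure
CREATE TABLE fact_20080401 (
    sale_id  INT UNSIGNED NOT NULL,
    sold_at  DATETIME NOT NULL,
    amount   DECIMAL(10,2) NOT NULL,
    KEY (sold_at)
) ENGINE=MyISAM;

CREATE TABLE fact_20080402 LIKE fact_20080401;

-- A MERGE table presents the daily tables as one logical table;
-- its definition must match the underlying MyISAM tables exactly
CREATE TABLE fact_all (
    sale_id  INT UNSIGNED NOT NULL,
    sold_at  DATETIME NOT NULL,
    amount   DECIMAL(10,2) NOT NULL,
    KEY (sold_at)
) ENGINE=MERGE UNION=(fact_20080401, fact_20080402) INSERT_METHOD=LAST;
```

Removing a day's data is then just a `DROP TABLE` on that day's table plus an `ALTER TABLE fact_all UNION=(...)` to redefine the union, with no large delete against the fact data.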
Re: Data Warehouse on MySQL questions
On Thu, Apr 3, 2008 at 2:28 PM, Dre <[EMAIL PROTECTED]> wrote:
> 1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,
> this makes me a little nervous.

MyISAM has the advantage of very fast loading. It's much faster to load than InnoDB. However, InnoDB had better query performance on the warehouse data I used.

> How do you handle data inconsistency problems when ETL jobs fail?

Usually, with a warehouse system, the answer is that you rebuild it. I have built an incrementally loading ETL system, but I was under the impression that is not the norm.

> I can see doing something like manually cleaning out the necessary tables
> before you re-run, but that seems a bit messy to me. Anyone figure out a
> better approach?

I think it's pretty common to use a Y-loading approach, with one empty schema, and then do an atomic RENAME at the end to swap the new tables into place. When I used MyISAM tables, I did it that way.

> 2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in
> the fact tables will be low-cardinality columns; queries that didn't use
> date would be very slow on large fact tables (MS SQL had this problem). Has
> anyone run into this with MySQL?

You can bundle up low-cardinality columns by using a "junk dimension." It made a big difference for me. You make a table with all of the combinations of the low-cardinality fields and assign a key to each combination.

There are some good presentations on data warehousing available from the website for the MySQL Users Conference.

- Perrin
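The atomic swap Perrin mentions can be done with MySQL's multi-table RENAME, which renames every listed table in a single atomic operation (table names here are illustrative):

```sql
-- Rebuild the warehouse into staging tables (e.g. fact_sales_new),
-- then swap them into place in one atomic step; readers never see
-- a half-loaded table.
RENAME TABLE
    fact_sales     TO fact_sales_old,
    fact_sales_new TO fact_sales;

-- Once the swap succeeds, the previous copy can be dropped.
DROP TABLE fact_sales_old;
```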
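A junk dimension, as Perrin describes it, might look like this (a hypothetical fact table with three low-cardinality flags; all names are invented for illustration):

```sql
-- One row per combination of the low-cardinality fields
CREATE TABLE junk_dim (
    junk_id   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    is_promo  TINYINT(1) NOT NULL,
    channel   ENUM('web','store','phone') NOT NULL,
    status    ENUM('new','shipped','returned') NOT NULL,
    UNIQUE KEY (is_promo, channel, status)
);

-- The fact table carries a single junk_id instead of three separate
-- low-cardinality columns, so one index on junk_id replaces several
-- near-useless single-column indexes:
SELECT SUM(f.amount)
  FROM fact_sales f
  JOIN junk_dim j ON j.junk_id = f.junk_id
 WHERE j.channel = 'web' AND j.is_promo = 1;
```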
Re: Data Warehouse on MySQL questions
I've built several datamarts using Perl and MySQL. The largest ones have been up to about 30GB, so I'm not quite on your scale.

For #1, I have an etl_id in the fact table so I can trace back any particular ETL job. I typically make it a dimension and include date, time, software version, etc. That doesn't help so much if you're messing up your dimension tables, but I haven't typically run into that problem with the designs I've used.

For #2, I haven't built anything big enough for it to be a concern yet.

Also, LOAD DATA INFILE is your friend :)

On Thu, Apr 3, 2008 at 11:28 AM, Dre <[EMAIL PROTECTED]> wrote:
> Hey folks,
>
> I'm currently deciding whether to build a decent-sized (around 300-500GB,
> although honestly, I've got little to base that on at the moment) data
> warehouse in PostgreSQL or MySQL. I've developed several in MS SQL and
> PostgreSQL, but the client is comfortable with MySQL, and I'd prefer to use
> that as the platform since it will be less painful for them to manage when
> I'm gone. I'm hoping that someone with experience building a warehouse on
> MySQL will be able to answer two outstanding questions I have:
>
> 1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,
> this makes me a little nervous. How do you handle data inconsistency
> problems when ETL jobs fail? (For the record, I don't use a separate tool
> for the ETL; I usually use Perl/shell scripts to interact with the file
> system, and PL/pgSQL or Transact-SQL once the data is loaded into the
> staging database. For each file that is loaded, I'll identify steps that
> must be posted together and wrap them in a transaction in the ETL job.) I
> can see doing something like manually cleaning out the necessary tables
> before you re-run, but that seems a bit messy to me. Anyone figure out a
> better approach?
>
> 2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in
> the fact tables will be low-cardinality columns; queries that didn't use
> date would be very slow on large fact tables (MS SQL had this problem). Has
> anyone run into this with MySQL?
>
> Many thanks in advance!
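The etl_id idea from the reply above can be sketched like this (a minimal sketch; all table and column names are illustrative):

```sql
-- One row per ETL run: the audit dimension with date, time,
-- and software version
CREATE TABLE etl_run (
    etl_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    started_at  DATETIME NOT NULL,
    sw_version  VARCHAR(32) NOT NULL
);

-- Register this run and remember its id
INSERT INTO etl_run (started_at, sw_version) VALUES (NOW(), '1.4.2');
SET @current_etl_id = LAST_INSERT_ID();

-- Bulk-load the extract, stamping every fact row with the run id
LOAD DATA INFILE '/tmp/sales_20080403.csv'
    INTO TABLE fact_sales
    FIELDS TERMINATED BY ','
    (sale_id, sold_at, amount)
    SET etl_id = @current_etl_id;
```

If the job fails partway, cleaning up is then a single targeted statement (`DELETE FROM fact_sales WHERE etl_id = @current_etl_id`) rather than guessing which rows the broken run wrote.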