Re: scalability of MySQL - future plans?
Adequate data warehouse performance requires more than just hardware. 2 crucial make-or-break software features are partitioning and parallel query. On very large tables - accessing a large slice of the data via index is completely unfeasible. Table scan is the only option. Partitioning allows you to scan only the necessary segments instead of reading the whole table and rejecting massive numbers of rows. Parallel query breaks the job up so that multiple processes of the OS can participate and speed up the process. These features are an absolute necessity if we wanted to migrate our large databases from Oracle to MySQL. We are eager for MySQL to make them priority features. MySQL's market appeal would just explode. We will do our best to contribute to the effort if we can. I'd like to urge others who plan to use MySQL with large databases to consider doing the same. Thanks, Udi "Heikki Tuuri" <[EMAIL PROTECTED]> 11/12/2004 06:57 AM To: <[EMAIL PROTECTED]> cc: Subject:Re: scalability of MySQL - future plans? Jacek, - Original Message - From: "Jacek Becla" <[EMAIL PROTECTED]> Newsgroups: mailing.database.myodbc Sent: Friday, November 12, 2004 2:30 AM Subject: scalability of MySQL - future plans? > Hello, > > What are the plans regarding improving scalability of MySQL? We are > currently trying to decide what technology/product to use for a large > project that will generate ~600TB/year starting in 2012. Any pointers to > related articles or hints how safe is to assume that MySQL will be able > to handle petabyte-scale dataset in 8-10 years would be greatly > appreciated. hmm... this mostly depends on hardware. With the innodb_file_per_table option, a single InnoDB table can be 64 TB in size, and you can have 4 billion such tables. With current PC hardware, the speed of a single CPU allows you to insert 10 000 rows per second, if the load is not disk-bound. Let us assume that a single row in 100 bytes. That makes 1 MB/s, which is 30 TB/year. CPU speed will probably double every 4 years or so. Thus, CPU speed will suffice if you use a multiprocessor. Normally, a database server has main memory at least 1 % of the data size. Is 6000 GB RAM realistic in 2012? Memory sizes will probably double every 2 to 3 years. If a high-end server today has 32 GB of RAM, in year 2012 it might have 512 GB of RAM. You will need a huge server. The worst problem is the disk seek time. If your tables have secondary indexes where the insertion order is random, a modern disk, in combination with the InnoDB insert buffer, can insert maybe 200 random records per second. That is 100 rows/s for a typical table. You are going to insert 200 000 rows/s. You may need a disk farm of 4000 physical disks. Such disk farms exist today, but they are expensive, and we have no experience how Linux performs on them. Probably by 2012, Linux is good enough, if not yet today. If you insert rows in large batches to tables smaller than your main memory, or if you insert in the prder of the primary key, and you do not have secondary keys, then there are no random accesses to disks, and you do not need a disk farm. A typical disk in 2012 may store 1 TB. Thus, you will need at least 600 disks anyway. How long does it take to build an index to a 64 TB table if you have 6 TB of memory? If the index completely fits in the memory, then this is sequential disk I/O. With today's high end disks, you can read 60 MB/s. Building an index with a single disk would take 2 weeks. In 2012, it might take only 3 days. Conclusion: MySQL/InnoDB is able to handle that workload of 600 TB/year in year 2012. But you will need a huge server which has 10 x the memory of a high-end server, and 600 - 4000 physical disk drives. The following link describes a system with 512 GB of memory, and 2000 disk drives: http://www.tpc.org/results/individual_results/IBM/IBM_690_040217_es.pdf The system costs 5.6 million US dollars. > Best regards, > Jacek Becla > Stanford University Best regards, Heikki Tuuri Innobase Oy Foreign keys, transactions, and row level locking for MySQL InnoDB Hot Backup - a hot backup tool for InnoDB which also backs up MyISAM tables http://www.innodb.com/order.php Order MySQL technical support from https://order.mysql.com/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: [OT] HP DL760G2 vs Opteron for large DB on Linux
It's a great server (the DL760 G2). They really opened up the bus and the Xeon 3.0 is superfast. "darren" <[EMAIL PROTECTED]> 10/03/2004 07:51 PM To: [EMAIL PROTECTED] cc: Subject:[OT] HP DL760G2 vs Opteron for large DB on Linux Hi all, I am looking to purchase hardware for a large database running MySQL or Oracle on Linux. Was thinking of the Opteron for its good scalability and memory bandwidth until my friend told me that the DL760 from HP that uses the F8 (from Intel Profusion) chipset allowing up to 8 CPUs and memeory up to 32GB. I have always though that the Xeons cannot go beyond 4 CPUs and 4GB but this changes the equation. Has anyone got experience with this server or similar ones? How is the performance for running databases, esp in terms of large RAM usage? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Here's another option to load without requiring a primary key (requires a LOT of extra disk space and fast CPU, and a batch window to run). Load the new daily data into the table without checking for dupes. Then create a new version of the table with distinct values. Something like this (assuming your table has 2 columns which are char (3) and char (5) for simplicity's sake) in pseudocode: (remember to pad all columns to the maximum width to make them uniform, right spaces on char, left zeroes on numerics) CREATE NEW_TABLE AS SELECT SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),1,3) AS COL1, SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),4,5) AS COL2 FROM OLD_TABLE matt ryan <[EMAIL PROTECTED]> 07/16/2004 07:43 AM To: cc: [EMAIL PROTECTED] Subject:Re: Mysql growing pains, 4 days to create index on one table! Donny Simonton wrote: >Matt, >I've been reading this thread for a while and at this point, I would say >that you would need to provide the table structures and queries that you are >running. > >For example, we have one table that has 8 billion rows in it and it close to >100 gigs and we can hammer it all day long without any problems. It really >depends on how you are doing things. > >But as far as you mentioning about mysql not using multiple indexes, it >does. You just have to create an index on multiple fields at one time. >I've got tables with 10 fields in one index, now the trick with mysql is >that you must use all top 10 fields in your where clause for mysql to really >take advantage of the index. > >But I would definitely send the list your table structure with your indexes >and some of your selects and inserts. You can always change the names of >things if you don't want people to know the names of everything. > >Just my 2 cents. > > Original email has the table structure, query speed is not an issue (it is, but I will deal with that later) the issue is insert speed, I get 150k-1M records daily, of these, only 5-10 % are new records, I load the EBCDIC file into a temp table, and then do "insert ignore into historytable select * from temp table" and I cant create multiple field indexes, I would need 100 indexes on a table, the index side already exceeds the table size, I tried to add an index on date, but gave up because it ran for 2 days and was not done yet. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Reordering the primary key wouldn't necessarily speed up the key check. The reason for selecting a particular sequence within a primary key is to put the columns that are available the most often - upfront - so that the index will have at least something to bite on. Can you parittion further? You want to shoot for reading no more than a 1% slice for a given query. If possible. Can you partition by ranges of a column? Values 01 to 100 in partition 1, 11 to 200 in partition 2, etc? Anything that's specified consistently in every query? Then again - if your table is huge... and your queries are all over the place... and the user can specify any search criteria... and there's no predictable pattern you can partition on... and you frequently add and/or remove lots of data from your table... and your queries pull lots of rows... You might be out of luck with MySQL ... sorry. You may need to switch to a database that has a parallel query facility. Then - every query becomes a massive table scan but gets divided into multiple concurrent subqueries - and overall the job finishes in a reasonable amount of time. The epitomy of brute force. It's hard to rationalize initially but after a while you see it's the only way to go. Remember - indexes are no longer required. We have a billion row 100GB table the users search any and every way. Response time is less than a minute. We are anxiously waiting to see this technology added to MySQL. Maybe one day we'll have some money to contribute to the effort. Parallel query is not trivial. That's why these databases are expensive. I can send you more details privately if you are interested. matt ryan <[EMAIL PROTECTED]> 07/14/2004 12:27 PM To: [EMAIL PROTECTED] cc: Subject:Re: Mysql growing pains, 4 days to create index on one table! [EMAIL PROTECTED] wrote: >You may want more indexes but you might be getting killed because you already have too many. > >To test - try loading into a table without indexes and see if it makes a difference. > >At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous. > >If write speed is a bottleneck you might consider Raid-1 instead of Raid-5. > >Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table. > > > > These tables are merged, the total table size is huge, on this particular table, it's , 45,449,534 rows, however, all the merge tables combined are 258,840,305 records perhaps I should reorder the pimary key, putting the longest most unique record up front, and the least unique at the end, would that speed up the key check? I can tell that almost everything is read IO, very little write IO -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
That's the whole point. Eliminate your indexes and your load problems are solved. Especially given the fact that you insert ignore and don't use the primary key to validate uniqueness. matt ryan <[EMAIL PROTECTED]> 07/15/2004 11:38 AM To: cc: [EMAIL PROTECTED] Subject:Re: Mysql growing pains, 4 days to create index on one table! > > You might be out of luck with MySQL ... sorry. > > You may need to switch to a database that has a parallel query > facility. Then - every query becomes a massive table scan but gets > divided into multiple concurrent subqueries - and overall the job > finishes in a reasonable amount of time. The epitomy of brute force. > It's hard to rationalize initially but after a while you see it's the > only way to go. Remember - indexes are no longer required. > > We have a billion row 100GB table the users search any and every way. > Response time is less than a minute. > > We are anxiously waiting to see this technology added to MySQL. Maybe > one day we'll have some money to contribute to the effort. Parallel > query is not trivial. That's why these databases are expensive. > > I can send you more details privately if you are interested. I've used it, with oracle, but oracles index searches are better, hit the best one first, then 2nd best, then 3rd, but I really dont want to go to oracle, it's too complicated for my tech's vs mysql, hit the best one first, and use no other Query time is a non issue at this point, it's load time, load daily file into temp table, then insert ignore into main table, on key violation the violating record is ignored load time is the issue, the server loads files 15 hours a day, that big primary key makes loading any table over 2-3 gig VERY slow I thought it was a bug, everything was great untill you get up to about 3 or 4 gig, then it gets SLOW -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]