Re: what journal options should I use on linux?
Just use XFS. It's a solved problem.

Kevin

On 3/8/07, Christopher A. Kantarjiev [EMAIL PROTECTED] wrote:
> I'm setting up mysql on linux for the first time (have been using OpenBSD
> and NetBSD with UFS until now). The default file system is ext3fs, and I
> don't mind that, but it seems really silly to use a journaled file system
> for the database data - doubling my writes. I need as much write speed as
> I can get. What do people with this sort of large problem use on Linux?

--
Founder/CEO Tailrank.com
Location: San Francisco, CA
AIM/YIM: sfburtonator
Skype: burtonator
Blog: feedblog.org
Cell: 415-637-8078
Re: what journal options should I use on linux?
Enabling journaling isn't going to halve your performance. Remember, a
journal is a record of what happened. It is only appended to, never updated
in place, so while there is overhead, performance is fairly good. ext3 also
has a few different journaling options.

Journaling is mainly for quick recovery and corruption prevention in case
of a crash - something you would want for the file system your OS runs on.
If you can live with your database getting corrupted, and can live with the
time it would take to restore a backup, then you can forgo journaling.

I would create a partition for your OS and then a partition for your data.
One can be journaled, the other not. You can even optimize the block size
on the data partition, or use a different file system for it altogether.

Here is an interesting article to read on ext3 journaling overhead:

http://www-128.ibm.com/developerworks/library/l-fs8.html

----- Original Message -----
From: Christopher A. Kantarjiev [EMAIL PROTECTED]
To: mysql@lists.mysql.com; [EMAIL PROTECTED]
Sent: Thursday, March 08, 2007 10:49 PM
Subject: what journal options should I use on linux?

> I'm setting up mysql on linux for the first time (have been using OpenBSD
> and NetBSD with UFS until now). The default file system is ext3fs, and it
> seems really silly to use a journaled file system for the database data -
> doubling my writes. What do people with this sort of large problem use on
> Linux?
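For reference, those ext3 journaling options are chosen with the data=
mount option; only full data journaling writes every data block twice. A
sketch of the three modes (the device name and mount point here are
hypothetical):

```shell
# data=ordered (the ext3 default): only metadata goes through the journal;
# file data is written once, and flushed before the metadata that refers
# to it, so files never contain stale blocks after a crash.
mount -t ext3 -o data=ordered /dev/sdc1 /data

# data=writeback: metadata-only journaling with no ordering guarantee;
# fastest, but files being written at crash time may contain stale blocks.
# noatime also avoids access-time writes on read-heavy workloads.
mount -t ext3 -o data=writeback,noatime /dev/sdc1 /data

# data=journal: full data journaling; every data block is written twice,
# once to the journal and once to its final location.
mount -t ext3 -o data=journal /dev/sdc1 /data
```

So the "doubling my writes" concern only applies to data=journal; the
default ordered mode journals metadata only.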
Re: what journal options should I use on linux?
Brent,

Thanks for your response.

> Enabling journaling isn't going to halve your performance.

I was careful to say write speed, not performance.

I already have my data and index files on separate drives (RAID volumes,
actually, each made up of complete drives). What I see is that the index
drive is being clobbered during table creation, because mysql can't keep it
all in memory. This is a long-standing problem with MyISAM files, where the
index code isn't 64-bit safe.

Yes, 64-bit. This is a quad-processor Opteron with 16GB of RAM. The index
file is 15GB these days, so even if MyISAM *could* hold more than about 3GB
of index in its data structures, it probably wouldn't all fit in memory.
Did I mention that this is a big data problem?

Please don't tell me to use InnoDB. It's much too slow for this purpose.

> Here is an interesting article to read on ext3 journaling overhead.
> http://www-128.ibm.com/developerworks/library/l-fs8.html

Interesting, if only to show how dangerous it is to publish results that
aren't understood. The author doesn't say anything about testing
methodology, so I have no idea whether or not to trust the results. 16MB
files are toys; they easily fit completely in memory, and Linux makes it
difficult to clear the buffer cache between runs. Was the machine rebooted
between every test? When he runs these tests again with files that are
bigger than available RAM, I'll be a lot more interested.
Re: what journal options should I use on linux?
Yes, the article did lack a lot of methodology information.

Since you already have things separated, and it seems your data isn't
critical - meaning losing it would be a pain as opposed to life-threatening
- you can turn off journaling for those file systems. You may also want to
look at other file systems for those volumes.

I'm not going to recommend InnoDB: it doesn't seem like you need
transactions, and clearly you are outside the physical size where InnoDB
performs well.

This might be a silly question, but did you max out the
myisam_sort_buffer_size and key_buffer_size settings? Both can go up to
4GB. After the indexes are created you can drop them back down, since
those large values are usually unnecessary for daily use.

You can also turn off indexing when loading the data, then turn it back on
when the data is loaded (if you haven't already). But I've had some funky
things happen when I've done this with large datasets (500 million records)
and auto-increment fields.

----- Original Message -----
From: Chris Kantarjiev [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Friday, March 09, 2007 12:09 PM
Subject: Re: what journal options should I use on linux?

> I was careful to say write speed, not performance. [...] Did I mention
> that this is a big data problem?
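A sketch of what those two suggestions look like in practice. The database
name, table name, and file path below are hypothetical, and note that
DISABLE KEYS only suspends non-unique indexes, so unique-key checks (and
therefore INSERT IGNORE) still take effect during the load:

```shell
# my.cnf fragment: raise the MyISAM buffers for the index build
# (4GB is the ceiling for these settings), then drop them back
# down for daily use once the indexes exist.
cat >> /etc/my.cnf <<'EOF'
[mysqld]
key_buffer_size         = 4096M
myisam_sort_buffer_size = 4096M
EOF

# Load with non-unique indexes disabled, then rebuild them in one
# pass (a bulk "repair by sort" rather than row-by-row key updates).
mysql mydb -e "ALTER TABLE big_table DISABLE KEYS"
mysql mydb -e "LOAD DATA INFILE '/data/dump.tsv' INTO TABLE big_table"
mysql mydb -e "ALTER TABLE big_table ENABLE KEYS"
```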
Re: what journal options should I use on linux?
> Yes, the article did lack a lot of methodology information.

This one is *lots* better:

http://www.bullopensource.org/ext4/sqlbench/index.html

Losing data is always bad - that's why it's in a database, not a
filesystem. But these systems have been pretty reliable and are on UPS,
etc. This is a created table, so it's not life-critical ... but it is
expensive.

> This might be a silly question, but did you max out the
> myisam_sort_buffer_size and key_buffer_size settings? Both can go up
> to 4GB.

Yup. Not nearly big enough - the index is 15GB! And the code seems to
misbehave, leading to crashes, at the limit.

> You can also turn off indexing when loading the data, then turn it back
> on when the data is loaded (if you haven't already).

We need INSERT IGNORE, so this isn't really an option for us,
unfortunately.

I'm going to mount them as ext2fs for the time being.

Best,
chris
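Mounting as ext2 needs no conversion, since ext3 is ext2 plus a journal. A
sketch, with hypothetical device and mount point; the journal can also be
removed permanently with tune2fs:

```shell
# An ext3 volume can simply be mounted as ext2; the journal is ignored
# and no data blocks are ever written twice.
mount -t ext2 -o noatime /dev/sdd1 /data

# Or strip the journal for good (file system must be unmounted first),
# then fsck before remounting.
umount /data
tune2fs -O ^has_journal /dev/sdd1
e2fsck -f /dev/sdd1
mount -t ext2 -o noatime /dev/sdd1 /data
```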
Re: what journal options should I use on linux?
Would using merge tables be an option? You would have to split your data,
but then each individual table could be within MySQL's memory limits.
Divide and conquer. You would need a way to work around your INSERT IGNORE
requirement, but the end result could be much faster.

The solution I use for duplicate filtering is to create a load table which
loads a set of records, then compares those records against the merge table
for duplicates, deleting any found. Then the load table is added to the
merge table and the process is repeated for the next batch of data. Adding
2.5 million rows to a 500 million row table takes about 2 hours, a daily
occurrence. The bottleneck is that a record is considered a duplicate if
another similar one exists within a 24-hour time period. So it's a range
match rather than a direct comparison, which means I couldn't use INSERT
IGNORE anyway.

Thanks for the link.

----- Original Message -----
From: Chris Kantarjiev [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Friday, March 09, 2007 12:42 PM
Subject: Re: what journal options should I use on linux?

> This one is *lots* better.
> http://www.bullopensource.org/ext4/sqlbench/index.html
> [...] I'm going to mount them as ext2fs for the time being.
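That load-table flow might look roughly like this in SQL. This is only a
sketch: the database, table, and column names (load_batch, merged,
merged_2007_03, device_id, recorded_at) are all hypothetical, and the
24-hour window condition stands in for the range match described above:

```shell
mysql mydb <<'EOF'
-- 1. Load the new batch into an empty staging table.
TRUNCATE TABLE load_batch;
LOAD DATA INFILE '/data/batch.tsv' INTO TABLE load_batch;

-- 2. Delete staged rows that already have a similar row in the merge
--    table within a 24-hour window. This is a range match, which is why
--    INSERT IGNORE (exact unique-key comparison) can't be used here.
DELETE lb FROM load_batch AS lb
JOIN merged AS m
  ON  m.device_id = lb.device_id
  AND m.recorded_at BETWEEN lb.recorded_at - INTERVAL 24 HOUR
                        AND lb.recorded_at + INTERVAL 24 HOUR;

-- 3. Append the survivors to one of the MERGE table's member tables
--    (alternatively, add the load table itself to the UNION list with
--    ALTER TABLE merged UNION=(...)).
INSERT INTO merged_2007_03 SELECT * FROM load_batch;
EOF
```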
Re: what journal options should I use on linux?
Merge tables aren't an easy option here. There's a higher-level data
partitioning that we're putting into place, though, and it's been shown to
help a lot. But I also want to get as much out of the file system as I can.

> The solution I use for duplicate filtering is to create a load table
> which loads a set of records, then compares those records against the
> merge table for duplicates, deleting any found. Then the load table is
> added to the merge table and the process is repeated for the next batch
> of data.

I don't think this will help us, but it's an interesting technique. We use
staging tables to cut the load in a bunch of places. I think the true
answer to this particular problem lies outside SQL, with a private index
structure that is tuned for dealing with duplicates... it would help if
the MyISAM engine were a little more clever about really large indexes.

Best,
chris
what journal options should I use on linux?
I'm setting up mysql on linux for the first time (have been using OpenBSD
and NetBSD with UFS until now). The default file system is ext3fs, and I
don't mind that, but it seems really silly to use a journaled file system
for the database data - doubling my writes.

In particular, I have a couple of use cases where I spend a week or so
creating a 17GB data (table) file and its 15GB index file, and then do
sparse queries out of it. I need as much write speed as I can get. I
certainly don't want to have every data block written twice, once to the
journal and once to the file, along with the extra seeks.

What do people with this sort of large problem use on Linux?

Thanks,
chris

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]