Re: what journal options should I use on linux?

2007-03-09 Thread Kevin Burton

Just use XFS. It's a solved problem.

Kevin

On 3/8/07, Christopher A. Kantarjiev [EMAIL PROTECTED] wrote:


I'm setting up mysql on linux for the first time (have been using OpenBSD and
NetBSD with UFS until now). The default file system is ext3fs, and I don't mind
that, but it seems really silly to use a journaled file system for the database
data - doubling my writes.

In particular, I have a couple of use cases where I spend a week or so creating
a 17GB data (table) file and its 15GB index file, and then do sparse queries out
of it. I need as much write speed as I can get. I certainly don't want to have
every data block written twice, once to the journal and once to the file, along
with the extra seeks.

What do people with this sort of large problem use on Linux?

Thanks,
chris


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]





--
Founder/CEO Tailrank.com
Location: San Francisco, CA
AIM/YIM: sfburtonator
Skype: burtonator
Blog: feedblog.org
Cell: 415-637-8078


Re: what journal options should I use on linux?

2007-03-09 Thread Brent Baisley
Enabling journaling isn't going to halve your performance. Remember, a journal is a record of what happened. It is only
appended to, not updated, so while there is overhead, performance is fairly good. ext3 also has a few different
journaling options.


Journaling is mainly for quick recovery and corruption prevention in case of a crash. This is something you would want
for the file system your OS runs on. If you can live with your database getting corrupted, and can live with the time
it would take to restore a backup, then you can forgo journaling.
I would create a partition for your OS and then a partition for your data. One can be journaled, the other not. You can
even optimize the block size on the data partition, or use a different file system for it altogether.
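A minimal sketch of what that split might look like in /etc/fstab, assuming the data volume is /dev/sdb1 mounted at /var/lib/mysql (device names, mount points, and the choice of data=writeback are illustrative, not from the thread):

```shell
# /etc/fstab sketch: fully journaled root for the OS, and a separate
# ext3 data volume mounted with data=writeback (metadata-only
# journaling, the least write overhead of ext3's three journal modes:
# journal, ordered, writeback). noatime avoids write traffic on reads.
/dev/sda1  /               ext3  defaults,data=ordered    1 1
/dev/sdb1  /var/lib/mysql  ext3  noatime,data=writeback   1 2
```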


Here is an interesting article to read on ext3 journaling overhead.
http://www-128.ibm.com/developerworks/library/l-fs8.html

- Original Message - 
From: Christopher A. Kantarjiev [EMAIL PROTECTED]

To: mysql@lists.mysql.com; [EMAIL PROTECTED]
Sent: Thursday, March 08, 2007 10:49 PM
Subject: what journal options should I use on linux?


I'm setting up mysql on linux for the first time (have been using OpenBSD and NetBSD with UFS until now). The default file system 
is ext3fs, and I don't mind that, but it seems really silly to use a journaled file system for the database data - doubling my 
writes.


In particular, I have a couple of use cases where I spend a week or so creating a 17GB data (table) file and its 15GB index file, 
and then do sparse queries out of it. I need as much write speed as I can get. I certainly don't want to have every data block 
written twice, once to the journal and once to the file, along with the extra seeks.


What do people with this sort of large problem use on Linux?

Thanks,
chris


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: what journal options should I use on linux?

2007-03-09 Thread Chris Kantarjiev
Brent,

Thanks for your response.

 Enabling journaling isn't going to halve your performance.

I was careful to say write speed, not performance. 

I already have my data and index files on separate drives (RAID
volumes, actually, each made up of complete drives). What I see
is that the index drive is being clobbered during table creation,
because mysql can't keep it all in memory. This is a long-standing
problem with MyISAM files, where the index code isn't 64-bit safe.

Yes, 64-bit. This is a quad-processor Opteron with 16GB of RAM.
The index file is 15GB these days, so even if MyISAM *could*
hold more than about 3GB of index in its data structures, it
probably wouldn't all fit in memory.

Did I mention that this is a big data problem?

Please don't tell me to use InnoDB. It's much too slow for this purpose.

 Here is an interesting article to read on ext3 journaling overhead.
 http://www-128.ibm.com/developerworks/library/l-fs8.html

Interesting, if only to show how dangerous it is to publish
results that aren't understood. The author doesn't say anything
about testing methodology, so I have no idea whether or not to
trust the results. 16MB files are toys; they easily fit completely
in memory and Linux makes it difficult to clear the buffer cache
between runs. Was the machine rebooted between every test? When 
he runs these tests again with files that are bigger than available
RAM, I'll be a lot more interested.




Re: what journal options should I use on linux?

2007-03-09 Thread Brent Baisley

Yes, the article did lack a lot of methodology information.

Since you already have things separated, and it seems your data isn't critical - meaning losing it would be a pain as
opposed to life-threatening - you can turn off journaling for those file systems. You may also want to look at other
file systems for those volumes.
I'm not going to recommend InnoDB; it doesn't seem like you need transactions, and you are clearly outside the physical
size where InnoDB performs well.


This might be a silly question, but did you max out the myisam_sort_buffer_size and key_buffer_size settings? Both can
go up to 4GB. After the indexes are created you can drop them back down, since those large values are usually
unnecessary for daily use.
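A hedged sketch of that tuning cycle, assuming a running server reachable via the mysql client (the exact values, and the 4GB ceiling mentioned above, are era-specific; the restore value is illustrative):

```shell
# Raise the MyISAM buffers for the bulk load/index build
# (4294967295 = 4GB - 1, the cap these variables had at the time).
mysql -e "SET GLOBAL key_buffer_size = 4294967295;"
mysql -e "SET GLOBAL myisam_sort_buffer_size = 4294967295;"

# ... run the bulk load / index build here ...

# Drop back to a smaller working value for daily use (256MB shown
# as an illustrative choice, not a recommendation from the thread).
mysql -e "SET GLOBAL key_buffer_size = 268435456;"
```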


You can also turn off indexing when loading the data, then turn it back on when the data is loaded (if you haven't
already). But I've had some funky things happen when I've done this with large datasets (500 million records) and
auto-increment fields.
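That load pattern, as a sketch with an illustrative database, table, and input file (none of these names appear in the thread):

```shell
# Skip non-unique index maintenance during the load, then rebuild
# the indexes in one pass afterward.
mysql mydb -e "ALTER TABLE big_table DISABLE KEYS;"
mysql mydb -e "LOAD DATA INFILE '/tmp/batch.tsv' INTO TABLE big_table;"
mysql mydb -e "ALTER TABLE big_table ENABLE KEYS;"
```

Note that DISABLE KEYS only suspends non-unique indexes; unique keys (and therefore INSERT IGNORE duplicate filtering) are still maintained during the load.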



- Original Message - 
From: Chris Kantarjiev [EMAIL PROTECTED]

To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Friday, March 09, 2007 12:09 PM
Subject: Re: what journal options should I use on linux?






Re: what journal options should I use on linux?

2007-03-09 Thread Chris Kantarjiev
 Yes, the article did lack a lot of methodology information.

This one is *lots* better.

http://www.bullopensource.org/ext4/sqlbench/index.html

Losing data is always bad - that's why it's in a database, not a filesystem.
But these systems have been pretty reliable and are on UPS, etc. This
is a created table, so it's not life critical ... but it is expensive.

 This might be a silly question, but did you max out the 
 myisam_sort_buffer_size 
 and key_buffer_size settings? Both can go up to 4GB.

Yup. Not nearly big enough - index is 15GB!
And the code seems to misbehave, leading to crashes, at the limit.

 You can also turn off indexing when loading the data, then turn it back on 
 when the data is loaded (if you haven't already).

We need INSERT IGNORE, so this isn't really an option for us,
unfortunately.

I'm going to mount them as ext2fs for the time being.
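Mounting an existing ext3 volume as ext2 simply bypasses the journal, so an interim setup along those lines might look like this (device name and mount point are illustrative):

```shell
# ext3 is backward-compatible with ext2: mounting it as ext2
# skips the journal entirely, leaving the data intact.
umount /var/lib/mysql-data
mount -t ext2 /dev/sdb1 /var/lib/mysql-data

# Alternatively, remove the journal permanently with tune2fs
# (run on an unmounted or read-only file system):
# tune2fs -O ^has_journal /dev/sdb1
```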

Best,
chris




Re: what journal options should I use on linux?

2007-03-09 Thread Brent Baisley

Would using merge tables be an option? You would have to split your data, but then each individual table could be
within MySQL's memory limits. Divide and conquer. You would need a way to work around your INSERT IGNORE requirement,
but the end result could be much faster.

The solution I use for duplicate filtering is to create a load table which loads a set of records, then compares those
records against the merge table for duplicates, deleting any found. Then the load table is added to the merge table and
the process is repeated for the next batch of data.
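A hedged sketch of that load-table pattern, with illustrative table and column names (the real schema, and the range-based duplicate comparison described below, are not shown in the thread):

```shell
mysql mydb <<'SQL'
-- Load the next batch into a scratch table with the same schema.
LOAD DATA INFILE '/tmp/batch.tsv' INTO TABLE load_table;

-- Delete rows already present in the merge table. A simple equality
-- join is shown here; the real comparison in the thread is a range
-- match over a 24-hour window.
DELETE load_table FROM load_table
  JOIN merged_table USING (record_id);

-- Fold the deduplicated batch in by extending the MERGE table's
-- UNION list (a MERGE table is a union over identical MyISAM tables).
ALTER TABLE merged_table UNION = (t1, t2, load_table);
SQL
```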

Adding 2.5 million rows to a 500 million row table takes about 2 hours, a daily occurrence. The bottleneck is that a
record is considered a duplicate if another similar one exists within a 24-hour time period. So it's a range match
rather than a direct comparison, which means I couldn't use INSERT IGNORE anyway.

Thanks for the link.

- Original Message - 
From: Chris Kantarjiev [EMAIL PROTECTED]

To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; mysql@lists.mysql.com
Sent: Friday, March 09, 2007 12:42 PM
Subject: Re: what journal options should I use on linux?






Re: what journal options should I use on linux?

2007-03-09 Thread Chris Kantarjiev
Merge tables aren't an easy option here. There's a higher level
data partitioning that we're putting into place, though, and it's
been shown to help a lot. But I also want to get as much out of the
file system as I can.

 The solution I use for duplicate filtering is to create a
 load table which loads a set of records, then compares those
 records against the merge table for duplicates, deleting any found.
 Then the load table is added to the merge table and the process is
 repeated for the next batch of data.

I don't think this will help us, but it's an interesting technique.
We use staging tables to cut the load in a bunch of places. 

I think the true answer to this particular problem lies outside SQL,
with a private index structure that is tuned for dealing with
duplicates... it would help if the MyISAM engine were a little
more clever about really large indexes.

Best,
chris




what journal options should I use on linux?

2007-03-08 Thread Christopher A. Kantarjiev
I'm setting up mysql on linux for the first time (have been using OpenBSD and 
NetBSD with UFS until now). The default file system is ext3fs, and I don't mind 
that, but it seems really silly to use a journaled file system for the database 
data - doubling my writes.


In particular, I have a couple of use cases where I spend a week or so creating 
a 17GB data (table) file and its 15GB index file, and then do sparse queries out 
of it. I need as much write speed as I can get. I certainly don't want to have 
every data block written twice, once to the journal and once to the file, along 
with the extra seeks.


What do people with this sort of large problem use on Linux?

Thanks,
chris

