Re: hypothetical question about data storage

2013-07-29 Thread Manuel Arostegui
2013/7/30 Rick James 

> Elevator...  If the RAID _controller_ does the Elevator stuff, any OS
> optimizations are wasted.
> And there have been benchmarks backing that up.  (Sorry, don't have any
> links handy.)
>
> RAID 5/10 ...  The testing I have done shows very little difference.
>  However, you can slant the conclusion by picking one versus the other of:
> "For a given amount of disk space... RAID-X is better than Y."
> "For a given number of drives... RAID-Y is better than X."


In the tests I have done with RAID5 vs RAID10, the difference is huge, at
least in our clusters with heavy writes.
We usually build RAIDs over 4 or 8 SAS disks (15k rpm).
The performance of each type of RAID needs to be tested for your concrete
scenario. You can find lots of benchmarks out there, but you need to test
your own workload to be sure what works better for you. As Rick said, with
BBUs, disk schedulers, write-back/write-through configuration, etc., things
can change.

The last tests with SSD disks show no difference, so for the new servers
with SSDs we're going for RAID5, as you get more disk space :-)

Just my 2 cents!
Manuel.


-- 
Manuel Aróstegui
Systems Team
tuenti.com


Re: hypothetical question about data storage

2013-07-29 Thread Carsten Pedersen

On 30-07-2013 01:16, Rick James wrote:

> Elevator...  If the RAID _controller_ does the Elevator stuff, any OS
> optimizations are wasted. And there have been benchmarks backing that
> up.  (Sorry, don't have any links handy.)
>
> RAID 5/10 ...  The testing I have done shows very little difference.


...right up to the day one of the disks fails, and you thought you could
just plug in a new spindle and let the system take care of the rest...


http://www.miracleas.com/BAARF/
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

/ Carsten

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql



RE: hypothetical question about data storage

2013-07-29 Thread Rick James
Elevator...  If the RAID _controller_ does the Elevator stuff, any OS 
optimizations are wasted.
And there have been benchmarks backing that up.  (Sorry, don't have any links 
handy.)

RAID 5/10 ...  The testing I have done shows very little difference.  However, 
you can slant the conclusion by picking one versus the other of:
"For a given amount of disk space... RAID-X is better than Y."
"For a given number of drives... RAID-Y is better than X."
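
To make the two framings concrete, here is a rough sketch in Python. The
function names are mine, and it assumes equal-sized drives, no hot spares,
and plain RAID-5 (one drive's worth of parity) vs RAID-10 (everything
mirrored once). It only counts space, not speed.

```python
def usable_drives(level: str, n_drives: int) -> int:
    """Drives' worth of usable space for a given number of drives."""
    if level == "raid5":
        if n_drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return n_drives - 1      # one drive's worth of space goes to parity
    if level == "raid10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return n_drives // 2     # every block is stored twice
    raise ValueError(f"unknown RAID level: {level}")


def drives_needed(level: str, usable: int) -> int:
    """Smallest drive count that gives `usable` drives' worth of space."""
    if level == "raid5":
        return max(usable + 1, 3)
    if level == "raid10":
        return max(2 * usable, 4)
    raise ValueError(f"unknown RAID level: {level}")


# "For a given number of drives": 8 drives each
print(usable_drives("raid5", 8), usable_drives("raid10", 8))
# "For a given amount of disk space": 4 drives' worth each
print(drives_needed("raid5", 4), drives_needed("raid10", 4))
```

With 8 drives, RAID-5 has more space but the same spindle count; for the
same space, RAID-10 gets more spindles. Which comparison you pick decides
which level looks better.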

When writing a random block, RAID-5 does not need to touch all the drives, only
the one holding the block and the one holding its parity.  Suitable XORs will
update the parity correctly.  So, a write hits 2 drives, whether you have
RAID-5 or -10.
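
A minimal sketch of the XOR update, in Python, with toy 2-byte chunks and
helper names of my own. Note that the old data and old parity have to be
known first (read from disk or found in the controller cache); this sketch
just takes them as arguments.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after replacing one chunk: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# A 4+1 stripe: four data chunks, parity is the XOR of all of them.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = bytes(2)
for chunk in stripe:
    parity = xor_bytes(parity, chunk)

# Overwrite chunk 2 only; the other three data drives are never touched.
new_chunk = b"\xff\x00"
new_parity = updated_parity(stripe[2], parity, new_chunk)
stripe[2] = new_chunk
```

Recomputing the parity from all four chunks gives the same value as
new_parity, which is why only two drives need to be written.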

Some people make the chunk size 64KB (etc.), not 512B.  With the controller
involved, there is not necessarily any benefit to a large vs. small chunk size.
Writes are delayed until it is optimal.  This leads to large streaming
writes to each drive, regardless of chunk size (when writing a large stream).

A heavily used InnoDB system will be writing random 16KB blocks.

(I have no insight into RAID-6.)

> -----Original Message-----
> From: Johan De Meersman [mailto:vegiv...@tuxera.be]
> Sent: Monday, July 29, 2013 3:38 PM
> To: Rick James; will...@techservsys.com; mysql@lists.mysql.com
> Subject: RE: hypothetical question about data storage
> 
> Rick James  wrote:
> >
> >For MySQL + RAID, a Linux elevator strategy of 'deadline' or 'noop' is
> >optimal.  (The default, 'cfq', is not as good.)
> 
> I should look into those again at some point. Do you have a brief word as
> to why they're better?
> 
> 
> >A RAID controller with multiple drives striped (and optionally
> >parity-checked) (RAID-5, -10) and with a BBU (Battery Backed Write
> >Cache) is excellent for I/O.
> 
> Very true. 10 is traditionally considered better - it's certainly faster -
> but 5 is of course cheaper :-)
> 
> I'd like to add that 4+1 is the optimal configuration for RAID5, as that
> makes for a stripe of 2KB, assuming 512B sectors of course. You then pick
> an fs that supports blocks of that size, which means that no write will
> ever need to perform a read first to calculate the parity.


RE: hypothetical question about data storage

2013-07-29 Thread Johan De Meersman
Rick James wrote:
>
>For MySQL + RAID, a Linux elevator strategy of 'deadline' or 'noop' is
>optimal.  (The default, 'cfq', is not as good.)

I should look into those again at some point. Do you have a brief word as to 
why they're better?


>A RAID controller with multiple drives striped (and optionally
>parity-checked) (RAID-5, -10) and with a BBU (Battery Backed Write
>Cache) is excellent for I/O.

Very true. 10 is traditionally considered better - it's certainly faster - but 
5 is of course cheaper :-) 

I'd like to add that 4+1 is the optimal configuration for RAID5, as that makes
for a stripe of 2KB, assuming 512B sectors of course. You then pick an fs that
supports blocks of that size, which means that no write will ever need to
perform a read first to calculate the parity.
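
The arithmetic, as a small Python sketch. The function names are mine, and
it assumes the chunk on each disk is exactly one 512B sector, as above; real
controllers often use much larger chunks.

```python
def full_stripe_bytes(data_disks: int, chunk_bytes: int) -> int:
    """Data carried by one full stripe (the parity chunk is not counted)."""
    return data_disks * chunk_bytes


def is_full_stripe_write(offset: int, length: int, stripe_bytes: int) -> bool:
    """True if a write starts on a stripe boundary and covers whole stripes.

    Only then can parity be computed from the new data alone, with no
    read of old data or old parity.
    """
    return length > 0 and offset % stripe_bytes == 0 and length % stripe_bytes == 0


stripe = full_stripe_bytes(data_disks=4, chunk_bytes=512)   # 4+1 RAID5
print(stripe)                                   # 2048
print(is_full_stripe_write(0, 2048, stripe))    # aligned 2KB fs block
print(is_full_stripe_write(512, 2048, stripe))  # misaligned write
print(is_full_stripe_write(0, 16384, stripe))   # aligned 16KB block
```

An aligned 16KB write is also fine here, since it covers exactly 8 full
stripes.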







RE: hypothetical question about data storage

2013-07-29 Thread Rick James
Most RAID controllers will happily do Elevator stuff like you mentioned.
So will Linux.
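
For reference, the elevator ordering bill describes (sweep from low tracks
to high, then back) can be sketched like this in Python. It is a toy model
with made-up track numbers and function names of my own, not what any
particular controller or kernel actually runs.

```python
def elevator_order(pending, head):
    """Serve requests at or above the head going up, then the rest going down."""
    upward = sorted(t for t in pending if t >= head)
    downward = sorted((t for t in pending if t < head), reverse=True)
    return upward + downward


def seek_distance(order, head):
    """Total number of tracks the head travels serving `order` in sequence."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


pending = [98, 183, 37, 122, 14, 124, 65, 67]   # arrival order
head = 53

print(seek_distance(pending, head))                        # arrival order
print(seek_distance(elevator_order(pending, head), head))  # elevator order
```

For this queue the arrival order travels 640 tracks and the elevator order
299, which is where the low seek averages come from.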

For MySQL + RAID, a Linux elevator strategy of 'deadline' or 'noop' is optimal. 
 (The default, 'cfq', is not as good.)
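
To see which elevator a device is using, Linux exposes it in sysfs as a line
such as "noop deadline [cfq]", with the active one in brackets. A small
Python sketch for reading it follows; the helper names and the "sda" device
are only examples, and newer kernels may list different scheduler names.

```python
from pathlib import Path


def parse_scheduler(text: str):
    """Return (active, available) from a line like 'noop deadline [cfq]'."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]      # brackets mark the active scheduler
            active = name
        available.append(name)
    return active, available


def read_scheduler(device: str):
    """Read the scheduler line for a block device, e.g. 'sda' (Linux only)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_scheduler(path.read_text())


print(parse_scheduler("noop deadline [cfq]"))
```

Changing it is a matter of writing the wanted name into the same file as
root.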

A RAID controller with multiple drives striped (and optionally parity-checked) 
(RAID-5, -10) and with a BBU (Battery Backed Write Cache) is excellent for I/O.

I don't know about "chronologically later".  InnoDB "does the right thing", as 
long as the OS does not cheat on fsync, etc.

> 1/10/10A/10aa342
Only 16 subdirectories per directory?  I would expect 256 to be more efficient 
overall.  This is because of fewer levels.  Scanning 256 is probably less 
costly than doing an extra level.  (Yeah, again, I can't _prove_ it in _your_ 
environment.)
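
A rough Python sketch of the trade-off, modelled loosely on the layout
quoted above (each directory name is a longer prefix of the file name). The
function names and the 65536-bucket figure are mine, purely for
illustration.

```python
def shard_path(name: str, chars_per_level: int, levels: int) -> str:
    """Build a path whose directories are growing prefixes of `name`."""
    parts = [name[: chars_per_level * i] for i in range(1, levels + 1)]
    return "/".join(parts + [name])


def levels_needed(buckets: int, fanout: int) -> int:
    """Directory levels needed to reach `buckets` leaf directories."""
    levels, capacity = 0, 1
    while capacity < buckets:
        capacity *= fanout
        levels += 1
    return levels


print(shard_path("10aa342", 1, 3))   # 16 entries per directory (one hex char)
print(shard_path("10aa342", 2, 2))   # 256 entries per directory (two hex chars)
print(levels_needed(65536, 16), levels_needed(65536, 256))
```

For the same number of leaf directories, the 256-way layout needs half as
many levels, so half as many directory lookups per file.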

4K tables on a single machine -- that is beginning to get into 'big' in 
reference to ulimit, table_open_cache, etc.  That is, if you went much past 
that, you would be getting into new areas of inefficiency.

I do not like splitting a database "table" into multiple tables, except by
PARTITIONing.  PARTITIONing would also provide an 'instantaneous' way of purging
old data.  (DROP PARTITION + REORGANIZE PARTITION)
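
As a sketch of what that nightly purge might look like, here is a small
Python helper that builds the two statements. Everything about the schema is
an assumption for illustration: daily partitions named pYYYYMMDD, RANGE
partitioning on TO_DAYS() of a date column, and a catch-all partition named
"future".

```python
from datetime import date, timedelta


def purge_statements(table: str, oldest: date, newest: date):
    """DDL to drop the oldest daily partition and add one after `newest`.

    Assumes partitions named pYYYYMMDD plus a final partition 'future'
    defined as VALUES LESS THAN MAXVALUE (hypothetical naming scheme).
    """
    next_day = newest + timedelta(days=1)
    boundary = next_day + timedelta(days=1)   # upper bound is exclusive
    return [
        f"ALTER TABLE {table} DROP PARTITION p{oldest:%Y%m%d}",
        f"ALTER TABLE {table} REORGANIZE PARTITION future INTO ("
        f"PARTITION p{next_day:%Y%m%d} "
        f"VALUES LESS THAN (TO_DAYS('{boundary:%Y-%m-%d}')), "
        f"PARTITION future VALUES LESS THAN MAXVALUE)",
    ]


for stmt in purge_statements("log", date(2013, 7, 1), date(2013, 7, 29)):
    print(stmt + ";")
```

The DROP removes a whole partition at once instead of deleting rows, and the
REORGANIZE is cheap as long as "future" is still empty.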

Almost always (again no proof for your case), a single table is more efficient
than many tables.  This applies to PARTITIONing, too, but there can be
other gains by using PARTITIONing.

InnoDB has a 64TB limit per PARTITION.

> -----Original Message-----
> From: william drescher [mailto:will...@techservsys.com]
> Sent: Saturday, July 27, 2013 4:32 AM
> To: mysql@lists.mysql.com
> Subject: Re: hypothetical question about data storage
> 
> On 7/26/2013 6:58 PM, Chris Knipe wrote:
> > The issue that we have identified is caused by seek time - hundreds of
> > clients simultaneously searching for a single file.  The only real way
> > to explain this is to run 100 concurrent instances of bonnie++ doing
> > random read/writes... Your disk utilization and disk latency
> > essentially goes through the roof resulting in IO wait and insanely
> > high load averages (we've seen it spike to over 150 on a 8-core Xeon -
> > at which time the application (at a 40 load average already) stops
> > processing requests to prevent the server crashing).
> 
> back in the day (many years ago) when I worked for IBM we had disk
> controllers that would queue and sort pending reads so that the heads
> would seek from low tracks across the disk to high tracks and then back to
> low. This resulted in very low seek _averages_.
> The controller was smart enough to make sure that if a write occurred,
> chronologically later reads got the right data, even if it had not been
> physically written to disk yet.
> 
> Is there such a controller available now?
> 
> bill

