Re: md RAID

2010-04-08 Thread Craig Falconer

Solor Vox wrote, On 04/09/2010 11:25 AM:

Sorry, I didn't remember asking for help in choosing RAID type.  Guess
I should re-read my own message.


Given some of your earlier comments - I'm guessing this is for a media 
PC of some description?


Cos if all you're storing is broadcast TV, it's not really that 
important IMO.


I've done a wee bit with mythtv, and it has a feature called Storage 
Pools.  This lets you treat a collection of disks of any size (and 
quality) as One-Big-Disk, but the loss of one disk will not affect the rest.

The other media center apps may have something similar.

So you'd get all 6 TB of storage, but with no protection.



--
Craig Falconer


RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 12:06 +1200, Bryce Stenberg wrote:
> 
> > -Original Message-
> > From: Steve Holdoway [mailto:st...@greengecko.co.nz]
> > The last column in fstab is marked pass. This defines in what order
> > partitions are mounted. You must mount /var in the first pass, as
> > software needs it there immediately. So change the root and /var pass
> > values to 0 and all should be well.
> > 
> 
> Thanks Steve, I set it to '1' to force it to be checked (as per Hadley's
> comment) and it appears to have booted up fine without errors.
> 
> Cheers everyone,
>   Bryce Stenberg.

Well, that's not quite what Hads said. Filesystems are checked when
marked as dirty, and every x mounts ( see tune2fs for details on how
to manipulate this and annoy sysadmins on ext2/3/4 file systems ). The
fsck stuff is performed in passes so that ( for example ) dependencies
like /var/www can be set up to be mounted after /var. 
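As an illustration only ( device name assumed, check the man page before
running anything ), the periodic-check settings can be viewed and tweaked
like this:

  tune2fs -l /dev/sda1          # shows "Maximum mount count" and "Check interval"
  tune2fs -c 30 /dev/sda1       # force a check every 30 mounts
  tune2fs -i 180d /dev/sda1     # ...and/or at most every 180 days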

As I suspected, this field also affects the order in which the file
systems are mounted even if fsck is not required. By changing the /
and /var pass values to be the same, you ensured there was no dependency
between the two, and both were mounted at the same time, ensuring the
availability of /var when necessary.

Steve

-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Bryce Stenberg


> -Original Message-
> From: Steve Holdoway [mailto:st...@greengecko.co.nz]
> > The last column in fstab is marked pass. This defines in what order
> partitions are mounted. You must mount /var in the first pass, as
> software needs it there immediately. So change the root and /var pass
> values to 0 and all should be well.
>

Thanks Steve, I set it to '1' to force it to be checked (as per Hadley's
comment) and it appears to have booted up fine without errors.

Cheers everyone,
  Bryce Stenberg.











Re: md RAID

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 11:21 +1200, Solor Vox wrote:
> On 9 April 2010 11:07, Craig Falconer  wrote:
> 
> > Nice - I saw somewhere that the likelihood of losing a second drive
> > increases exponentially once one has failed or started erroring.
> >
> > One way to reduce that risk is to assemble the raid on drives of different
> > brands/models or different production runs.  Then again... that seagate
> > firmware bug last year affected many models/sizes
> >
> > --
> > Craig Falconer
> >
> >
> 
> Good point Craig, also using linux md raid allows you to use any
> controller you want.  So if your controller dies you can still get your
> data off any machine with room for the drives.  Plus, RAID5 in software
> can actually be FASTER, since the parity is computed on the processor vs
> a slower raid controller board. (I've read some controllers can off-load
> that to the CPU, but that's not supported well in Linux)
> 
> And, software RAID can do cool stuff like odd number of RAID10 disks. :)
> 
> sV
> 
Only if you're careful. Many BIOSes will recognise raid controllers with
fake ( aka Windows ) raid functionality, and automagically install
support for it.

So be real careful what raid support you install in your kernel, and
which modules you blacklist.
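A minimal sketch of the blacklist side, with a made-up module name -
substitute whatever lsmod / dmesg shows has grabbed your disks:

  echo "blacklist fakeraid_module" >> /etc/modprobe.d/blacklist.conf
  update-initramfs -u    # Debian/Ubuntu, so the blacklist applies at boot too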

Steve.

-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



Re: md RAID

2010-04-08 Thread Solor Vox
On 9 April 2010 11:20, Hadley Rich  wrote:
> On Fri, 2010-04-09 at 11:17 +1200, Solor Vox wrote:
>> Geez, you'd think I posted about using a windows box or something, so
>> many people going after the thing I didn't ask. O.o
>
> You'd think you weren't grateful for the help either.

Sorry, I didn't remember asking for help in choosing RAID type.  Guess
I should re-read my own message.


RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 11:10 +1200, Hadley Rich wrote:
> On Fri, 2010-04-09 at 11:03 +1200, Steve Holdoway wrote:
> > The last column in fstab is marked pass. This defines in what order
> > partitions are mounted. You must mount /var in the first pass, as
> > software needs it there immediately. So change the root and /var pass
> > values to 0 and all should be well. 
> 
> I don't know if it might be used to define what order they are mounted
> in, but I believe the official use is what order to fsck partitions
> in. / should be 1 and other partitions should be > 1 or 0 if you don't
> want them checked.
> 
> From the man page;
> 
> The  sixth field, (fs_passno), is used by the fsck(8) program to
> determine the order in which filesystem checks are done at reboot time.
> The root filesystem should be specified with a fs_passno of 1, and other
> filesystems should have a fs_passno of 2.  Filesystems  within  a  drive
> will  be checked  sequentially,  but filesystems on different drives
> will be checked at the same time to utilize parallelism available in the
> hardware.  If the sixth field is not present or zero, a value of zero is
> returned and fsck will assume that the filesystem does not need to be
> checked.
> 
> 
> hads
> 
I thought it was both, but am happy to stand/sit corrected (:

Cheers,

Steve.


-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



Re: md RAID

2010-04-08 Thread Solor Vox
On 9 April 2010 11:07, Craig Falconer  wrote:

> Nice - I saw somewhere that the likelihood of losing a second drive
> increases exponentially once one has failed or started erroring.
>
> One way to reduce that risk is to assemble the raid on drives of different
> brands/models or different production runs.  Then again... that seagate
> firmware bug last year affected many models/sizes
>
> --
> Craig Falconer
>
>

Good point Craig, also using linux md raid allows you to use any
controller you want.  So if your controller dies you can still get your
data off any machine with room for the drives.  Plus, RAID5 in software
can actually be FASTER, since the parity is computed on the processor vs
a slower raid controller board. (I've read some controllers can off-load
that to the CPU, but that's not supported well in Linux)

And, software RAID can do cool stuff like odd number of RAID10 disks. :)
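For example (untested sketch, device names assumed), a three-disk RAID10
is just:

  mdadm --create /dev/md0 --level=10 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

The default "near" layout spreads the two copies of each block across the
odd number of disks for you.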

sV


Re: md RAID

2010-04-08 Thread Hadley Rich
On Fri, 2010-04-09 at 11:17 +1200, Solor Vox wrote:
> Geez, you'd think I posted about using a windows box or something, so
> many people going after the thing I didn't ask. O.o 

You'd think you weren't grateful for the help either.




Re: md RAID

2010-04-08 Thread Solor Vox
On 9 April 2010 10:58, Steve Holdoway  wrote:
> You state this as fact... I find it strange, both from theory and
> experience. A random, fairly recent article ( yeah, it's not brilliant,
> but... )
>
> http://www.myhostnews.com/2008/09/optimizing-raid-performance-bencmarks/
>
> suggests that, while RAID 5 may be fastest with sequential reads,
> greatly is an exaggeration of the difference.
>
> As I said before, you need to suck it and see with your own hardware
> setup, and loading ( things like memory available for caching may make a
> huge difference for example ).
>
> Steve
>

Geez, you'd think I posted about using a windows box or something, so
many people going after the thing I didn't ask. O.o

I've tested myself and found RAID5 to be better for large file reads.
But don't take my word for it either.  Do your own research or see a few
others here:

http://kendalvandyke.blogspot.com/2009/02/disk-performance-hands-on-part-5-raid.html
http://ubuntuforums.org/showthread.php?p=9039994#post9039994
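If you want a quick number from your own box, something like this gives a
rough sequential read figure (device name assumed, not a proper benchmark):

  hdparm -t /dev/md0
  dd if=/dev/md0 of=/dev/null bs=1M count=8192 iflag=direct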

People go back and forth all day debating types of RAID.
Generally it's accepted that RAID5 is better for reads and
RAID10 for writes.  But you are also forgetting lots of other factors
that you didn't have information on, such as:

Case: Only has room for 4 drives (M-ATX)
PSU: More drives require a larger power supply
Cost: RAID10 requires 6 drives where RAID5 only takes 4 for the same usable space.
Motherboard: M-ATX board with 6 SATA ports, but one is needed for the
blu-ray drive and the only pci-e slot goes to the tv capture card.

I realize that you guys feel strongly about RAID10, but I wasn't
really asking for opinions on it.  :p  Just on file system, partition,
and RAID alignment.  Give me some credit that I've considered all the
factors and gone with what I thought was best for my needs.

Cheers,
sV


RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Hadley Rich
On Fri, 2010-04-09 at 11:03 +1200, Steve Holdoway wrote:
> The last column in fstab is marked pass. This defines in what order
> partitions are mounted. You must mount /var in the first pass, as
> software needs it there immediately. So change the root and /var pass
> values to 0 and all should be well. 

I don't know if it might be used to define what order they are mounted
in, but I believe the official use is what order to fsck partitions
in. / should be 1 and other partitions should be > 1 or 0 if you don't
want them checked.

From the man page;

The  sixth field, (fs_passno), is used by the fsck(8) program to
determine the order in which filesystem checks are done at reboot time.
The root filesystem should be specified with a fs_passno of 1, and other
filesystems should have a fs_passno of 2.  Filesystems  within  a  drive
will  be checked  sequentially,  but filesystems on different drives
will be checked at the same time to utilize parallelism available in the
hardware.  If the sixth field is not present or zero, a value of zero is
returned and fsck will assume that the filesystem does not need to be
checked.
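So with the fstab posted earlier the last field ends up looking like this
(illustrative only):

  /dev/mapper/grp--a-root   /      ext3  errors=remount-ro  0  1
  /dev/mapper/grp--b-third  /var   ext3  defaults           0  2

i.e. / is checked in the first pass, /var in the second, and anything with
0 is never checked.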


hads

-- 
http://nicegear.co.nz
New Zealand's Open Source Hardware Supplier



Re: md RAID

2010-04-08 Thread Craig Falconer

Bryce Stenberg wrote, On 04/09/2010 10:21 AM:

My experience with RAID is all from windows - but it may translate to
Linux.
I would have to ask why not use Hardware RAID (unless not available), so in
the OS all you're dealing with is a single disk setup rather than all this
software RAID complication?


Windows software raid is arse, that's why.

Linux software raid shows you all the gory detail and lets you shoot 
yourself quite successfully.  It's much more versatile.




As a side note on the Informix list I watch it is repeatedly said not to
use RAID 5 if you can - explanation here:
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt


Nice - I saw somewhere that the likelihood of losing a second drive 
increases exponentially once one has failed or started erroring.


One way to reduce that risk is to assemble the raid on drives of 
different brands/models or different production runs.  Then again... 
that seagate firmware bug last year affected many models/sizes


--
Craig Falconer



RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 10:57 +1200, Bryce Stenberg wrote:
> 
> > -Original Message-
> > From: Wayne Rooney [mailto:wroo...@ihug.co.nz]
> > I don't think your /etc/fstab is quite right.  Can you post the file so we
> > can see it.
> > 
> 
> 
> Fstab:
> 
> # /etc/fstab: static file system information.
> #
> # Use 'blkid -o value -s UUID' to print the universally unique identifier
> # for a device; this may be used with UUID= as a more robust way to name
> # devices that works even if disks are added and removed. See fstab(5).
> #
> #
> proc                      /proc            proc        defaults              0  0
> /dev/mapper/grp--a-root   /                ext3        errors=remount-ro     0  1
> 
> ## this is the row I added...
> /dev/mapper/grp--b-third  /var             ext3        defaults              0  2
> 
> LABEL=BootPart            /boot            ext2        defaults              0  2
> /dev/mapper/grp--b-second /home            ext3        defaults              0  2
> 
> UUID=12f9e615-f44d-4392-bd81-457927f82142  none  swap  sw                    0  0
> /dev/scd0                 /media/cdrom0    udf,iso9660 user,noauto,exec,utf8 0  0
> /dev/sde1                 /media/usbdrive  auto        rw,user,exec          0  0
> 
> 
> Cheers, Bryce.
FOUND IT!

I was confused, as I've been mounting /var separately since it was split
away from /usr way back in the dark ages.

The last column in fstab is marked pass. This defines in what order
partitions are mounted. You must mount /var in the first pass, as
software needs it there immediately. So change the root and /var pass
values to 0 and all should be well.
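With the fstab you posted that would look something like (illustrative
only - the pass value is the last column):

  /dev/mapper/grp--a-root   /      ext3  errors=remount-ro  0  0
  /dev/mapper/grp--b-third  /var   ext3  defaults           0  0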

Once that is changed, there's no need for any of these weird /.var,
symbolic links, etc. solutions.

hth,

Steve


-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



Re: md RAID

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 10:33 +1200, Solor Vox wrote:

> writes a lot, then RAID10 is better.   However, RAID5 has better
> cost/GB ratio (N-1 vs N/2 for RAID10) and greatly out performs RAID10
> on reads.  
You state this as fact... I find it strange, both from theory and
experience. A random, fairly recent article ( yeah, it's not brilliant,
but... )

http://www.myhostnews.com/2008/09/optimizing-raid-performance-bencmarks/

suggests that, while RAID 5 may be fastest with sequential reads,
"greatly" is an exaggeration of the difference.

As I said before, you need to suck it and see with your own hardware
setup, and loading ( things like memory available for caching may make a
huge difference for example ).

Steve

-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Bryce Stenberg


> -Original Message-
> From: Barry [mailto:barr...@paradise.net.nz]
>
> Could you move /var to where you want it, /new_pos/var, then make a soft
> link to it from /var.  Then alter fstab to mount the ptn, then reboot
>

I'm not sure I understand - the new partition will mount as /var. What
is this 'soft link' thing, is it like a windows shortcut?

Cheers, Bryce.











RE: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Bryce Stenberg


> -Original Message-
> From: Wayne Rooney [mailto:wroo...@ihug.co.nz]
> I don't think your /etc/fstab is quite right.  Can you post the file so we
> can see it.
>


Fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
#
proc                      /proc            proc        defaults              0  0
/dev/mapper/grp--a-root   /                ext3        errors=remount-ro     0  1

## this is the row I added...
/dev/mapper/grp--b-third  /var             ext3        defaults              0  2

LABEL=BootPart            /boot            ext2        defaults              0  2
/dev/mapper/grp--b-second /home            ext3        defaults              0  2

UUID=12f9e615-f44d-4392-bd81-457927f82142  none  swap  sw                    0  0
/dev/scd0                 /media/cdrom0    udf,iso9660 user,noauto,exec,utf8 0  0
/dev/sde1                 /media/usbdrive  auto        rw,user,exec          0  0


Cheers, Bryce.











Re: md RAID

2010-04-08 Thread Solor Vox
Hi Bryce,

> My experience with RAID is all from windows - but it may translate to
> Linux.
> I would have to ask why not use Hardware RAID (unless not available), so in
> the OS all you're dealing with is a single disk setup rather than all this
> software RAID complication?
>
> As a side note on the Informix list I watch it is repeatedly said not to
> use RAID 5 if you can - explanation here:
> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
>
> Regards,
>  Bryce Stenberg.

While hardware controllers do often "hide" the values discussed, they
still apply, and aligning your partitions and file systems with them
will greatly improve performance.

Yes, people debate RAID5 vs RAID10 all day.  RAID10 doesn't have
the "write hole" that RAID5 does; it can also lose up to N/2 drives
and still work, rebuilding is much faster, and it gives better write
performance.  So for someone who needs more data security and
writes a lot, RAID10 is better.  However, RAID5 has a better
cost/GB ratio (N-1 vs N/2 for RAID10) and greatly outperforms RAID10
on reads.  Much of my data (2TB of DVDs) will be read only.  So as
always, use the tool that works best for the job at hand.  I'm not
saying RAID5 is always better than RAID10, just that it's the best tool
for my needs in this setup.

Cheers,
sV


RE: md RAID

2010-04-08 Thread Bryce Stenberg
> -Original Message-
> From: Solor Vox [mailto:solor...@gmail.com]
> Subject: md RAID
>
> If you're still here, I've been trying to work out the optimal chunk
> size, stripe width, and stride for a 6TB RAID-5 array I'm building.
>

My experience with RAID is all from windows - but it may translate to
Linux.
I would have to ask why not use Hardware RAID (unless not available), so in
the OS all you're dealing with is a single disk setup rather than all this
software RAID complication?

As a side note on the Informix list I watch it is repeatedly said not to
use RAID 5 if you can - explanation here:
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

Regards,
  Bryce Stenberg.











Re: md RAID

2010-04-08 Thread Solor Vox
On 9 April 2010 10:07, Steve Holdoway  wrote:
> I note that most* of these NAS boxes use xfs, although that is the only
> file system that has completely blown up in my face in the last 10
> years!
>
> Steve
> * OK, I've only seen about half a dozen of them (:

Have to admit that I've heard some bad things about xfs.  But then
again, I've had reiserfs3 (at the time) blow up on me more than once
as well.  Reiserfs seems to shine with lots of small files, perfect
for say gentoo portage trees.  But for larger files, tests I've seen
show that ext4 with the new extents and xfs are much better.  I don't
know much about jfs, so I'll look into it.

Why did you stay away from ext4?  I could see xfs...

Also, did you guys use LVM?  And did you align all four layers?  (md,
pv, lv, fs)

sV


Re: md RAID

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 09:47 +1200, Hadley Rich wrote:
> On Fri, 2010-04-09 at 09:39 +1200, Craig Falconer wrote:
> > Another variable here is fsck time.  We found jfs to have the most 
> > consistent fsck times (not the shortest, but never the longest)
> > However that was for backup drives with lots of files.
> 
> JFS is my favourite.
> 
I note that most* of these NAS boxes use xfs, although that is the only
file system that has completely blown up in my face in the last 10
years!

Steve
* OK, I've only seen about half a dozen of them (:

-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



Re: md RAID

2010-04-08 Thread Hadley Rich
On Fri, 2010-04-09 at 09:39 +1200, Craig Falconer wrote:
> Another variable here is fsck time.  We found jfs to have the most 
> consistent fsck times (not the shortest, but never the longest)
> However that was for backup drives with lots of files.

JFS is my favourite.

-- 
http://nicegear.co.nz
New Zealand's Open Source Hardware Supplier



Re: md RAID

2010-04-08 Thread Solor Vox
Hi Steve,

> 1. I wouldn't touch ext4 for this.

Why?

> 2. What about reiser4?

Reiser is much better for smaller files, whereas ext4 (with extents) and xfs
are much better for the larger files I'm using.


> 3. PARTITIONING! Having just lived through it, watch out for the newer
> ( WD only?? ) disks with 4kB sectors but don't report it. That brought
> throughput down to < 1MB/sec.

I think you mean alignment of the partition, which is kind of what I'm
talking about.  But in your case your drive had 4KB sectors vs typical
512B.

> 4. If your primary intention is performance ( rather than getting the
> best of all worlds ), why not RAID10? IMO disks are too cheap to worry
> with RAID5. ( 1.5TB is certainly the sweet spot pricewise, but most
> mobos have 6 SATA slots )

I would agree that RAID10 is much better for write performance.  But
for read and cost, RAID5 does outperform RAID10.  And adding a drive
to RAID5 only improves the cost and read performance ratios.  I do
like RAID10's ability to lose up to N/2 drives, and its write performance,
though.  Plus I have six ports, but need the last port for the
DVD/Blu-ray drive as they seem to be SATA only.

>
> I would certainly do some basic testing, as the best answer will depend
> on the hardware you choose, and the mix of sizes of the files you wish
> to serve. I have had poor performance from some mobos and SATA ( ATI
> Technologies Inc SB700/SB800 SATA Controller as an example ) drivers, so
> some research is a good idea.
>

Yeah, I just was trying not to spend a week rebuilding and testing the array.

Cheers,
sV


Re: md RAID

2010-04-08 Thread Craig Falconer

Solor Vox wrote, On 04/09/2010 09:16 AM:

So for argument's sake, lets say that of the
usable 4.5TB, 4TB is for large 8GB and up files.  I also plan on
either ext4 or xfs.


Another variable here is fsck time.  We found jfs to have the most 
consistent fsck times (not the shortest, but never the longest).  However, 
that was for backup drives with lots of files.



While this all may seem like a bit much, getting it right can mean an
extra 30-50MB/s or more from the array.  So, has anyone done this type
of optimization?  I'd really rather not spend a week(s) testing
different values as 6TB arrays can take several hours to build.


You've really got no option but to test.

I suggest you create a test regime that creates and destroys raids, and 
tests them.   Your tests don't need to be full sized, but you'd have to 
wait for the md to finish synching.


We go with raid1, with some minor exceptions which are raid5.  For them, 
we found the defaults are "good enough" unless you push the numbers right 
out to the ends, where performance drops off massively.


Even the default settings should be enough to saturate gig ethernet.


--
Craig Falconer


Re: md RAID

2010-04-08 Thread Steve Holdoway
On Fri, 2010-04-09 at 09:16 +1200, Solor Vox wrote:
> Hey all,
> 
> I'm going to warn you beforehand and say that this message is
> technical and academic discussions of the inner-workings of md-RAID
> and file systems.  If you haven't had your morning coffee or don't
> want a headache, please stop reading now. :)
> 
> If you're still here, I've been trying to work out the optimal chunk
> size, stripe width, and stride for a 6TB RAID-5 array I'm building.
> 
> For hardware, I've got 4x1.5TB Samsung SATA2 drives.  I'm going to use
> Linux md in RAID-5 configuration.  Primary use for this box is HD
> video and DVD  storage.  So for argument's sake, lets say that of the
> usable 4.5TB, 4TB is for large 8GB and up files.  I also plan on
> either ext4 or xfs.
> 
> One last thing to get out of the way is meaning of all the block
> sizes.  Unfortunately, people tend to use “block size” to mean many
> different things.  So to prevent this, I'm going to use the following.
> 
> Stride – number of bytes written to disk before moving to next in array.
> Stripe width – stride size * data disks in array, so 3 in my case.
> Chunk size – File system “block size” or bytes per inode.
> Page size – Linux kernel cache page size, almost always 4KB on x86 hardware
> 
> Now comes the fun part, picking the correct values for creating the
> array and file-system.  The arguments for this are very academic and
> very specific for intended use.  Typically most people try for
> “position” optimization by picking a FS chunk size that matches the
> RAID stripe width.  By matching the array, you reduce the number of
> read/writes to access each file.  While this works in theory, you
> can't ensure that the stripe is written perfectly across the array.
> And unless your chunk size matches your page size, the operation isn't
> atomic anyway.
> 
> The other method is “transfer” optimization where you make the FS
> chunk sizes smaller ensuring that files are broken up across the
> array.  The theory here is that using more then one drive at a time to
> read the file will increase transfer performance.  This however
> increases the number of read/write operations needed for the same size
> file with larger chunks.
> 
> Things get even more fun when LVM is thrown into the mix.  As LVM will
> create a physical volume that contains logical volumes.  The FS is
> then put on the LV so trying to align the FS to the array no longer
> makes sense.  You can set the metasize for PV so it is aligned with
> the array.  So the assumption here is that the FS should be aligned
> with the PV.
> 
> While this all may seem like a bit much, getting it right can mean an
> extra 30-50MB/s or more from the array.  So, has anyone done this type
> of optimization?  I'd really rather not spend a week(s) testing
> different values as 6TB arrays can take several hours to build.
> 
> Cheers,
> sV
> 
Just to throw a bit more into the mix...

1. I wouldn't touch ext4 for this.
2. What about reiser4?
3. PARTITIONING! Having just lived through it, watch out for the newer
( WD only?? ) disks that have 4kB sectors but don't report it ( see the
sketch below ). That brought throughput down to < 1MB/sec.
4. If your primary intention is performance ( rather than getting the
best of all worlds ), why not RAID10? IMO disks are too cheap to bother
with RAID5. ( 1.5TB is certainly the sweet spot pricewise, but most
mobos have 6 SATA slots )
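For point 3, the usual workaround is to start every partition on a 1MiB
boundary, e.g. ( sketch only, assuming a recent parted and /dev/sdb ):

  parted -a optimal /dev/sdb -- mklabel gpt mkpart primary 1MiB 100%

Old-style fdisk starts the first partition at sector 63, which is what
wrecks the throughput on those drives.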

I would certainly do some basic testing, as the best answer will depend
on the hardware you choose, and the mix of sizes of the files you wish
to serve. I have had poor performance from some mobos and SATA ( ATI
Technologies Inc SB700/SB800 SATA Controller as an example ) drivers, so
some research is a good idea.

Enjoy your weekend!

Steve



-- 
Steve Holdoway 
http://www.greengecko.co.nz
MSN: st...@greengecko.co.nz
GPG Fingerprint = B337 828D 03E1 4F11 CB90  853C C8AB AF04 EF68 52E0



md RAID

2010-04-08 Thread Solor Vox
Hey all,

I'm going to warn you beforehand and say that this message is a
technical and academic discussion of the inner workings of md RAID
and file systems.  If you haven't had your morning coffee or don't
want a headache, please stop reading now. :)

If you're still here, I've been trying to work out the optimal chunk
size, stripe width, and stride for a 6TB RAID-5 array I'm building.

For hardware, I've got 4x1.5TB Samsung SATA2 drives.  I'm going to use
Linux md in a RAID-5 configuration.  The primary use for this box is HD
video and DVD storage.  So for argument's sake, let's say that of the
usable 4.5TB, 4TB is for large files of 8GB and up.  I also plan on
either ext4 or xfs.

One last thing to get out of the way is the meaning of all the block
sizes.  Unfortunately, people tend to use “block size” to mean many
different things.  So to prevent confusion, I'm going to use the following.

Stride – number of bytes written to disk before moving to next in array.
Stripe width – stride size * data disks in array, so 3 in my case.
Chunk size – File system “block size” or bytes per inode.
Page size – Linux kernel cache page size, almost always 4KB on x86 hardware
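
To make that concrete, here's the sort of thing I'm weighing up (sketch
only, with an assumed 64KiB md chunk and 4KiB FS blocks - and note that
mdadm/mke2fs use "chunk" and "stride" differently from my definitions
above):

  # 4 drives in RAID-5 with a 64KiB chunk => 3 data disks per stripe
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

  # ext4: stride = 64KiB / 4KiB = 16 blocks, stripe-width = 16 * 3 = 48 blocks
  mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/md0

  # or xfs, where it's stripe unit / stripe width:
  mkfs.xfs -d su=64k,sw=3 /dev/md0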

Now comes the fun part, picking the correct values for creating the
array and file-system.  The arguments for this are very academic and
very specific for intended use.  Typically most people try for
“position” optimization by picking a FS chunk size that matches the
RAID stripe width.  By matching the array, you reduce the number of
read/writes to access each file.  While this works in theory, you
can't ensure that the stripe is written perfectly across the array.
And unless your chunk size matches your page size, the operation isn't
atomic anyway.

The other method is “transfer” optimization, where you make the FS
chunk sizes smaller, ensuring that files are broken up across the
array.  The theory here is that using more than one drive at a time to
read the file will increase transfer performance.  This, however,
increases the number of read/write operations needed for a file of the
same size compared with larger chunks.

Things get even more fun when LVM is thrown into the mix, as LVM
creates a physical volume that contains logical volumes.  The FS is
then put on the LV, so trying to align the FS to the array directly no
longer makes sense.  You can set the metadata size for the PV so it is
aligned with the array.  So the assumption here is that the FS should
be aligned with the PV.
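
A sketch of that step, assuming a 192KiB full stripe (64KiB chunk x 3 data
disks) and an LVM new enough to have --dataalignment:

  pvcreate --dataalignment 192k /dev/md0
  pvs -o +pe_start    # confirm the first extent starts on a stripe boundary

On older LVM the same thing is usually done by fiddling --metadatasize
until pe_start lands where you want it.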

While this all may seem like a bit much, getting it right can mean an
extra 30-50MB/s or more from the array.  So, has anyone done this type
of optimization?  I'd really rather not spend a week(s) testing
different values as 6TB arrays can take several hours to build.

Cheers,
sV


Re: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Barry

Wayne Rooney wrote:

On Thursday 08 April 2010 09:42:29 Bryce Stenberg wrote:


Now when booting I get:

mount: mount point /dev/.var/run does not exist
montall: mount /var/run [700] terminated with status 32

and

mount: mount point /dev/.var/lock does not exist
mountall: mount /var/lock [700] terminated with status 32


I don't think your /etc/fstab is quite right.  Can you post the file so we can 
see it.


Wayne

Could you move /var to where you want it, /new_pos/var, then make a soft 
link to it from /var.  Then alter fstab to mount the ptn, then reboot.
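
Something like this (sketch only - make sure nothing is writing to /var
first, e.g. do it from single user mode):

  mv /var /new_pos/var
  ln -s /new_pos/var /var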


Barry


Re: Moving /var - problem with /var/lock and /var/run?

2010-04-08 Thread Wayne Rooney
On Thursday 08 April 2010 09:42:29 Bryce Stenberg wrote:

> Now when booting I get:
>
> mount: mount point /dev/.var/run does not exist
> montall: mount /var/run [700] terminated with status 32
>
> and
>
> mount: mount point /dev/.var/lock does not exist
> mountall: mount /var/lock [700] terminated with status 32

I don't think your /etc/fstab is quite right.  Can you post the file so we can 
see it.

Wayne