Right, it seems that using individual disks without RAID, although possible,
isn't a good idea because disk replacement is not automated. The maximum file
size would also be a problem.

Coming back to the idea of using RAID controllers: would you think that for,
say, 16 disks (or 12) RAID 5 would be fine, given that the data is already
replicated on another node in the very unlikely event that you lose a node?
In a node with more disk slots I could create multiple RAID 5 logical volumes,
but will Gluster be smart enough not to put replicated data on two logical
volumes residing on the same node?

I wouldn't even consider RAID 10, as that would be a big waste of space:
because the data is already replicated between nodes, mirroring it again on
the disks would drop the usable space to 1/4 of the raw capacity (half for
replication between nodes, half again for mirroring). If I had
latency-sensitive applications I probably wouldn't use Gluster for them, but
something else. For hosting applications that are not performance-intensive I
think Gluster is fine. Also, in a medium-sized cluster it would give good
throughput when running backups, for example.
But the bottom line is that the maximum performance you get for a single file
is what the single RAID logical volume holding that file can deliver.
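
The capacity argument above can be checked with a few lines of Python. This is
only a rough sketch: the function name, the 16-disk array size, and the replica
count of 2 are assumptions for illustration, and it ignores hot spares and
filesystem overhead.

```python
def usable_fraction(raid_level: str, disks_per_array: int, replica_count: int) -> float:
    """Fraction of raw capacity left after RAID overhead and Gluster replication."""
    if raid_level == "none":
        raid_fraction = 1.0                                      # every disk holds data
    elif raid_level == "raid5":
        raid_fraction = (disks_per_array - 1) / disks_per_array  # one disk's worth of parity
    elif raid_level == "raid6":
        raid_fraction = (disks_per_array - 2) / disks_per_array  # two disks' worth of parity
    elif raid_level == "raid10":
        raid_fraction = 0.5                                      # every block is mirrored
    else:
        raise ValueError(f"unknown RAID level: {raid_level}")
    # Gluster replication divides whatever the RAID layer leaves over.
    return raid_fraction / replica_count


# Assumed example: 16 disks per array, replica count 2.
for level in ("none", "raid5", "raid6", "raid10"):
    print(f"{level:>6}: {usable_fraction(level, 16, 2):.1%} of raw")
```

With these assumptions RAID 10 comes out at one quarter of raw, while a
16-disk RAID 5 stays just under one half.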

Regards,

Fernando

-----Original Message-----
From: Brian Candler [mailto:b.cand...@pobox.com] 
Sent: 14 June 2012 14:55
To: Fernando Frediani (Qube)
Cc: 'gluster-users@gluster.org'
Subject: Re: [Gluster-users] RAID options for Gluster

On Thu, Jun 14, 2012 at 11:06:32AM +0000, Fernando Frediani (Qube) wrote:
>    No RAID (individual hot swappable disks):
> 
>    Each disk is a brick individually (server:/disk1, server:/disk2, etc.)
>    so no RAID controller is required. As the data is replicated, if one
>    disk fails the data still exists on another disk on another node.
> 
>    Pros:
> 
>    Cheaper to build as there is no cost for an expensive RAID controller.

Except that software (md) RAID is free and works with an HBA.

>    Improved performance as writes have to be done only on a single disk,
>    not on the entire RAID5/6 array.
> 
>    Makes better use of the raw space as there is no disk for parity as
>    in RAID 5/6.
> 
> 
>    Cons:
> 
>    If a failed disk gets replaced the data needs to be replicated over the
>    network (not a big deal if using Infiniband or a 1Gbps+ network).
> 
>    The maximum file size is the size of one disk if using a Distributed
>    volume.

Additional Cons:

* You will probably need to write your own tools to monitor and notify you when 
a disk fails in the array (whereas there are easily available existing tools 
for md RAID, including E-mail notifications and SNMP integration)

* The process of swapping a disk is not a simple hot-swap: you need to replace 
the failed drive, mkfs a new filesystem, and re-introduce it into the gluster 
volume.  This is something you will need to document procedures for and test 
carefully, whereas RAID swaps are comparatively a no-brainer.

* For a large configuration with hundreds of drives, it can become ungainly to 
have a gluster volume with hundreds of bricks.

>    RAID doesn’t scale well beyond ~16 disks

But you can group your disks into multiple RAID volumes.

>    Attaching a JBOD to a node and creating multiple RAID arrays (or using
>    a single server with more disk slots) instead of adding a new node can
>    save power (no need for extra CPU, memory, motherboard), but with
>    multiple bricks on the same node the data might end up replicated
>    inside the same node, making the downtime of a node critical. Or is
>    Gluster smart enough to replicate data to a brick on a different node?

It's not automatic; you configure it explicitly. If your replica count is 2 
then you give it pairs of bricks, and data will be replicated onto each brick 
in the pair. It's your responsibility to ensure that those two bricks are on 
different servers, if high availability is your concern.
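
As a rough illustration of the pairing rule, the Python sketch below orders the
bricks so that each consecutive pair sits on two different servers, checks that
property, and prints a `gluster volume create` command line. The server names,
brick paths, and the volume name `myvol` are made up for the example, and the
helper functions are hypothetical; nothing here talks to Gluster itself.

```python
def ordered_bricks(bricks_by_server):
    """Interleave bricks from consecutive server pairs: s1:/a s2:/a s1:/b s2:/b ..."""
    servers = list(bricks_by_server)
    if len(servers) % 2:
        raise ValueError("replica 2 pairing needs an even number of servers")
    ordered = []
    for first, second in zip(servers[0::2], servers[1::2]):
        paths_first = bricks_by_server[first]
        paths_second = bricks_by_server[second]
        if len(paths_first) != len(paths_second):
            raise ValueError(f"{first} and {second} have different brick counts")
        for path_first, path_second in zip(paths_first, paths_second):
            ordered += [f"{first}:{path_first}", f"{second}:{path_second}"]
    return ordered


def replica_sets_span_servers(bricks, replica=2):
    """True if every consecutive group of `replica` bricks uses distinct servers."""
    for start in range(0, len(bricks), replica):
        hosts = [brick.split(":", 1)[0] for brick in bricks[start:start + replica]]
        if len(set(hosts)) != len(hosts):
            return False
    return True


# Hypothetical layout: two servers, each exporting two RAID 5 logical volumes.
layout = {
    "server1": ["/export/raid5a", "/export/raid5b"],
    "server2": ["/export/raid5a", "/export/raid5b"],
}
bricks = ordered_bricks(layout)
assert replica_sets_span_servers(bricks)
print("gluster volume create myvol replica 2 " + " ".join(bricks))
```

Listing both of server1's bricks next to each other instead would fail the
check, which is exactly the layout to avoid.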

Another alternative to consider: RAID10 on each node. It eliminates the 
parity write penalty of RAID5/6 and will in fact give you improved read 
performance compared to single disks, but it halves your available storage 
capacity.
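
To put rough numbers on that penalty, here is a back-of-the-envelope Python
sketch using the usual rule-of-thumb figures for small random writes (4 disk
I/Os per write on RAID5, 6 on RAID6, 2 on RAID10). The 12-disk array and the
150 IOPS per disk are assumed values, and the estimate ignores controller
cache and full-stripe writes.

```python
# Rule-of-thumb disk I/Os needed per small random write.
WRITE_PENALTY = {
    "single": 1,  # one write to one disk
    "raid10": 2,  # write both halves of the mirror
    "raid5": 4,   # read data + read parity, write data + write parity
    "raid6": 6,   # as RAID5, but with two parity blocks
}


def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Estimated small random-write IOPS for an array of `disks` drives."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Assumed example: 12 disks at 150 IOPS each.
for level in ("raid10", "raid5", "raid6"):
    print(f"{level:>6}: about {random_write_iops(level, 12, 150):.0f} write IOPS")
```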

You can of course mix-and-match. e.g. RAID5 for backup volumes; RAID10 for 
highly active read/write volumes; some gluster volumes are replicated and some 
are not, etc.  This can become a management headache if it gets too complex 
though.
