It really sounds like you're looking for a better RAID system, not a
distributed storage system.
I've been using ZFS on FreeBSD for years. The Linux port meets nearly
all of your needs, while acting more like a conventional software RAID.
Btrfs also has a lot of these features, but I'm not familiar enough with
it to advocate for it.
> I feel that Ceph is better than mdraid because:
> 1) When ceph cluster is far from being full, 'rebuilding' will be much
> faster vs mdraid
ZFS only rebuilds allocated parts of the disk, same as Ceph.
> 2) You can easily change the number of replicas
This is not as straightforward, but it is available. ZFS gives you
several different RAID-like levels, and it lets you control the number
of copies you keep on disk. So you can create something that looks like
RAID10 (stripes of mirrors), or a RAID5/RAID6. With 6 disks, I'd go
RAIDZ-2 (2 parity disks, for ~12TB usable). RAIDZ-2 is faster than
RAID10-like (in my PostgreSQL benchmarks, YMMV), and safer: with 2
parity disks, you'd have to lose 3 disks to lose data. Just keep in mind
that ZFS is not RAID, just RAID-like. I still call the volumes a RAID10
or RAID5, but the analogy breaks down below the volume level.
If you have really important data, you can also tell it to keep 2 (or
more) copies of the file, regardless of type of RAID. You can set that
replica policy per file, or per filesystem.
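As a rough sketch of both knobs (pool, filesystem, and device names here
are all made up for the example):

```shell
# Create a RAIDZ-2 pool from 6 disks: any 2 can fail without data loss.
zpool create tank raidz2 da0 da1 da2 da3 da4 da5

# Create a filesystem for important data and keep 2 copies of every
# block, on top of the RAIDZ-2 redundancy.
zfs create tank/important
zfs set copies=2 tank/important

# Verify the replica policy later.
zfs get copies tank/important
```

The copies property is inherited, so setting it on a filesystem covers
everything below it.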
> 3) When multiple disks have bad sectors, I suspect ceph will be much
> easier to recover data from than from mdraid which will simply never
> finish rebuilding.
ZFS checksums every block. If you're using RAID10-like, it will recover
blocks that failed the checksum from the mirror. If you're using
RAID5-like, it will rebuild them from parity. Because it has a checksum
of every block, it only rebuilds the failed ones. It does have to
checksum every block to find the failed ones, though. My 10TB volume
takes about 12 hours to replace a failed 2TB disk.
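The checksum-and-repair cycle above comes down to two commands (pool and
device names are illustrative):

```shell
# Walk every allocated block and verify its checksum; blocks that fail
# are repaired from the mirror or parity as they are found.
zpool scrub tank

# Swap out a failed disk; ZFS resilvers only the allocated blocks onto
# the replacement, not the whole device.
zpool replace tank da3 da6

# Watch scrub/resilver progress and per-device error counts.
zpool status -v tank
```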
> 4) If we need to migrate data over to a different server with no
> downtime, we just add more OSDs, wait, and then remove the old ones :-)
zfs snapshot && zfs send. It's not completely online, but I've moved 5TB
to a new server with a 5-minute outage window (pre-copy all the data,
shut down, send a final snapshot, flip the clients to the new server).
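That migration sketched out (hostnames and dataset names are made up):

```shell
# On the old server: snapshot and pre-copy the bulk of the data while
# clients are still online.
zfs snapshot tank/data@migrate1
zfs send tank/data@migrate1 | ssh newserver zfs receive pool/data

# During the outage window: stop writers, snapshot again, and send only
# the blocks changed since the first snapshot (incremental send).
zfs snapshot tank/data@migrate2
zfs send -i tank/data@migrate1 tank/data@migrate2 | \
    ssh newserver zfs receive pool/data

# Then point the clients at newserver.
```

The incremental send is what keeps the outage window short: it only has
to move the writes that landed during the pre-copy.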
If you can't tell, I'm a big fan of ZFS. I'm hoping to run my dev Ceph
cluster on ZFS soon.
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 8/13/13 00:47, Dmitry Postrigan wrote:
> This will be a single server configuration, the goal is to replace
> mdraid, hence I tried to use localhost (nothing more will be added to
> the cluster). Are you saying it will be less fault tolerant than a
> RAID-10?
>
>> Ceph is a distributed object store. If you stay within a single
>> machine, keep using a local RAID solution (hardware or software).
>> Why would you want to make this switch?
>
> I do not think RAID-10 on 6 3TB disks is going to be reliable at all.
> I have simulated several failures, and it looks like a rebuild will
> take a lot of time. Funnily, during one of these experiments, another
> drive failed, and I lost the entire array. Good luck recovering from
> that...
>
> I feel that Ceph is better than mdraid because:
> 1) When ceph cluster is far from being full, 'rebuilding' will be much
> faster vs mdraid
> 2) You can easily change the number of replicas
> 3) When multiple disks have bad sectors, I suspect ceph will be much
> easier to recover data from than from mdraid which will simply never
> finish rebuilding.
> 4) If we need to migrate data over to a different server with no
> downtime, we just add more OSDs, wait, and then remove the old ones :-)
>
> This is my initial observation though, so please correct me if I am
> wrong.
>
> Dmitry
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com