Re: [ceph-users] Ceph instead of RAID
On Tue, Aug 13, 2013 at 10:41:53AM -0500, Mark Nelson wrote:

Hi Mark,

> On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:
>> I am currently installing some backup servers with 6x3TB drives in them.
>> I played with RAID-10 but I was not impressed at all with how it performs
>> during a recovery.
>>
>> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will
>> be local, so I could simply create 6 local OSDs + a monitor, right? Is
>> there anything I need to watch out for in such a configuration?
>>
>>> You can do that. Although it's nice to play with and everything, I
>>> wouldn't recommend doing it. It will give you more pain than pleasure.
>>
>> Any specific reason? I just got it up and running, and after simulating
>> some failures, I like it much better than mdraid. Again, this only
>> applies to large arrays (6x3TB in my case). I would not use ceph to
>> replace a RAID-1 array of course, but it looks like a good idea to
>> replace a large RAID-10 array with a local ceph installation.
>>
>> The only thing I do not enjoy about ceph is performance. I probably need
>> to do more tweaking, but so far the numbers are not very impressive.
>>
>> I have two identical servers running the same OS, kernel, etc. Each
>> server has 6x 3TB drives (same model and firmware #).
>>
>> Server 1 runs ceph (2 replicas)
>> Server 2 runs mdraid (raid-10)
>>
>> I ran some very basic benchmarks on both servers:
>>
>> dd if=/dev/zero of=/storage/test.bin bs=1M count=10
>> Ceph:   113 MB/s
>> mdraid: 467 MB/s
>>
>> dd if=/storage/test.bin of=/dev/null bs=1M
>> Ceph:   114 MB/s
>> mdraid: 550 MB/s
>>
>> As you can see, mdraid is by far faster than ceph. It could be by design,
>> or perhaps I am not doing it right. Even despite such a difference in
>> speed, I would still go with ceph because *I think* it is more reliable.
>
> A couple of things:
>
> 1) Ceph does full data journal writes, so that alone is going to eat (at
> least) half of your write performance.
>
> 2) Ceph tends to like lots of concurrency. You'll probably see higher
> numbers with multiple dd reads/writes going at once.
>
> 3) Ceph is a lot more complex than something like mdraid. It gives you a
> lot more power and flexibility, but the cost is greater complexity. There
> are probably things you can tune to get your numbers up, but it could take
> some work.
>
> Having said all of this, my primary test box is a single server and I can
> get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I was building
> a production box and never planned to expand to multiple servers, I'd
> certainly be looking into zfs or btrfs RAID.

Could you share the configurations and parameters you have modified, or
where I could find the associated documents?

> Mark

--
Best regards,
Guangliang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
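Mark's point about concurrency is easy to check from the shell: instead of one dd stream, launch several at once and time the aggregate. A rough sketch (paths and sizes are illustrative; point TARGET at the mount under test, e.g. /storage on the ceph box):

```shell
#!/bin/sh
# Rough parallel-write probe: time several simultaneous dd streams instead
# of a single one. TARGET and the sizes here are illustrative defaults.
TARGET=${TARGET:-/tmp/ddtest}
STREAMS=4
MB_PER_STREAM=16

mkdir -p "$TARGET"
start=$(date +%s)
i=1
while [ "$i" -le "$STREAMS" ]; do
    # each stream writes its own file in the background
    dd if=/dev/zero of="$TARGET/test.$i" bs=1M count="$MB_PER_STREAM" 2>/dev/null &
    i=$((i + 1))
done
wait    # block until all streams have finished
end=$(date +%s)
echo "wrote $((STREAMS * MB_PER_STREAM)) MB total in $((end - start))s"
```

If Ceph is the bottleneck on a single stream, the aggregate figure from several streams should land noticeably above the single-dd number.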
Re: [ceph-users] Ceph instead of RAID
On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
>>> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks
>>> will be local, so I could simply create 6 local OSDs + a monitor, right?
>>> Is there anything I need to watch out for in such a configuration?
>>
>> You can do that. Although it's nice to play with and everything, I
>> wouldn't recommend doing it. It will give you more pain than pleasure.
>
> How so? Care to elaborate?

Ceph is a complex system, built for clusters. It does some stuff in software
that is otherwise done in hardware (RAID controllers). The nature of the
complexity of a cluster system is a lot of overhead compared to a local RAID
[whatever] system, and latency of disk I/O will naturally suffer a bit. An
OSD needs about 300 MB of RAM (it may vary with your PGs); times 6, that is
nearly 2 GB of RAM wasted compared to a local RAID.

Also, ceph is young, and it does indeed have some bugs. RAID is old and very
mature. Although I rely on ceph on a production cluster too, it is way
harder to maintain than a simple local RAID. When a disk fails in ceph you
don't have to worry about your data, which is a good thing, but you do have
to worry about the rebuilding (which isn't too hard, but at least you need
to know SOMETHING about ceph); with (hardware) RAID you simply replace the
disk, and it will be rebuilt.

Others will find more reasons why this is not the best idea for a production
system. Don't get me wrong, I'm a big supporter of ceph, but only for
clusters, not for single systems.

wogri

> -jf
> --
> He who settles on the idea of the intelligent man as a static entity only
> shows himself to be a fool.
>
> Every nonfree program has a lord, a master -- and if you use the program,
> he is your master.
>     --Richard Stallman

--
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
Re: [ceph-users] Ceph instead of RAID
>> This will be a single-server configuration; the goal is to replace
>> mdraid, hence I tried to use localhost (nothing more will be added to the
>> cluster). Are you saying it will be less fault tolerant than a RAID-10?
>
> Ceph is a distributed object store. If you stay within a single machine,
> keep using a local RAID solution (hardware or software). Why would you
> want to make this switch?

I do not think RAID-10 on six 3TB disks is going to be reliable at all. I
have simulated several failures, and it looks like a rebuild will take a lot
of time. Funnily enough, during one of these experiments another drive
failed, and I lost the entire array. Good luck recovering from that...

I feel that Ceph is better than mdraid because:

1) When the ceph cluster is far from being full, 'rebuilding' will be much
faster than with mdraid.

2) You can easily change the number of replicas.

3) When multiple disks have bad sectors, I suspect ceph will be much easier
to recover data from than mdraid, which will simply never finish rebuilding.

4) If we need to migrate data over to a different server with no downtime,
we just add more OSDs, wait, and then remove the old ones :-)

This is my initial observation though, so please correct me if I am wrong.

Dmitry
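Point 2 above is indeed a one-line operation per pool. A minimal sketch, assuming a replicated pool named `data` (the pool name and the counts are illustrative; check the ceph documentation for your release):

```shell
ceph osd pool set data size 3      # keep 3 replicas of every object
ceph osd pool set data min_size 2  # keep serving I/O while 2 replicas remain
```

Raising `size` triggers backfill to create the extra copies, so expect recovery traffic right after the change.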
Re: [ceph-users] Ceph instead of RAID
On 08/13/2013 09:47 AM, Dmitry Postrigan wrote:
>> Why would you want to make this switch?
>
> I do not think RAID-10 on six 3TB disks is going to be reliable at all. I
> have simulated several failures, and it looks like a rebuild will take a
> lot of time. Funnily enough, during one of these experiments another
> drive failed, and I lost the entire array. Good luck recovering from
> that...

Good point.

> I feel that Ceph is better than mdraid because:
>
> 1) When the ceph cluster is far from being full, 'rebuilding' will be
> much faster than with mdraid.

True.

> 2) You can easily change the number of replicas.

True.

> 3) When multiple disks have bad sectors, I suspect ceph will be much
> easier to recover data from than mdraid, which will simply never finish
> rebuilding.

Maybe not true. Also, if you have one disk that is starting to get slow
(because of an upcoming failure), ceph will slow down drastically, and you
need to find the failing disk.

> 4) If we need to migrate data over to a different server with no
> downtime, we just add more OSDs, wait, and then remove the old ones :-)

True, but maybe not as easy and painless as you would expect it to be. Also
bear in mind that ceph needs a monitor up and running at all times.

> This is my initial observation though, so please correct me if I am
> wrong.

Ceph is easier to maintain than most distributed systems I know, but still
harder than a local RAID. Keep that in mind.

> Dmitry

Wolfgang
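For reference, the "remove the old ones" part of point 4 is itself a multi-step drain per OSD, which is part of why it is not entirely painless. A sketch of the usual sequence, assuming a hypothetical osd.5 on the old server (2013-era sysvinit-style service handling; verify against the docs for your release before running any of this):

```shell
ceph osd out 5                 # trigger rebalancing of data off osd.5
ceph -w                        # watch until the cluster is active+clean again
/etc/init.d/ceph stop osd.5    # stop the daemon on the old server
ceph osd crush remove osd.5    # drop it from the CRUSH map
ceph auth del osd.5            # delete its authentication key
ceph osd rm 5                  # remove it from the OSD map
```

Draining one OSD at a time and waiting for active+clean between steps keeps the redundancy guarantees intact throughout the migration.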
Re: [ceph-users] Ceph instead of RAID
On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:
> I am currently installing some backup servers with 6x3TB drives in them.
> I played with RAID-10 but I was not impressed at all with how it performs
> during a recovery.
>
> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will
> be local, so I could simply create 6 local OSDs + a monitor, right? Is
> there anything I need to watch out for in such a configuration?
>
>> You can do that. Although it's nice to play with and everything, I
>> wouldn't recommend doing it. It will give you more pain than pleasure.
>
> Any specific reason? I just got it up and running, and after simulating
> some failures, I like it much better than mdraid. Again, this only
> applies to large arrays (6x3TB in my case). I would not use ceph to
> replace a RAID-1 array of course, but it looks like a good idea to
> replace a large RAID-10 array with a local ceph installation.
>
> The only thing I do not enjoy about ceph is performance. I probably need
> to do more tweaking, but so far the numbers are not very impressive.
>
> I have two identical servers running the same OS, kernel, etc. Each
> server has 6x 3TB drives (same model and firmware #).
>
> Server 1 runs ceph (2 replicas)
> Server 2 runs mdraid (raid-10)
>
> I ran some very basic benchmarks on both servers:
>
> dd if=/dev/zero of=/storage/test.bin bs=1M count=10
> Ceph:   113 MB/s
> mdraid: 467 MB/s
>
> dd if=/storage/test.bin of=/dev/null bs=1M
> Ceph:   114 MB/s
> mdraid: 550 MB/s
>
> As you can see, mdraid is by far faster than ceph. It could be by design,
> or perhaps I am not doing it right. Even despite such a difference in
> speed, I would still go with ceph because *I think* it is more reliable.
>
> Dmitry

A couple of things:

1) Ceph does full data journal writes, so that alone is going to eat (at
least) half of your write performance.

2) Ceph tends to like lots of concurrency. You'll probably see higher
numbers with multiple dd reads/writes going at once.

3) Ceph is a lot more complex than something like mdraid. It gives you a lot
more power and flexibility, but the cost is greater complexity. There are
probably things you can tune to get your numbers up, but it could take some
work.

Having said all of this, my primary test box is a single server and I can
get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I was building a
production box and never planned to expand to multiple servers, I'd
certainly be looking into zfs or btrfs RAID.

Mark
Re: [ceph-users] Ceph instead of RAID
Hi,

I'd just like to echo what Wolfgang said about ceph being a complex system.
I initially started out testing ceph with a setup much like yours, and while
it performed OK overall, it was not as good as sw RAID on the same machine.
Also, as Mark said, you'll at very best get half write speed on larger
continuous writes because of how the journaling works. Ceph really shines
with multiple servers and lots of concurrency.

My test machine was running for half a year+ (going from argonaut to
cuttlefish), and in that process I came to realize that mixing types (and
sizes) of disk was a bad idea (some enterprise SATA, some fast desktop and
some green disks), as speed will be determined by the slowest drive in your
setup (that's why they advocate using similar hw if at all possible, I
guess).

I also experienced all the challenging issues of dealing with a very young
technology: osds suddenly refusing to start, pgs going into various
incomplete/down/inconsistent states, the monitor leveldb running full, the
monitor dying at weird times. I think it is good for a learning experience,
but like Wolfgang said, I think it is too much hassle for too little gain
when you have something like raid10/zfs around.

But, by all means, don't let us discourage you if you want to go this route:
ceph's unique self-healing ability was what drew me into running a single
machine in the first place.

Cheers,
Martin

On Tue, Aug 13, 2013 at 9:32 AM, Wolfgang Hennerbichler
wolfgang.hennerbich...@risc-software.at wrote:
> On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
>>>> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks
>>>> will be local, so I could simply create 6 local OSDs + a monitor,
>>>> right? Is there anything I need to watch out for in such a
>>>> configuration?
>>>
>>> You can do that. Although it's nice to play with and everything, I
>>> wouldn't recommend doing it. It will give you more pain than pleasure.
>>
>> How so? Care to elaborate?
>
> Ceph is a complex system, built for clusters. It does some stuff in
> software that is otherwise done in hardware (RAID controllers). The
> nature of the complexity of a cluster system is a lot of overhead
> compared to a local RAID [whatever] system, and latency of disk I/O will
> naturally suffer a bit. An OSD needs about 300 MB of RAM (it may vary
> with your PGs); times 6, that is nearly 2 GB of RAM wasted compared to a
> local RAID.
>
> Also, ceph is young, and it does indeed have some bugs. RAID is old and
> very mature. Although I rely on ceph on a production cluster too, it is
> way harder to maintain than a simple local RAID. When a disk fails in
> ceph you don't have to worry about your data, which is a good thing, but
> you do have to worry about the rebuilding (which isn't too hard, but at
> least you need to know SOMETHING about ceph); with (hardware) RAID you
> simply replace the disk, and it will be rebuilt.
>
> Others will find more reasons why this is not the best idea for a
> production system. Don't get me wrong, I'm a big supporter of ceph, but
> only for clusters, not for single systems.
>
> wogri
[ceph-users] Ceph instead of RAID
Hello community,

I am currently installing some backup servers with 6x3TB drives in them. I
played with RAID-10 but I was not impressed at all with how it performs
during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be
local, so I could simply create 6 local OSDs + a monitor, right? Is there
anything I need to watch out for in such a configuration?

Another thing. I am using ceph-deploy, and I have noticed that when I do
this:

ceph-deploy --verbose new localhost

the ceph.conf file is created in the current folder instead of /etc. Is this
normal?

Also, in ceph.conf there's a line:

mon host = ::1

Is this normal or do I need to change it to point to localhost?

Thanks for any feedback on this.

Dmitry
Re: [ceph-users] Ceph instead of RAID
On 08/12/2013 06:49 PM, Dmitry Postrigan wrote:
> Hello community,
>
> I am currently installing some backup servers with 6x3TB drives in them.
> I played with RAID-10 but I was not impressed at all with how it performs
> during a recovery.
>
> Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will
> be local, so I could simply create 6 local OSDs + a monitor, right? Is
> there anything I need to watch out for in such a configuration?

I mean, you can certainly do that. 1 mon and all OSDs on one server is not
particularly fault-tolerant, perhaps, but if you have multiple such servers
in the cluster, sure, why not?

> Another thing. I am using ceph-deploy, and I have noticed that when I do
> this:
>
> ceph-deploy --verbose new localhost
>
> the ceph.conf file is created in the current folder instead of /etc. Is
> this normal?

Yes. ceph-deploy also distributes ceph.conf where it needs to go.

> Also, in ceph.conf there's a line:
>
> mon host = ::1
>
> Is this normal or do I need to change it to point to localhost?

You want to configure the machines such that they have resolvable 'real' IP
addresses:

http://ceph.com/docs/master/start/quick-start-preflight/#hostname-resolution

> Thanks for any feedback on this.
>
> Dmitry

--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.     http://inktank.com
Ceph docs: http://ceph.com/docs
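For reference, once the preflight hostname setup is done, the generated ceph.conf should end up pointing the monitor at the machine's real, resolvable address rather than the IPv6 loopback. A minimal sketch (the hostname `backup01` and the address are made up for illustration; your generated file may name things differently):

```
[global]
mon initial members = backup01
mon host = 192.168.1.10

[mon.backup01]
host = backup01
mon addr = 192.168.1.10:6789
```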