Re: [ceph-users] Ceph instead of RAID

2013-09-17 Thread Guangliang Zhao
On Tue, Aug 13, 2013 at 10:41:53AM -0500, Mark Nelson wrote:

Hi Mark,

 On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:
 I am currently installing some backup servers with 6x3TB drives in them. I 
 played with RAID-10 but I was not
 impressed at all with how it performs during a recovery.
 
 Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will 
 be local, so I could simply create
 6 local OSDs + a monitor, right? Is there anything I need to watch out for 
 in such configuration?
 
 You can do that. Although it's nice to play with and everything, I
 wouldn't recommend doing it. It will give you more pain than pleasure.
 
 Any specific reason? I just got it up and running, and after simulating some 
 failures, I like it much better than
 mdraid. Again, this only applies to large arrays (6x3TB in my case). I would 
 not use ceph to replace a RAID-1
 array of course, but it looks like a good idea to replace a large RAID10 
 array with a local ceph installation.
 
 The only thing I do not enjoy about ceph is performance. Probably need to do 
 more tweaking, but so far numbers
 are not very impressive. I have two identical servers running the same OS, 
 kernel, etc. Each server has 6x 3TB
 drives (same model and firmware #).
 
 Server 1 runs ceph (2 replicas)
 Server 2 runs mdraid (raid-10)
 
 I ran some very basic benchmarks on both servers:
 
 dd if=/dev/zero of=/storage/test.bin bs=1M count=10
 Ceph: 113 MB/s
 mdraid: 467 MB/s
 
 
 dd if=/storage/test.bin of=/dev/null bs=1M
 Ceph: 114 MB/s
 mdraid: 550 MB/s
 
 
 As you can see, mdraid is by far faster than ceph. It could be by design, 
 or perhaps I am not doing it
 right. Even despite such a difference in speed, I would still go with ceph 
 because *I think* it is more reliable.
 
 couple of things:
 
 1) Ceph is doing full data journal writes so is going to eat (at
 least) half of your write performance right there.
 
 2) Ceph tends to like lots of concurrency.  You'll probably see
 higher numbers with multiple dd reads/writes going at once.
 
 3) Ceph is a lot more complex than something like mdraid.  It gives
 you a lot more power and flexibility but the cost is greater
 complexity. There are probably things you can tune to get your
 numbers up, but it could take some work.
 
 Having said all of this, my primary test box is a single server and
 I can get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I

Could you share the configurations and parameters you have modified, or
where I could find the associated documentation?

 was building a production box and never planned to expand to
 multiple servers, I'd certainly be looking into zfs or btrfs RAID.
 
 Mark
 
 
 Dmitry
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Best regards,
Guangliang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Wolfgang Hennerbichler


On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
 Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will 
 be local, so I could simply create
 6 local OSDs + a monitor, right? Is there anything I need to watch out for 
 in such configuration?

 You can do that. Although it's nice to play with and everything, I
 wouldn't recommend doing it. It will give you more pain than pleasure.
 
 How so? Care to elaborate?

Ceph is a complex system, built for clusters. It does some stuff in
software that is otherwise done in hardware (RAID controllers). The
nature of the complexity of a cluster system is a lot of overhead
compared to a local RAID [whatever] system, and latency of disk I/O will
naturally suffer a bit. An OSD needs about 300 MB of RAM (it may vary with
your PG count), times 6 is a waste of nearly 2 GB of RAM (compared to a
local RAID). Also, ceph is young, and it does indeed have some bugs. RAID
is old and very mature. Although I rely on ceph on a production
cluster, too, it is way harder to maintain than a simple local RAID.
When a disk fails in ceph you don't have to worry about your data, which
is a good thing, but you do have to worry about the rebuilding (which isn't
too hard, but at least you need to know SOMETHING about ceph); with
(hardware) RAID you simply replace the disk, and it will be rebuilt.
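
To make that concrete, here is a rough sketch of what retiring a failed disk
looks like on the ceph side, assuming the dead OSD is osd.3, the host is called
node1, and ceph-deploy is in use (IDs, hostnames and device names are
placeholders; exact steps can vary by release):

    # confirm which OSD is down and where it lives
    ceph osd tree

    # tell the cluster the OSD is gone so re-replication can start
    ceph osd out 3

    # once you are ready to retire it completely
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3

    # after swapping the physical disk, create a fresh OSD on it
    ceph-deploy osd create node1:/dev/sdd

    # watch recovery/backfill progress
    ceph -w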

Others will find more reasons why this is not the best idea for a
production system.

Don't get me wrong, I'm a big supporter of ceph, but only for clusters,
not for single systems.

wogri

 -jf
 
 
 --
 He who settles on the idea of the intelligent man as a static entity
 only shows himself to be a fool.
 
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.
 --Richard Stallman
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Dmitry Postrigan
 This will be a single-server configuration; the goal is to replace mdraid, 
 hence I tried to use localhost
 (nothing more will be added to the cluster). Are you saying it will be less 
 fault tolerant than a RAID-10?

 Ceph is a distributed object store. If you stay within a single machine,
 keep using a local RAID solution (hardware or software).

 Why would you want to make this switch?

I do not think RAID-10 on 6 3TB disks is going to be reliable at all. I have 
simulated several failures, and
it looks like a rebuild will take a lot of time. Funnily enough, during one of
these experiments another drive failed, and I lost the entire array. Good luck
recovering from that...

I feel that Ceph is better than mdraid because:
1) When the ceph cluster is far from full, 'rebuilding' will be much faster
than with mdraid
2) You can easily change the number of replicas
3) When multiple disks have bad sectors, I suspect ceph will be much easier to
recover data from than mdraid, which will simply never finish rebuilding.
4) If we need to migrate data over to a different server with no downtime, we 
just add more OSDs, wait, and
then remove the old ones :-)
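
Picking up points 2 and 4 above, both operations boil down to a couple of CLI
calls; a minimal sketch, assuming a pool named rbd and OSD IDs as placeholders:

    # 2) change the replication level of an existing pool
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

    # 4) migrate: bring up OSDs on the new server, then drain the old ones
    ceph osd out 0      # data re-replicates away from osd.0
    ceph -w             # wait until everything is active+clean again
    # ...then remove osd.0 for good (ceph osd crush remove / ceph auth del / ceph osd rm)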

This is my initial observation though, so please correct me if I am wrong.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Wolfgang Hennerbichler


On 08/13/2013 09:47 AM, Dmitry Postrigan wrote:

 Why would you want to make this switch?
 
 I do not think RAID-10 on 6 3TB disks is going to be reliable at all. I have 
 simulated several failures, and
 it looks like a rebuild will take a lot of time. Funnily enough, during one of
 these experiments another drive failed, and I lost the entire array. Good luck
 recovering from that...

good point.

 I feel that Ceph is better than mdraid because:
 1) When the ceph cluster is far from full, 'rebuilding' will be much faster
 than with mdraid

true

 2) You can easily change the number of replicas

true

 3) When multiple disks have bad sectors, I suspect ceph will be much easier
 to recover data from than mdraid, which will simply never finish rebuilding.

maybe not true. also if you have one disk that is starting to be slow
(because of an impending failure), ceph will slow down drastically, and you
need to find the failing disk.
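
One crude but safe way to hunt for such a disk is to time a raw read from every
data disk while the cluster is otherwise idle; a minimal sketch, with
placeholder device names:

    for dev in /dev/sd{b,c,d,e,f,g}; do
        echo "== $dev =="
        # direct I/O read of 1 GB; the slow or failing drive usually stands out
        dd if=$dev of=/dev/null bs=1M count=1024 iflag=direct 2>&1 | tail -1
    done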

 4) If we need to migrate data over to a different server with no downtime, we 
 just add more OSDs, wait, and
 then remove the old ones :-)

true. but maybe not as easy and painless as you would expect it to be.
also bear in mind that ceph needs a monitor up and running at all times.

 This is my initial observation though, so please correct me if I am wrong.

ceph is easier to maintain than most distributed systems I know, but
still harder than a local RAID. Keep that in mind.

 Dmitry

Wolfgang

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Mark Nelson

On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?



You can do that. Although it's nice to play with and everything, I
wouldn't recommend doing it. It will give you more pain than pleasure.


Any specific reason? I just got it up and running, and after simulating some 
failures, I like it much better than
mdraid. Again, this only applies to large arrays (6x3TB in my case). I would 
not use ceph to replace a RAID-1
array of course, but it looks like a good idea to replace a large RAID10 array 
with a local ceph installation.

The only thing I do not enjoy about ceph is performance. Probably need to do 
more tweaking, but so far numbers
are not very impressive. I have two identical servers running the same OS, 
kernel, etc. Each server has 6x 3TB
drives (same model and firmware #).

Server 1 runs ceph (2 replicas)
Server 2 runs mdraid (raid-10)

I ran some very basic benchmarks on both servers:

dd if=/dev/zero of=/storage/test.bin bs=1M count=10
Ceph: 113 MB/s
mdraid: 467 MB/s


dd if=/storage/test.bin of=/dev/null bs=1M
Ceph: 114 MB/s
mdraid: 550 MB/s


As you can see, mdraid is by far faster than ceph. It could be by design, or 
perhaps I am not doing it
right. Even despite such a difference in speed, I would still go with ceph 
because *I think* it is more reliable.


couple of things:

1) Ceph is doing full data journal writes so is going to eat (at least) 
half of your write performance right there.


2) Ceph tends to like lots of concurrency.  You'll probably see higher 
numbers with multiple dd reads/writes going at once.


3) Ceph is a lot more complex than something like mdraid.  It gives you 
a lot more power and flexibility but the cost is greater complexity. 
There are probably things you can tune to get your numbers up, but it 
could take some work.
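
To see the effect of point 2 above, the single-stream dd can be turned into a
few parallel streams; a minimal sketch (file names, counts and the number of
writers are arbitrary):

    # four concurrent writers; conv=fdatasync makes dd wait for data to hit disk
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/storage/test$i.bin bs=1M count=4096 conv=fdatasync &
    done
    wait

    # rados bench -p <pool> 30 write -t 16 is another option if you want to
    # exercise the object store directly with 16 concurrent ops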


Having said all of this, my primary test box is a single server and I 
can get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I was 
building a production box and never planned to expand to multiple 
servers, I'd certainly be looking into zfs or btrfs RAID.


Mark



Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Martin B Nielsen
Hi,

I'd just like to echo what Wolfgang said about ceph being a complex system.

I initially started out testing ceph with a setup much like yours. And
while it overall performed ok, it was not as good as sw raid on the same
machine.

Also, as Mark said, you'll get at very best half the write speed because of
how the journaling works if you do large continuous writes.

Ceph really shines with multiple servers and a lot of concurrency.

My test machine was running for half a year+ (going from argonaut to
cuttlefish), and in that process I came to realize that mixing types (and
sizes) of disks was a bad idea (some enterprise SATA, some fast desktop and
some green disks) - as speed will be determined by the slowest drive in your
setup (that's why they advocate using similar hw if at all possible, I
guess).

I also experienced all the challenging issues that come with a very young
technology: OSDs suddenly refusing to start, PGs going into various
incomplete/down/inconsistent states, the monitor leveldb filling up, the
monitor dying at odd times - and well, I think it is good for a learning
experience, but like Wolfgang said, it is too much hassle for too little
gain when you have something like raid10/zfs around.

But, by all means, don't let us discourage you if you want to go this route
- ceph's unique self-healing ability was what drew me into running a single
machine in the first place.

Cheers,
Martin



On Tue, Aug 13, 2013 at 9:32 AM, Wolfgang Hennerbichler 
wolfgang.hennerbich...@risc-software.at wrote:



 On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
  Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks
 will be local, so I could simply create
  6 local OSDs + a monitor, right? Is there anything I need to watch out
 for in such configuration?
 
  You can do that. Although it's nice to play with and everything, I
  wouldn't recommend doing it. It will give you more pain than pleasure.
 
  How so? Care to elaborate?

 Ceph is a complex system, built for clusters. It does some stuff in
 software that is otherwise done in hardware (RAID controllers). The
 nature of the complexity of a cluster system is a lot of overhead
 compared to a local RAID [whatever] system, and latency of disk I/O will
 naturally suffer a bit. An OSD needs about 300 MB of RAM (it may vary with
 your PG count), times 6 is a waste of nearly 2 GB of RAM (compared to a
 local RAID). Also, ceph is young, and it does indeed have some bugs. RAID
 is old and very mature. Although I rely on ceph on a production
 cluster, too, it is way harder to maintain than a simple local RAID.
 When a disk fails in ceph you don't have to worry about your data, which
 is a good thing, but you do have to worry about the rebuilding (which isn't
 too hard, but at least you need to know SOMETHING about ceph); with
 (hardware) RAID you simply replace the disk, and it will be rebuilt.

 Others will find more reasons why this is not the best idea for a
 production system.

 Don't get me wrong, I'm a big supporter of ceph, but only for clusters,
 not for single systems.

 wogri

  -jf
 
 
  --
  He who settles on the idea of the intelligent man as a static entity
  only shows himself to be a fool.
 
  Every nonfree program has a lord, a master --
  and if you use the program, he is your master.
  --Richard Stallman
 


 --
 DI (FH) Wolfgang Hennerbichler
 Software Development
 Unit Advanced Computing Technologies
 RISC Software GmbH
 A company of the Johannes Kepler University Linz

 IT-Center
 Softwarepark 35
 4232 Hagenberg
 Austria

 Phone: +43 7236 3343 245
 Fax: +43 7236 3343 250
 wolfgang.hennerbich...@risc-software.at
 http://www.risc-software.at
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph instead of RAID

2013-08-12 Thread Dmitry Postrigan
Hello community,

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?

Another thing. I am using ceph-deploy and I have noticed that when I do this:

ceph-deploy --verbose new localhost

the ceph.conf file is created in the current folder instead of /etc. Is this 
normal?

Also, in the ceph.conf there's a line:
mon host = ::1
Is this normal or do I need to change it to point to localhost?

Thanks for any feedback on this.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-12 Thread Dan Mick



On 08/12/2013 06:49 PM, Dmitry Postrigan wrote:

Hello community,

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?


I mean, you can certainly do that.  1 mon and all OSDs on one server is 
not particularly fault-tolerant, perhaps, but if you have multiple such 
servers in the cluster, sure, why not?
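
For completeness, a single-host setup along those lines is only a handful of
ceph-deploy calls; a rough sketch, with the hostname and device names as
placeholders and flags as of the ceph-deploy versions current at the time:

    ceph-deploy new node1              # writes ceph.conf and the initial mon info
    ceph-deploy mon create node1       # start the (single) monitor
    ceph-deploy gatherkeys node1       # collect the bootstrap keys
    # one OSD per raw disk; ceph-deploy partitions, formats and mounts them
    ceph-deploy osd create node1:/dev/sdb node1:/dev/sdc node1:/dev/sdd \
                           node1:/dev/sde node1:/dev/sdf node1:/dev/sdg
    ceph -s                            # check that all 6 OSDs come up and in

Note that with everything on one box the default CRUSH rule (replicate across
hosts) needs relaxing, e.g. by putting 'osd crush chooseleaf type = 0' into the
[global] section of ceph.conf before creating the OSDs.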



Another thing. I am using ceph-deploy and I have noticed that when I do this:

 ceph-deploy --verbose new localhost

the ceph.conf file is created in the current folder instead of /etc. Is this 
normal?


Yes.  ceph-deploy also distributes ceph.conf where it needs to go.


Also, in the ceph.conf there's a line:
 mon host = ::1
Is this normal or do I need to change it to point to localhost?



You want to configure the machines such that they have resolvable 'real' 
IP addresses:


http://ceph.com/docs/master/start/quick-start-preflight/#hostname-resolution
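
In other words, the [global] section of ceph.conf should end up carrying the
monitor's real, routable address rather than the IPv6 loopback; a minimal
example with placeholder values:

    [global]
    fsid = <as generated by ceph-deploy new>
    mon initial members = node1
    # a routable, resolvable address instead of ::1
    mon host = 192.168.0.10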



Thanks for any feedback on this.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com