Hi Pavel,

Will try and answer some of your questions:

> My first question is about the monitor data directory. How much space do I
> need to reserve for it? Can the monitor fs be corrupted if the monitor runs
> out of storage space?
>

We use roughly 20GB partitions for our monitors - they really don't use much
space, but the headroom is nice if you ever need to do extra logging (Ceph at
max debug levels consumes a scary amount of space).
If you look in the monitor log you'll also see that the monitors constantly
check their own free space. I don't know exactly what happens when a monitor
runs full (or close to full), but I'm guessing that monitor will simply be
marked down or stopped somehow. You can also tune how much data a mon keeps
before trimming, etc.
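
If you want to play with those thresholds, there are a couple of mon options
for it. The option names below are from memory and the values are just the
defaults as I remember them, so check the config reference for your release:

    [mon]
        # warn (HEALTH_WARN) when the mon data partition has less than this % free
        mon data avail warn = 30
        # go to HEALTH_ERR when it drops below this % free
        mon data avail crit = 5
        # compact the mon store on startup to reclaim space
        mon compact on start = true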


>
> I also have questions about ceph auto-recovery process.
> For example, I have two nodes with 8 drives each, and each drive is
> presented as a separate osd. The number of replicas = 2. I have written a
> crush ruleset which picks two nodes and one osd on each to store the
> replicas. What will happen in the following scenarios:
>
> 1. One drive in one node fails. Will ceph automatically re-replicate the
> affected objects? Where will the replicas be stored?
>
Yes. As long as you have available space on the node that lost the OSD, the
data that was on that disk will be redistributed across the remaining 7 OSDs
on that node (according to your CRUSH rules).
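
You can watch the recovery happen with the usual status commands (nothing
fancy here). Note that by default the cluster waits a few minutes (mon osd
down out interval) after an OSD goes down before marking it out and starting
to re-replicate:

    # overall health plus degraded object counts
    ceph -s
    # follow recovery progress live
    ceph -w
    # see which OSDs are up/down and how they sit under each host
    ceph osd tree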


>
> 1.1 The failed osd appears online again with all of its data. How will the
> ceph cluster deal with it?
>
This is just how I _think_ it works; please correct me if I'm wrong. All
OSDs carry an internal map (the PG map) which is constantly kept up to date
throughout the cluster. When an OSD goes offline/down and is later started
back up, its last known PG map is 'diffed' against the latest map from the
cluster. From what the OSD has/had and what is missing/updated, the cluster
works out which objects the newly started OSD should have, and the OSD then
recovers only the new/changed objects rather than everything it stores.

Bottom line, this really just works and works very well.
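
If you want to see the peering/recovery state for yourself, the standard PG
commands show it (2.1f below is just a made-up example pg id):

    # list all PGs with their state and acting set
    ceph pg dump
    # detailed peering and recovery info for one PG
    ceph pg 2.1f query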


>
> 2. One node (with 8 osds) goes offline. Will ceph automatically replicate
> all objects on the remaining node to maintain number of replicas = 2?
>
No, because it can no longer satisfy your CRUSH rule. Your rule states one
copy per node, and the cluster will keep it that way. It will go into a
degraded state until you can bring up another node (i.e. all your data is
now very vulnerable). It is often suggested to run with 3x replication if
possible - or at the very least with nr_nodes = replicas + 1. If you really
had to make it replicate onto the remaining node, you would have to change
your CRUSH rule to choose replicas by OSD instead of by node. But then you
will most likely have problems when a node dies, because both copies of an
object could easily sit on two OSDs of the failed node.
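
For reference, the difference between the two behaviours is basically one
word in the rule. This is just a hand-written sketch of the syntax (rule
names, ruleset ids and the 'default' root are assumptions - adjust to your
own crush map). The first rule is the one-copy-per-node behaviour you have
now, the second would allow both copies on the same node:

    rule rep_per_host {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

    rule rep_per_osd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd
        step emit
    }

You would then point a pool at a rule with something like 'ceph osd pool set
<pool> crush_ruleset 2', and 'ceph osd pool set <pool> size 3' if you go for
three replicas.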


>
> 2.1 The failed node comes online again with all of its data. How will the
> ceph cluster deal with it?
>
Same as above for the single OSD: the node's OSDs will peer, compare maps,
and recover only the objects that changed while they were down.

Cheers,
Martin


> Thanks in advance,
>   Pavel.
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
