It sounds to me like you've put yourself at too much risk. *If* I'm reading your message right about your configuration, you have multiple hosts accessing OSDs that live in a single shared box, which makes that box a single point of failure for multiple nodes. If it goes down, multiple replicas can disappear at the same time, and if the primary and the replica copies of a placement group are both on OSDs within that single shared storage system, the cluster will stop serving IO for those PGs until they come back...
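For example, you can check where CRUSH is actually placing your replicas and what threshold makes Ceph block IO. This is just a sketch with standard Ceph CLI commands; "mypool" and the PG id "1.0" are placeholders for your real pool and PG names:

    # Confirm the CRUSH rule really separates replicas at the chassis level
    # (look for "step chooseleaf firstn 0 type chassis" in the output)
    ceph osd crush rule dump

    # Check the pool's replica count and the minimum number of replicas
    # that must be up before Ceph will serve IO on a PG
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

    # See which OSDs hold a given PG's copies (replace 1.0 with a real PG id)
    ceph pg map 1.0

If min_size equals your replication count of 2, losing one enclosure leaves each affected PG with a single surviving copy, and Ceph will block IO on those PGs until the second copy returns. Setting min_size to 1 would let IO continue on a lone replica, at the cost of running with no redundancy for the duration of the outage.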
On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar <mallikarjuna.bira...@gmail.com> wrote:
> Hi all,
>
> Setup details:
> Two storage enclosures, each connected to 4 OSD nodes (shared storage).
> Failure domain is chassis (enclosure) level. Replication count is 2.
> Each host has been allotted 4 drives.
>
> I have active client IO running on the cluster (random write profile
> with 4M block size & 64 queue depth).
>
> One of the enclosures had a power loss, so all OSDs on the hosts
> connected to that enclosure went down, as expected.
>
> But client IO paused. After some time the enclosure and the hosts
> connected to it came back up, and all OSDs on those hosts came up.
>
> Until then, the cluster was not serving IO. Once all hosts and OSDs
> pertaining to that enclosure came up, client IO resumed.
>
> Can anybody help me understand why the cluster stopped serving IO
> during the enclosure failure? Or is it a bug?
>
> -Thanks & regards,
> Mallikarjun Biradar