This is kind of an unsolvable problem: in CAP terms, we choose Consistency and Availability, so we have to give up Partition tolerance.
There are three networks here: mon <-> osd, osd <-public-> osd, and osd <-cluster-> osd. If some of the networks are reachable but some are not, the flapping will likely happen.

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of huang jun
Sent: Sunday, September 13, 2015 5:46 PM
To: zhao.ming...@h3c.com
Cc: ceph-devel@vger.kernel.org
Subject: Re: Re: 2 replications, flapping can not stop for a very long time

2015-09-13 14:07 GMT+08:00 zhao.ming...@h3c.com <zhao.ming...@h3c.com>:
> hi, do you set both public_network and cluster_network, but just cut off the
> cluster_network?
> And do you have more than one osd on the same host?
> ============================= yes, public network + cluster network, and I
> cut off the cluster network; 2 nodes, each node has several osds;
>
> If so, maybe you can not get stable: now the osd has peers at the prev and
> next osd ids, and they can exchange ping messages.
> When you cut off the cluster_network, the peer osds on the other host can not
> detect the pings, so they report the osd failure to the MON; once the MON
> gathers enough reporters and reports, the osd will be marked down.
> ============================= when an osd receives a new map and sees it is
> marked down, it thinks the MON wrongly marked it down; what will it do, join
> the cluster again or take other actions? can you give me a more detailed
> explanation?

It will send a boot message to the MON, and will be marked UP by the MON.

> But the osd can still report to the MON because the public_network is ok, so
> the MON thinks the osd was wrongly marked down and marks it UP.
> ============================= you mean that the MON receives a message ONE
> TIME from this osd and then it will mark this osd up?
>
> So flapping happens again and again.
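To make the mark-down condition above concrete, here is a rough sketch in Python. This is only an illustration, not Ceph's actual C++ code; the threshold names mirror the mon_osd_min_down_reporters and osd_heartbeat_grace config options, but the exact rule and defaults here are assumptions.

```python
# Hypothetical sketch of how a monitor could decide to mark an OSD down:
# it waits until enough *distinct* reporters have filed failure reports
# AND the OSD has been failing for longer than the heartbeat grace period.

from dataclasses import dataclass, field


@dataclass
class FailureState:
    first_reported: float                        # time of the first failure report
    reporters: set = field(default_factory=set)  # distinct reporting OSD ids


def should_mark_down(state, now, grace=20.0, min_reporters=2):
    """True when both the reporter count and the elapsed grace are satisfied.

    grace / min_reporters stand in for osd_heartbeat_grace and
    mon_osd_min_down_reporters; the values are illustrative defaults.
    """
    failed_for = now - state.first_reported
    return len(state.reporters) >= min_reporters and failed_for >= grace


# Example: two peers report osd.3 as failed starting at t=100.
state = FailureState(first_reported=100.0, reporters={1, 2})
print(should_mark_down(state, now=110.0))   # grace not yet elapsed -> False
print(should_mark_down(state, now=121.0))   # both conditions met -> True
```

A single reporter never suffices in this sketch, which matches the point in the thread that the MON must "gather enough reporters and reports" before marking the OSD down.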
> ============================= I tried 3 replications (public network + cluster
> network, 3 nodes, each node has several osds); although flapping occurs, after
> several minutes the cluster becomes stable. Compared with that, in the
> 2-replication case I waited the same interval and the cluster could not get
> stable; so I'm confused about the mechanism: how does the monitor decide which
> osd is actually down?

It's weird; if you cut off the cluster_network, the osds on the other node can not get the ping messages, and naturally think the osd has failed.

> thanks
>
> -----Original Message-----
> From: huang jun [mailto:hjwsm1...@gmail.com]
> Sent: September 13, 2015 10:39
> To: zhaomingyue 09440 (RD)
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: 2 replications, flapping can not stop for a very long time
>
> hi, do you set both public_network and cluster_network, but just cut off the
> cluster_network?
> And do you have more than one osd on the same host?
> If so, maybe you can not get stable: now the osd has peers at the prev and
> next osd ids, and they can exchange ping messages.
> When you cut off the cluster_network, the peer osds on the other host can not
> detect the pings, so they report the osd failure to the MON; once the MON
> gathers enough reporters and reports, the osd will be marked down.
> But the osd can still report to the MON because the public_network is ok, so
> the MON thinks the osd was wrongly marked down and marks it UP.
> So flapping happens again and again.
>
> 2015-09-12 20:26 GMT+08:00 zhao.ming...@h3c.com <zhao.ming...@h3c.com>:
>>
>> Hi,
>> I'm testing the reliability of ceph recently, and I have met the flapping
>> problem.
>> I have 2 replications, and I cut off the cluster network; now the flapping
>> can not stop. I have waited more than 30 min, but the status of the osds is
>> still not stable.
>> I want to know: when the monitor receives reports from osds, how does it
>> mark one osd down?
>> (reports && reporters && grace) need to satisfy some conditions; how is the
>> grace calculated?
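On the grace question: the monitor stretches the base grace for OSDs with a history of lagginess, using per-OSD laggy_probability and laggy_interval values, so a frequently-flapping OSD gets more time before being marked down again. A rough sketch of that idea follows; the exponential half-life decay is an assumption modeled on the monitor's behavior, not the verbatim formula.

```python
# Illustrative sketch of a laggy-adjusted grace period. The base grace
# stands in for osd_heartbeat_grace; laggy_probability / laggy_interval
# stand in for the per-OSD laggy statistics the monitor tracks. The
# decay shape (exponential with a half-life) is an assumption.

import math


def effective_grace(base_grace, failed_for, laggy_probability,
                    laggy_interval, halflife=3600.0):
    """Grace grows for OSDs that were often laggy in the recent past.

    The laggy bonus decays as the failure ages, so a long-failed OSD
    eventually falls back to the plain base grace.
    """
    decay = math.exp(math.log(0.5) / halflife * failed_for)
    return base_grace + decay * laggy_probability * laggy_interval


# An OSD that was laggy half the time, in ~60 s stretches, gets extra grace:
print(effective_grace(20.0, failed_for=0.0,
                      laggy_probability=0.5, laggy_interval=60.0))  # 50.0
```

This is one plausible reason the 3-replication cluster settled after a few minutes: each flap feeds the laggy statistics, the effective grace grows, and eventually the OSD stays marked down long enough for the cluster to stabilize.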
>> And how long will the flapping stop? Must the flapping be stopped by
>> configuration, such as marking an osd lost?
>> Can someone help me?
>> Thanks~
>> ---------------------------------------------------------------------
>> This e-mail and its attachments contain confidential information from
>> H3C, which is intended only for the person or entity whose address is
>> listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure,
>> reproduction, or dissemination) by persons other than the intended
>> recipient(s) is prohibited. If you receive this e-mail in error,
>> please notify the sender by phone or email immediately and delete it!
>
> --
> thanks
> huangjun

--
thanks
huangjun
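As a toy model of why the 2-replication setup never settles: with the cluster network cut but the public network alive, the mark-down and boot-up transitions simply alternate. This is only an illustration of the cycle described in the thread, not Ceph code.

```python
# Toy simulation of the flapping cycle: peers miss cluster-network
# heartbeats and get the OSD marked down, then the OSD re-boots over
# the still-working public network and gets marked up again, forever.

def simulate_flapping(cycles, cluster_net_up):
    state, history = "up", []
    for _ in range(cycles):
        if state == "up" and not cluster_net_up:
            state = "down"      # peers miss heartbeats -> MON marks it down
        elif state == "down":
            state = "up"        # OSD sends a boot message -> MON marks it up
        history.append(state)
    return history


print(simulate_flapping(6, cluster_net_up=False))
# -> ['down', 'up', 'down', 'up', 'down', 'up']  (never converges)
print(simulate_flapping(6, cluster_net_up=True))
# -> ['up', 'up', 'up', 'up', 'up', 'up']
```

In practice the cycle can also be stopped administratively, e.g. by setting the `nodown` flag (`ceph osd set nodown`) while the network split is repaired.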