2015-09-13 14:07 GMT+08:00 zhao.ming...@h3c.com <zhao.ming...@h3c.com>:
> hi, did you set both public_network and cluster_network, but only cut
> off the cluster_network?
> And do you have more than one OSD on the same host?
> ============================= yes, public network + cluster network, and
> I cut off the cluster network; 2 nodes, each node has several OSDs;
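> (For reference, the split-network setup looks roughly like this in
> ceph.conf; the subnets below are placeholders, not the real ones:)
>
>     [global]
>     public network  = 192.168.1.0/24   ; MON/client traffic, still up
>     cluster network = 10.10.1.0/24     ; OSD replication traffic, cut off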
>
> If so, maybe you can not get stable: each OSD has heartbeat peers (the
> OSDs with the previous and next osd ids), and they exchange ping
> messages.
> When you cut off the cluster_network, the peer OSDs on the other node
> no longer receive the pings, so they report the OSD failure to the MON;
> once the MON has gathered enough reporters and reports, the OSD is
> marked down.
> ============================= when an OSD receives a new map in which it
> is marked down, it thinks the MON wrongly marked it down. What will it
> do then: join the cluster again, or take some other action? Can you
> give me a more detailed explanation?

It will send a boot message to the MON, and the MON will mark it UP.
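Roughly, the cycle looks like this. Here is a minimal, self-contained
sketch of the loop in Python; it is illustrative only, not the actual
OSD/MON code, and every name in it is made up:

    # toy model of the mark-down / boot / mark-up loop described above
    class Mon:
        def __init__(self, min_reporters=2):
            self.min_reporters = min_reporters  # distinct reporters required
            self.reporters = {}                 # target osd -> set of reporters
            self.up = {}

        def receive_failure_report(self, reporter, target):
            self.reporters.setdefault(target, set()).add(reporter)
            if len(self.reporters[target]) >= self.min_reporters:
                self.up[target] = False         # enough evidence: mark down

        def receive_boot(self, osd):
            self.reporters.pop(osd, None)       # drop the stale reports
            self.up[osd] = True                 # osd reached us: mark it up

    mon = Mon()
    for _ in range(3):                          # the loop repeats: flapping
        for peer in ("osd.1", "osd.2"):         # back-network pings are lost
            mon.receive_failure_report(peer, "osd.0")
        print("osd.0 up?", mon.up["osd.0"])     # False: marked down
        mon.receive_boot("osd.0")               # boot message over public net
        print("osd.0 up?", mon.up["osd.0"])     # True: marked up again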
>
> But the OSD can still report to the MON because the public_network is
> ok, so the MON thinks the OSD was wrongly marked down and marks it UP.
> ============================= you mean the MON only needs to receive ONE
> message from this OSD before it marks the OSD up again?
>

> So flapping happens again and again.
> ============================= I tried 3 replications (public network +
> cluster network, 3 nodes, each node has several OSDs); flapping still
> occurred, but after several minutes the cluster became stable.
> With 2 replications I waited the same interval and the cluster could
> not get stable, so I'm confused about the mechanism: how does the
> monitor decide which OSD is actually down?
>
It's weird: if you cut off the cluster_network, the OSDs on the other
node cannot receive the ping messages, and will naturally consider that
OSD failed.
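
As for the grace: as far as I know it is not a fixed timeout. The base
is osd_heartbeat_grace (20s by default), and when
mon_osd_adjust_heartbeat_grace is enabled the MON stretches it for OSDs
with a history of lagginess. A simplified Python sketch of that
computation, based on my reading of OSDMonitor::check_failure (treat
the defaults and the exact formula as approximate):

    import math

    # hedged sketch: verify the formula and defaults against your Ceph
    # version before relying on the exact numbers
    def effective_grace(failed_for, laggy_probability, laggy_interval,
                        base_grace=20.0, laggy_halflife=3600.0):
        decay_k = math.log(0.5) / laggy_halflife
        decay = math.exp(failed_for * decay_k)
        # an OSD that was often laggy earns extra grace before being
        # marked down; the bonus decays the longer the failure lasts
        return base_grace + decay * laggy_interval * laggy_probability

    # the MON marks an OSD down only when there are enough distinct
    # reporters (mon_osd_min_down_reporters), enough total reports
    # (mon_osd_min_down_reports), and the OSD has been failing for
    # longer than this grace
    print(effective_grace(failed_for=25.0,
                          laggy_probability=0.5, laggy_interval=60.0))

If you only want the test to settle, one workaround is to freeze the
state with "ceph osd set nodown" while the cluster_network is down (and
unset it afterwards), or mark the dead OSD lost, as you mentioned.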

> thanks
>
> -----Original Message-----
> From: huang jun [mailto:hjwsm1...@gmail.com]
> Sent: September 13, 2015 10:39
> To: zhaomingyue 09440 (RD)
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: 2 replications, flapping can not stop for a very long time
>
> hi, did you set both public_network and cluster_network, but only cut
> off the cluster_network?
> And do you have more than one OSD on the same host?
> If so, maybe you can not get stable: each OSD has heartbeat peers (the
> OSDs with the previous and next osd ids), and they exchange ping
> messages.
> When you cut off the cluster_network, the peer OSDs no longer receive
> the pings, so they report the OSD failure to the MON; once the MON has
> gathered enough reporters and reports, the OSD is marked down.
> But the OSD can still report to the MON because the public_network is
> ok, so the MON thinks the OSD was wrongly marked down and marks it UP.
> So flapping happens again and again.
>
> 2015-09-12 20:26 GMT+08:00 zhao.ming...@h3c.com <zhao.ming...@h3c.com>:
>>
>> Hi,
>> I'm testing the reliability of Ceph recently, and I have run into the
>> flapping problem.
>> I have 2 replications, and I cut off the cluster network; now the
>> flapping does not stop. I have waited more than 30 min, but the status
>> of the OSDs is still not stable.
>>     I want to know: when the monitor receives reports from OSDs, how
>> does it decide to mark an OSD down?
>>     (reports && reporters && grace) need to satisfy some conditions;
>> how is the grace calculated?
>> And how long until the flapping stops? Must the flapping be stopped by
>> configuration, such as marking an OSD lost?
>> Can someone help me?
>> Thanks~
>
>
>
> --
> thanks
> huangjun



-- 
thanks
huangjun
