Re: [ClusterLabs] Current DC becomes None suddenly

2015-10-08 Thread Pritam Kharat
Could someone please reply to this query?


On Sat, Oct 3, 2015 at 12:17 AM, Pritam Kharat <
pritam.kha...@oneconvergence.com> wrote:

>
> Hi,
>
> I have set up an ACTIVE/PASSIVE HA cluster.
>
> *Issue 1) *
>
> *corosync.conf*  file is
>
> # Please read the openais.conf.5 manual page
>
> totem {
>
> version: 2
>
> # How long before declaring a token lost (ms)
> token: 1
>
> # How many token retransmits before forming a new configuration
> token_retransmits_before_loss_const: 20
>
> # How long to wait for join messages in the membership protocol
> (ms)
> join: 1
>
> # How long to wait for consensus to be achieved before starting a
> new round of membership configuration (ms)
> consensus: 12000
>
> # Turn off the virtual synchrony filter
> vsftype: none
>
> # Number of messages that may be sent by one processor on receipt
> of the token
> max_messages: 20
>
> # Limit generated nodeids to 31-bits (positive signed integers)
> clear_node_high_bit: yes
>
> # Disable encryption
> secauth: off
>
> # How many threads to use for encryption/decryption
> threads: 0
>
> # Optionally assign a fixed node id (integer)
> # nodeid: 1234
>
> # This specifies the mode of redundant ring, which may be none,
> active, or passive.
> rrp_mode: none
> interface {
> # The following values need to be set based on your
> environment
> ringnumber: 0
> bindnetaddr: 192.168.101.0
> mcastport: 5405
> }
>
> transport: udpu
> }
>
> amf {
> mode: disabled
> }
>
> quorum {
> # Quorum for the Pacemaker Cluster Resource Manager
> provider: corosync_votequorum
> expected_votes: 1
> }
>
>
> nodelist {
>
> node {
> ring0_addr: 192.168.101.73
> }
>
> node {
> ring0_addr: 192.168.101.74
> }
> }
>
> aisexec {
> user:   root
> group:  root
> }
>
>
> logging {
> fileline: off
> to_stderr: yes
> to_logfile: yes
> to_syslog: yes
> syslog_facility: daemon
> logfile: /var/log/corosync/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: AMF
> debug: off
> tags: enter|leave|trace1|trace2|trace3|trace4|trace6
> }
> }
>
> I have added 5 resources: 1 VIP and 4 upstart jobs.
> Node names are configured as sc-node-1 (ACTIVE) and sc-node-2 (PASSIVE).
> Resources are running on the ACTIVE node.
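>
> (A cluster VIP like this is typically an ocf:heartbeat:IPaddr2 primitive;
> the resource name and address below are only placeholders, e.g.:
>
> crm configure primitive vip ocf:heartbeat:IPaddr2 \
>         params ip=192.168.101.100 cidr_netmask=24 \
>         op monitor interval=10s )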
>
> Default cluster properties -
>
>   <cluster_property_set id="cib-bootstrap-options">
>     <nvpair name="dc-version" value="1.1.10-42f2063" id="cib-bootstrap-options-dc-version"/>
>     <nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
>     <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>     <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
>     <nvpair name="cluster-recheck-interval" value="..." id="cib-bootstrap-options-cluster-recheck-interval"/>
>     <nvpair name="default-action-timeout" value="..." id="cib-bootstrap-options-default-action-timeout"/>
>   </cluster_property_set>
>
>
> But sometimes, after 2-3 migrations from ACTIVE to STANDBY and back from
> STANDBY to ACTIVE, both nodes become OFFLINE and the Current DC becomes
> NONE. I have disabled the stonith property and quorum is ignored.
>
> root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
> Last updated: Sat Oct  3 00:01:40 2015
> Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
> Stack: corosync
> Current DC: NONE
> 2 Nodes configured
> 5 Resources configured
>
> OFFLINE: [ sc-node-1 sc-node-2 ]
>
> What is going wrong here? Why does the Current DC suddenly become NONE?
> Is corosync.conf okay? Are the default cluster properties fine? Any help
> will be appreciated.
>
>
> *Issue 2)*
> The command used to add an upstart job is:
>
> crm configure primitive service upstart:service meta allow-migrate=true
> migration-threshold=5 failure-timeout=30s op monitor interval=15s
>  timeout=60s
>
> But still, sometimes I see the fail count going to INFINITY. Why? How can we
> avoid it? The resource should have migrated as soon as it reached the
> migration threshold.
>
> * Node sc-node-2:
>service: migration-threshold=5 fail-count=100 last-failure='Fri Oct
>  2 23:38:53 2015'
>service1: migration-threshold=5 fail-count=100 last-failure='Fri
> Oct  2 23:38:53 2015'
>
> Failed actions:
> service_start_0 (node=sc-node-2, call=-1, rc=1, status=Timed Out,
> last-rc-change=Fri Oct  2 23:38:53 2015
> , queued=0ms, exec=0ms
> ): unknown error
> service1_start_0 (node=sc-node-2, call=-1, rc=1, status=Timed Out,
> last-rc-change=Fri Oct  2 23:38:53 2015
> , queued=0ms, exec=0ms
> ): unknown error
>
>
>
>
> --
> Thanks and Regards,
> Pritam Kharat.
>



-- 
Thanks and Regards,
Pritam Kharat.
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http

Re: [ClusterLabs] Current DC becomes None suddenly

2015-10-08 Thread Ken Gaillot
On 10/02/2015 01:47 PM, Pritam Kharat wrote:
> Hi,
> 
> I have set up a ACTIVE/PASSIVE HA
> 
> *Issue 1) *
> 
> *corosync.conf*  file is
> 
> # Please read the openais.conf.5 manual page
> 
> totem {
> 
> version: 2
> 
> # How long before declaring a token lost (ms)
> token: 1
> 
> # How many token retransmits before forming a new configuration
> token_retransmits_before_loss_const: 20
> 
> # How long to wait for join messages in the membership protocol (ms)
> join: 1
> 
> # How long to wait for consensus to be achieved before starting a
> new round of membership configuration (ms)
> consensus: 12000
> 
> # Turn off the virtual synchrony filter
> vsftype: none
> 
> # Number of messages that may be sent by one processor on receipt
> of the token
> max_messages: 20
> 
> # Limit generated nodeids to 31-bits (positive signed integers)
> clear_node_high_bit: yes
> 
> # Disable encryption
> secauth: off
> 
> # How many threads to use for encryption/decryption
> threads: 0
> 
> # Optionally assign a fixed node id (integer)
> # nodeid: 1234
> 
> # This specifies the mode of redundant ring, which may be none,
> active, or passive.
> rrp_mode: none
> interface {
> # The following values need to be set based on your
> environment
> ringnumber: 0
> bindnetaddr: 192.168.101.0
> mcastport: 5405
> }
> 
> transport: udpu
> }
> 
> amf {
> mode: disabled
> }
> 
> quorum {
> # Quorum for the Pacemaker Cluster Resource Manager
> provider: corosync_votequorum
> expected_votes: 1

If you're using a recent version of corosync, use "two_node: 1" instead
of "expected_votes: 1", and get rid of "no-quorum-policy: ignore" in the
pacemaker cluster options.
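
For example (just a sketch -- keep the rest of your corosync.conf as it is),
the quorum section for a two-node cluster would look something like:

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }

With two_node set, corosync works out the expected votes itself, so the
no-quorum-policy override can simply be dropped from the CIB, e.g. with
"crm_attribute -t crm_config -n no-quorum-policy -D".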

> }
> 
> 
> nodelist {
> 
> node {
> ring0_addr: 192.168.101.73
> }
> 
> node {
> ring0_addr: 192.168.101.74
> }
> }
> 
> aisexec {
> user:   root
> group:  root
> }
> 
> 
> logging {
> fileline: off
> to_stderr: yes
> to_logfile: yes
> to_syslog: yes
> syslog_facility: daemon
> logfile: /var/log/corosync/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: AMF
> debug: off
> tags: enter|leave|trace1|trace2|trace3|trace4|trace6
> }
> }
> 
> And I have added 5 resources - 1 is VIP and 4 are upstart jobs
> Node names are configured as -> sc-node-1(ACTIVE) and sc-node-2(PASSIVE)
> Resources are running on ACTIVE node
> 
> Default cluster properties -
> 
>   <cluster_property_set id="cib-bootstrap-options">
>     <nvpair name="dc-version" value="1.1.10-42f2063" id="cib-bootstrap-options-dc-version"/>
>     <nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
>     <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>     <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
>     <nvpair name="cluster-recheck-interval" value="..." id="cib-bootstrap-options-cluster-recheck-interval"/>
>     <nvpair name="default-action-timeout" value="..." id="cib-bootstrap-options-default-action-timeout"/>
>   </cluster_property_set>
> 
> 
> But sometimes after 2-3 migrations from ACTIVE to STANDBY and then from
> STANDBY to ACTIVE,
> both nodes become OFFLINE and Current DC becomes None, I have disabled the
> stonith property and even quorum is ignored

Disabling stonith isn't helping you. The cluster needs stonith to
recover from difficult situations, so it's easier to get into weird
states like this without it.
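
Purely as an illustration -- the right fence agent and its parameters depend
entirely on your environment; external/libvirt is just one option for
KVM-based virtual machines:

    crm configure primitive fence-sc stonith:external/libvirt \
            params hostlist="sc-node-1 sc-node-2" \
                   hypervisor_uri="qemu+ssh://your-kvm-host/system" \
            op monitor interval=60s
    crm configure property stonith-enabled=true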

> root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
> Last updated: Sat Oct  3 00:01:40 2015
> Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
> Stack: corosync
> Current DC: NONE
> 2 Nodes configured
> 5 Resources configured
> 
> OFFLINE: [ sc-node-1 sc-node-2 ]
> 
> What is going wrong here ? What is the reason for node Current DC becoming
> None suddenly ? Is corosync.conf okay ? Are default cluster properties fine
> ? Help will be appreciated.

I'd recommend seeing how the problem behaves with stonith enabled, but
in any case you'll need to dive into the logs to figure out what starts the
chain of events.
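
A couple of starting points (the log path and times below are taken from
your config and status output; adjust as needed):

    # look for membership changes and DC elections around the failure
    grep -i -e TOTEM -e membership -e "Current DC" /var/log/corosync/corosync.log

    # or gather logs and cluster state from the relevant window for review
    crm_report -f "2015-10-02 23:30:00" -t "2015-10-03 00:30:00" /tmp/dc-none-report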

> 
> *Issue 2)*
> Command used to add upstart job is
> 
> crm configure primitive service upstart:service meta allow-migrate=true
> migration-threshold=5 failure-timeout=30s op monitor interval=15s
>  timeout=60s
> 
> But still sometimes I see fail count going to INFINITY. Why ? How can we
> avoid it ? Resource should have migrated as soon as it reaches migration
> threshold.
> 
> * Node sc-node-2:
>service: migration-threshold=5 fail-count=100 last-failure='Fri Oct
>  2 23:38:53 2015'
>service1: migration-threshold=5 fail-count=100 last-failure='Fri Oct
>  2 23:38:53 2015'
> 
> Failed actions:
> service_start_0 (node=sc-node-2, call=-1, rc=1, status=Timed Out,
> last-rc-change=Fri Oct  2 23:38:53 2015
> , queued=0ms, exec=0ms
> ): unknown error
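
Regarding issue 2: with the default start-failure-is-fatal=true, a single
failed start sets the fail count to INFINITY immediately, so
migration-threshold=5 never comes into play for start failures. A rough
sketch of the relevant commands (crmsh syntax, using the resource and node
names from this thread):

    # inspect and, once the underlying problem is fixed, clear the fail count
    crm resource failcount service show sc-node-2
    crm resource cleanup service sc-node-2

    # optionally make start failures count against migration-threshold instead
    crm configure property start-failure-is-fatal=false

Since the failed actions shown are start timeouts, giving the primitives an
explicit "op start timeout=..." instead of relying on default-action-timeout
may also be worth trying.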

Re: [ClusterLabs] Current DC becomes None suddenly

2015-10-08 Thread Pritam Kharat
Hi Ken,

Thanks for reply.

On Thu, Oct 8, 2015 at 8:13 PM, Ken Gaillot  wrote:

> On 10/02/2015 01:47 PM, Pritam Kharat wrote:
> > Hi,
> >
> > I have set up a ACTIVE/PASSIVE HA
> >
> > *Issue 1) *
> >
> > *corosync.conf*  file is
> >
> > # Please read the openais.conf.5 manual page
> >
> > totem {
> >
> > version: 2
> >
> > # How long before declaring a token lost (ms)
> > token: 1
> >
> > # How many token retransmits before forming a new configuration
> > token_retransmits_before_loss_const: 20
> >
> > # How long to wait for join messages in the membership protocol
> (ms)
> > join: 1
> >
> > # How long to wait for consensus to be achieved before starting a
> > new round of membership configuration (ms)
> > consensus: 12000
> >
> > # Turn off the virtual synchrony filter
> > vsftype: none
> >
> > # Number of messages that may be sent by one processor on receipt
> > of the token
> > max_messages: 20
> >
> > # Limit generated nodeids to 31-bits (positive signed integers)
> > clear_node_high_bit: yes
> >
> > # Disable encryption
> > secauth: off
> >
> > # How many threads to use for encryption/decryption
> > threads: 0
> >
> > # Optionally assign a fixed node id (integer)
> > # nodeid: 1234
> >
> > # This specifies the mode of redundant ring, which may be none,
> > active, or passive.
> > rrp_mode: none
> > interface {
> > # The following values need to be set based on your
> > environment
> > ringnumber: 0
> > bindnetaddr: 192.168.101.0
> > mcastport: 5405
> > }
> >
> > transport: udpu
> > }
> >
> > amf {
> > mode: disabled
> > }
> >
> > quorum {
> > # Quorum for the Pacemaker Cluster Resource Manager
> > provider: corosync_votequorum
> > expected_votes: 1
>
> If you're using a recent version of corosync, use "two_node: 1" instead
> of "expected_votes: 1", and get rid of "no-quorum-policy: ignore" in the
> pacemaker cluster options.
>
>-> We are using corosync version 2.3.3. Do we need the above-mentioned change
for this version?



> > }
> >
> >
> > nodelist {
> >
> > node {
> > ring0_addr: 192.168.101.73
> > }
> >
> > node {
> > ring0_addr: 192.168.101.74
> > }
> > }
> >
> > aisexec {
> > user:   root
> > group:  root
> > }
> >
> >
> > logging {
> > fileline: off
> > to_stderr: yes
> > to_logfile: yes
> > to_syslog: yes
> > syslog_facility: daemon
> > logfile: /var/log/corosync/corosync.log
> > debug: off
> > timestamp: on
> > logger_subsys {
> > subsys: AMF
> > debug: off
> > tags: enter|leave|trace1|trace2|trace3|trace4|trace6
> > }
> > }
> >
> > And I have added 5 resources - 1 is VIP and 4 are upstart jobs
> > Node names are configured as -> sc-node-1(ACTIVE) and sc-node-2(PASSIVE)
> > Resources are running on ACTIVE node
> >
> > Default cluster properties -
> >
> >   <cluster_property_set id="cib-bootstrap-options">
> >     <nvpair name="dc-version" value="1.1.10-42f2063" id="cib-bootstrap-options-dc-version"/>
> >     <nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
> >     <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
> >     <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
> >     <nvpair name="cluster-recheck-interval" value="..." id="cib-bootstrap-options-cluster-recheck-interval"/>
> >     <nvpair name="default-action-timeout" value="..." id="cib-bootstrap-options-default-action-timeout"/>
> >   </cluster_property_set>
> >
> >
> > But sometimes after 2-3 migrations from ACTIVE to STANDBY and then from
> > STANDBY to ACTIVE,
> > both nodes become OFFLINE and Current DC becomes None, I have disabled
> the
> > stonith property and even quorum is ignored
>
> Disabling stonith isn't helping you. The cluster needs stonith to
> recover from difficult situations, so it's easier to get into weird
> states like this without it.
>
> > root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
> > Last updated: Sat Oct  3 00:01:40 2015
> > Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
> > Stack: corosync
> > Current DC: NONE
> > 2 Nodes configured
> > 5 Resources configured
> >
> > OFFLINE: [ sc-node-1 sc-node-2 ]
> >
> > What is going wrong here ? What is the reason for node Current DC
> becoming
> > None suddenly ? Is corosync.conf okay ? Are default cluster properties
> fine
> > ? Help will be appreciated.
>
> I'd recommend seeing how the problem behaves with stonith enabled, but
> in any case you'll need to dive into the logs to figure out what starts the
> chain of events.
>
>
   -> We are seeing this issue when we try rebooting the VMs.

>
> > *Issue 2)*
> > Command used to add upstart job is
> >
> > crm configure primitive service upstart:service meta allow-migrate=true
> > migration-threshold=5 failure-timeout=30s op monitor interval=15s
> >  timeout=60s
> >
> > But still sometimes I see fail count going to INFINITY. Why ? How can we
> > avoid it ? Resource should have migrated as soon as it reaches migration
> > threshold.

Re: [ClusterLabs] Current DC becomes None suddenly

2015-10-08 Thread Pritam Kharat
Hi Ken,

Please see inline comments of last mail

On Thu, Oct 8, 2015 at 8:25 PM, Pritam Kharat <
pritam.kha...@oneconvergence.com> wrote:

> Hi Ken,
>
> Thanks for reply.
>
> On Thu, Oct 8, 2015 at 8:13 PM, Ken Gaillot  wrote:
>
>> On 10/02/2015 01:47 PM, Pritam Kharat wrote:
>> > Hi,
>> >
>> > I have set up a ACTIVE/PASSIVE HA
>> >
>> > *Issue 1) *
>> >
>> > *corosync.conf*  file is
>> >
>> > # Please read the openais.conf.5 manual page
>> >
>> > totem {
>> >
>> > version: 2
>> >
>> > # How long before declaring a token lost (ms)
>> > token: 1
>> >
>> > # How many token retransmits before forming a new configuration
>> > token_retransmits_before_loss_const: 20
>> >
>> > # How long to wait for join messages in the membership protocol
>> (ms)
>> > join: 1
>> >
>> > # How long to wait for consensus to be achieved before starting
>> a
>> > new round of membership configuration (ms)
>> > consensus: 12000
>> >
>> > # Turn off the virtual synchrony filter
>> > vsftype: none
>> >
>> > # Number of messages that may be sent by one processor on
>> receipt
>> > of the token
>> > max_messages: 20
>> >
>> > # Limit generated nodeids to 31-bits (positive signed integers)
>> > clear_node_high_bit: yes
>> >
>> > # Disable encryption
>> > secauth: off
>> >
>> > # How many threads to use for encryption/decryption
>> > threads: 0
>> >
>> > # Optionally assign a fixed node id (integer)
>> > # nodeid: 1234
>> >
>> > # This specifies the mode of redundant ring, which may be none,
>> > active, or passive.
>> > rrp_mode: none
>> > interface {
>> > # The following values need to be set based on your
>> > environment
>> > ringnumber: 0
>> > bindnetaddr: 192.168.101.0
>> > mcastport: 5405
>> > }
>> >
>> > transport: udpu
>> > }
>> >
>> > amf {
>> > mode: disabled
>> > }
>> >
>> > quorum {
>> > # Quorum for the Pacemaker Cluster Resource Manager
>> > provider: corosync_votequorum
>> > expected_votes: 1
>>
>> If you're using a recent version of corosync, use "two_node: 1" instead
>> of "expected_votes: 1", and get rid of "no-quorum-policy: ignore" in the
>> pacemaker cluster options.
>>
>>-> We are using corosync version 2.3.3. Do we need the above-mentioned change
> for this version?
>
>
>
>> > }
>> >
>> >
>> > nodelist {
>> >
>> > node {
>> > ring0_addr: 192.168.101.73
>> > }
>> >
>> > node {
>> > ring0_addr: 192.168.101.74
>> > }
>> > }
>> >
>> > aisexec {
>> > user:   root
>> > group:  root
>> > }
>> >
>> >
>> > logging {
>> > fileline: off
>> > to_stderr: yes
>> > to_logfile: yes
>> > to_syslog: yes
>> > syslog_facility: daemon
>> > logfile: /var/log/corosync/corosync.log
>> > debug: off
>> > timestamp: on
>> > logger_subsys {
>> > subsys: AMF
>> > debug: off
>> > tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>> > }
>> > }
>> >
>> > And I have added 5 resources - 1 is VIP and 4 are upstart jobs
>> > Node names are configured as -> sc-node-1(ACTIVE) and sc-node-2(PASSIVE)
>> > Resources are running on ACTIVE node
>> >
>> > Default cluster properties -
>> >
>> >   <cluster_property_set id="cib-bootstrap-options">
>> >     <nvpair name="dc-version" value="1.1.10-42f2063" id="cib-bootstrap-options-dc-version"/>
>> >     <nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
>> >     <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>> >     <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
>> >     <nvpair name="cluster-recheck-interval" value="..." id="cib-bootstrap-options-cluster-recheck-interval"/>
>> >     <nvpair name="default-action-timeout" value="..." id="cib-bootstrap-options-default-action-timeout"/>
>> >   </cluster_property_set>
>> >
>> >
>> > But sometimes after 2-3 migrations from ACTIVE to STANDBY and then from
>> > STANDBY to ACTIVE,
>> > both nodes become OFFLINE and Current DC becomes None, I have disabled
>> the
>> > stonith property and even quorum is ignored
>>
>> Disabling stonith isn't helping you. The cluster needs stonith to
>> recover from difficult situations, so it's easier to get into weird
>> states like this without it.
>>
>> > root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
>> > Last updated: Sat Oct  3 00:01:40 2015
>> > Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
>> > Stack: corosync
>> > Current DC: NONE
>> > 2 Nodes configured
>> > 5 Resources configured
>> >
>> > OFFLINE: [ sc-node-1 sc-node-2 ]
>> >
>> > What is going wrong here ? What is the reason for node Current DC
>> becoming
>> > None suddenly ? Is corosync.conf okay ? Are default cluster properties
>> fine
>> > ? Help will be appreciated.
>>
>> I'd recommend seeing how the problem behaves with stonith enabled, but
>> in any case you'll need to dive into the logs to figure out what starts the
>> chain of events.

Re: [ClusterLabs] Current DC becomes None suddenly

2015-10-08 Thread Ken Gaillot
On 10/08/2015 09:55 AM, Pritam Kharat wrote:
> Hi Ken,
> 
> Thanks for reply.
> 
> On Thu, Oct 8, 2015 at 8:13 PM, Ken Gaillot  wrote:
> 
>> On 10/02/2015 01:47 PM, Pritam Kharat wrote:
>>> Hi,
>>>
>>> I have set up a ACTIVE/PASSIVE HA
>>>
>>> *Issue 1) *
>>>
>>> *corosync.conf*  file is
>>>
>>> # Please read the openais.conf.5 manual page
>>>
>>> totem {
>>>
>>> version: 2
>>>
>>> # How long before declaring a token lost (ms)
>>> token: 1
>>>
>>> # How many token retransmits before forming a new configuration
>>> token_retransmits_before_loss_const: 20
>>>
>>> # How long to wait for join messages in the membership protocol
>> (ms)
>>> join: 1
>>>
>>> # How long to wait for consensus to be achieved before starting a
>>> new round of membership configuration (ms)
>>> consensus: 12000
>>>
>>> # Turn off the virtual synchrony filter
>>> vsftype: none
>>>
>>> # Number of messages that may be sent by one processor on receipt
>>> of the token
>>> max_messages: 20
>>>
>>> # Limit generated nodeids to 31-bits (positive signed integers)
>>> clear_node_high_bit: yes
>>>
>>> # Disable encryption
>>> secauth: off
>>>
>>> # How many threads to use for encryption/decryption
>>> threads: 0
>>>
>>> # Optionally assign a fixed node id (integer)
>>> # nodeid: 1234
>>>
>>> # This specifies the mode of redundant ring, which may be none,
>>> active, or passive.
>>> rrp_mode: none
>>> interface {
>>> # The following values need to be set based on your
>>> environment
>>> ringnumber: 0
>>> bindnetaddr: 192.168.101.0
>>> mcastport: 5405
>>> }
>>>
>>> transport: udpu
>>> }
>>>
>>> amf {
>>> mode: disabled
>>> }
>>>
>>> quorum {
>>> # Quorum for the Pacemaker Cluster Resource Manager
>>> provider: corosync_votequorum
>>> expected_votes: 1
>>
>> If you're using a recent version of corosync, use "two_node: 1" instead
>> of "expected_votes: 1", and get rid of "no-quorum-policy: ignore" in the
>> pacemaker cluster options.
>>
>>-> We are using corosync version 2.3.3. Do we need the above-mentioned change
> for this version?

Yes, you can use two_node.

FYI, two_node automatically enables wait_for_all, which means that when
a node first starts up, it waits until it can see the other node before
forming the cluster. So once the cluster is running, it can handle the
failure of one node, and the other will continue. But to start, both
nodes need to be present.
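
Once the cluster is up you can sanity-check this at runtime; with two_node
enabled, "corosync-quorumtool -s" should report the 2Node and WaitForAll
flags, roughly:

    Flags:            2Node Quorate WaitForAll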

>>> }
>>>
>>>
>>> nodelist {
>>>
>>> node {
>>> ring0_addr: 192.168.101.73
>>> }
>>>
>>> node {
>>> ring0_addr: 192.168.101.74
>>> }
>>> }
>>>
>>> aisexec {
>>> user:   root
>>> group:  root
>>> }
>>>
>>>
>>> logging {
>>> fileline: off
>>> to_stderr: yes
>>> to_logfile: yes
>>> to_syslog: yes
>>> syslog_facility: daemon
>>> logfile: /var/log/corosync/corosync.log
>>> debug: off
>>> timestamp: on
>>> logger_subsys {
>>> subsys: AMF
>>> debug: off
>>> tags: enter|leave|trace1|trace2|trace3|trace4|trace6
>>> }
>>> }
>>>
>>> And I have added 5 resources - 1 is VIP and 4 are upstart jobs
>>> Node names are configured as -> sc-node-1(ACTIVE) and sc-node-2(PASSIVE)
>>> Resources are running on ACTIVE node
>>>
>>> Default cluster properties -
>>>
>>>   <cluster_property_set id="cib-bootstrap-options">
>>>     <nvpair name="dc-version" value="1.1.10-42f2063" id="cib-bootstrap-options-dc-version"/>
>>>     <nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
>>>     <nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
>>>     <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
>>>     <nvpair name="cluster-recheck-interval" value="..." id="cib-bootstrap-options-cluster-recheck-interval"/>
>>>     <nvpair name="default-action-timeout" value="..." id="cib-bootstrap-options-default-action-timeout"/>
>>>   </cluster_property_set>
>>>
>>>
>>> But sometimes after 2-3 migrations from ACTIVE to STANDBY and then from
>>> STANDBY to ACTIVE,
>>> both nodes become OFFLINE and Current DC becomes None, I have disabled
>> the
>>> stonith property and even quorum is ignored
>>
>> Disabling stonith isn't helping you. The cluster needs stonith to
>> recover from difficult situations, so it's easier to get into weird
>> states like this without it.
>>
>>> root@sc-node-2:/usr/lib/python2.7/dist-packages/sc# crm status
>>> Last updated: Sat Oct  3 00:01:40 2015
>>> Last change: Fri Oct  2 23:38:28 2015 via crm_resource on sc-node-1
>>> Stack: corosync
>>> Current DC: NONE
>>> 2 Nodes configured
>>> 5 Resources configured
>>>
>>> OFFLINE: [ sc-node-1 sc-node-2 ]
>>>
>>> What is going wrong here ? What is the reason for node Current DC
>> becoming
>>> None suddenly ? Is corosync.conf okay ? Are default cluster properties
>> fine
>>> ? Help will be appreciated.
>>
>> I'd recommend seeing how the problem behaves with stonith enabled, but
>> in any case you'll need to dive into the logs to figure out what starts the
>> chain of events.