Re: [Ocfs2-users] New node..new problems

2008-10-09 Thread Tao Ma

Hi,
Dante Garro wrote:
> Sunil, I now realize that the messages refer to node 0, but the new node is
> node 1, and no matter what value I set, it always says 14000 ms.
> Does this change your diagnosis?
Node1 initiates the connection to node0, so the messages about node0
show up on node1. It looks like the configuration on node1 is wrong.
Please make sure that the value of O2CB_HEARTBEAT_THRESHOLD in
/etc/sysconfig/o2cb on node1 is the same as on node0.

Regards,
Tao
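
A minimal sketch of how the two values could be compared, assuming node0's
config file has been copied to node1 first. The function names and the
/tmp/o2cb.node0 path are made up for illustration; only
O2CB_HEARTBEAT_THRESHOLD and /etc/sysconfig/o2cb come from this thread. If
node0 is still the Debian install, its file is probably /etc/default/o2cb
rather than /etc/sysconfig/o2cb, which is an assumption worth verifying.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ocfs2-tools.

# Print the value of the last uncommented O2CB_HEARTBEAT_THRESHOLD
# assignment in the given file, with optional quotes stripped.
# Prints nothing if the variable is not set in the file.
get_threshold() {
    sed -n 's/^O2CB_HEARTBEAT_THRESHOLD=//p' "$1" | tail -n 1 | tr -d "\"'"
}

# Compare the threshold in two config files.
# Returns 0 if both are set and equal, 1 otherwise.
compare_thresholds() {
    a=$(get_threshold "$1")
    b=$(get_threshold "$2")
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "mismatch: '$a' vs '$b'"
        return 1
    fi
}

# Example (paths are assumptions):
#   compare_thresholds /etc/sysconfig/o2cb /tmp/o2cb.node0
```

An empty value in the output means the variable is not set in that file, in
which case the init script presumably falls back to its built-in default. As
noted elsewhere in the thread, a changed value only takes effect after the
cluster is restarted on that node.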

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] New node..new problems

2008-10-09 Thread Dante Garro
Sunil, I now realize that the messages refer to node 0, but the new node is
node 1, and no matter what value I set, it always says 14000 ms.
Does this change your diagnosis?


-----Original Message-----
From: Sunil Mushran [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 9, 2008 06:02 p.m.
To: Dante Garro
CC: 'ocfs2-users@oss.oracle.com'
Subject: Re: [Ocfs2-users] New node..new problems

Yeah, the cluster timeouts are not consistent. Update the settings and restart
the cluster on the new node (or on all nodes, as the case may be).

Hint: cat /sys/kernel/config/cluster//idle_timeout_ms
to see the active heartbeat threshold.


Re: [Ocfs2-users] New node..new problems

2008-10-09 Thread Sunil Mushran
Yeah, the cluster timeouts are not consistent. Update the settings and
restart the cluster on the new node (or on all nodes, as the case may be).

Hint: cat /sys/kernel/config/cluster//idle_timeout_ms
to see the active heartbeat threshold.
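
The hint above reads one file by hand. Below is a small sketch that prints
the live values for every configured cluster, so the cluster name does not
have to be typed. idle_timeout_ms is the file named in the hint;
heartbeat/dead_threshold is an assumption about where the heartbeat threshold
is exposed under configfs and may not exist on every OCFS2 version, so files
that are missing are simply skipped. The function name and the optional root
argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of ocfs2-tools.

# Print "<cluster> <attribute>=<value>" for each readable timeout file
# under the o2cb configfs tree. The root directory can be overridden,
# which is mainly useful for trying the function on a scratch directory.
show_active_timeouts() {
    root=${1:-/sys/kernel/config/cluster}
    for dir in "$root"/*/; do
        # With no clusters configured the glob stays literal; skip it.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        for attr in idle_timeout_ms heartbeat/dead_threshold; do
            if [ -r "$dir$attr" ]; then
                printf '%s %s=%s\n' "$name" "$attr" "$(cat "$dir$attr")"
            fi
        done
    done
}

# Example: run on each node and compare the output.
#   show_active_timeouts
```

Running it on both nodes and comparing the output side by side should show
which active value differs, independent of what the config files say.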


[Ocfs2-users] New node..new problems

2008-10-09 Thread Dante Garro
Hi all, because of problems with the ocfs2 release in the Debian distribution,
I decided to rebuild my cluster, replacing it with a CentOS-based installation.
I started by replacing one of the nodes while keeping the other one running.

On this newly created node the following errors appear:

drbd0: Writing meta data super block now.
(2558,1):o2hb_check_slot:881 ERROR: Node 0 on device drbd0 has a dead count of 14000 ms, but our count is 13000 ms.
Please double check your configuration values for 'O2CB_HEARTBEAT_THRESHOLD'
OCFS2 1.2.9 Wed Sep 24 19:26:41 PDT 2008 (build a693806cb619dd7f225004092b675ede)
(2520,1):o2net_connect_expired:1585 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(2556,1):dlm_request_join:901 ERROR: status = -107
(2556,1):dlm_try_to_join_domain:1049 ERROR: status = -107
(2556,1):dlm_join_domain:1321 ERROR: status = -107
(2556,1):dlm_register_domain:1514 ERROR: status = -107
(2556,1):ocfs2_dlm_init:2024 ERROR: status = -107
(2556,1):ocfs2_mount_volume:1133 ERROR: status = -107
ocfs2: Unmounting device (147,0) on (node 1)
(2591,1):o2hb_check_slot:881 ERROR: Node 0 on device drbd0 has a dead count of 14000 ms, but our count is 13000 ms.
Please double check your configuration values for 'O2CB_HEARTBEAT_THRESHOLD'
(2520,1):o2net_connect_expired:1585 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(2589,1):dlm_request_join:901 ERROR: status = -107
(2589,1):dlm_try_to_join_domain:1049 ERROR: status = -107
(2589,1):dlm_join_domain:1321 ERROR: status = -107
(2589,1):dlm_register_domain:1514 ERROR: status = -107
(2589,1):ocfs2_dlm_init:2024 ERROR: status = -107
(2589,1):ocfs2_mount_volume:1133 ERROR: status = -107
ocfs2: Unmounting device (147,0) on (node 1)

I've changed the O2CB_HEARTBEAT_THRESHOLD parameter as the O2CB message
advised, but it doesn't resolve the issue.

I hope someone can give me a clue.

Thanks in advance.

Dante
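
To pull just the mismatch out of a noisy log, something like the following
could work. It is a sketch that assumes each o2hb_check_slot message sits on
a single line, as in dmesg output rather than in a wrapped mail; the function
name is made up. As a side note, status = -107 is -ENOTCONN on Linux
("Transport endpoint is not connected"), which fits the
o2net_connect_expired message that precedes it.

```shell
#!/bin/sh
# Hypothetical helper, not part of ocfs2-tools.

# Read kernel log lines on stdin and print one summary line per distinct
# heartbeat dead-count mismatch reported by o2hb_check_slot.
dead_count_mismatches() {
    sed -n 's/.*Node \([0-9][0-9]*\) on device \([^ ]*\) has a dead count of \([0-9][0-9]*\) ms, but our count is \([0-9][0-9]*\) ms.*/node=\1 device=\2 theirs=\3ms ours=\4ms/p' |
        sort -u
}

# Example:
#   dmesg | dead_count_mismatches
```

In the log above, "theirs" is the value advertised by node 0 and "ours" is
the value on the node printing the message, so changing the setting on the
new node should only move the second number.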

