Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-09 Thread Alain.Moulle
Hi Tao

Yep, that's it! iptables was running on node3!

Many thanks!

Have a good day.
Regards
Alain

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-09 Thread Sunil Mushran

Is iptables running on node3? If so, stop it.
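
If stopping the firewall entirely is not an option, a narrower alternative is to accept o2net traffic from the other cluster nodes only. The sketch below just builds the `iptables` command lines and does not run them. It assumes the cluster interconnect uses TCP port 7777 (use whatever `ip_port` is set in your cluster.conf); `o2net_accept_rules` is a hypothetical helper name, and the peer IPs are the two that appear in the logs in this thread.

```python
def o2net_accept_rules(peer_ips, port=7777):
    """Build iptables argv lists that accept TCP traffic from cluster peers.

    port=7777 is an assumption; it must match ip_port in cluster.conf.
    """
    return [
        ["iptables", "-I", "INPUT", "-p", "tcp", "-s", ip,
         "--dport", str(port), "-j", "ACCEPT"]
        for ip in peer_ips
    ]


def as_shell_lines(rules):
    """Render the argv lists as printable command lines for review."""
    return [" ".join(rule) for rule in rules]


if __name__ == "__main__":
    # Peer addresses taken from the o2net log lines quoted in this thread.
    for line in as_shell_lines(o2net_accept_rules(["10.197.189.204", "10.197.189.218"])):
        print(line)
```

Review the printed commands before running them as root on the node whose firewall is blocking connections.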


Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-09 Thread Alain.Moulle

Hi Tao,

Yes, on all three nodes the Max Node Slots value is 8:

echo 'stats'|debugfs.ocfs2 /dev/sdc1|grep Slots
debugfs.ocfs2 1.4.3
Max Node Slots: 8

Regards,
Alain
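
The slot check above can also be scripted when there are many volumes or nodes to verify. This is a minimal sketch that parses text already captured from `echo 'stats' | debugfs.ocfs2 <device>`; it assumes the output contains a `Max Node Slots: N` line as shown in this message. The function names are hypothetical.

```python
import re


def parse_max_node_slots(stats_output):
    """Extract the 'Max Node Slots' value from debugfs.ocfs2 'stats' output."""
    match = re.search(r"Max Node Slots:\s*(\d+)", stats_output)
    if match is None:
        raise ValueError("no 'Max Node Slots' line found")
    return int(match.group(1))


def enough_slots(stats_output, node_count):
    """Return True if the volume has at least one slot per mounting node."""
    return parse_max_node_slots(stats_output) >= node_count


if __name__ == "__main__":
    # Sample text copied from the output quoted in this message.
    sample = "debugfs.ocfs2 1.4.3\nMax Node Slots: 8\n"
    print(parse_max_node_slots(sample), enough_slots(sample, 3))
```

With 8 slots and 3 nodes the check passes, which is why the slot count was ruled out as the cause here.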


Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-09 Thread Tao Ma
Hi Alain,

On 11/09/2010 04:49 PM, Alain.Moulle wrote:
> Hi,
> The cluster.conf files are exactly the same on all 3 nodes.
> The error messages are:
>
> - node1:
> o2net: accepted connection from node selfxl-5 (num 1) at
> 10.197.189.218:
> o2net: no longer connected to node selfxl-5 (num 1) at
> 10.197.189.218:
>
> - node2:
> (1457,1):o2net_connect_expired:1656 ERROR: no connection established
> with node 1 after 30.0 seconds, giving up and returning errors.
>
> Note that once a mount is refused, for example on node3, if
> I umount the FS on node1, for example, then I can mount it
> on node3.
Oh, so do you have enough slots for all these 3 nodes to mount?

What's the output of the command below?
echo 'stats'|debugfs.ocfs2 /dev/sdx|grep Slots

Regards,
Tao
> Note also that when the mount is refused, for example on node3,
> I've checked that node3 "pings" both other nodes successfully
> on the IP addresses given in cluster.conf.
>
> Alain


Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-09 Thread Alain.Moulle

Hi,
The cluster.conf files are exactly the same on all 3 nodes.
The error messages are:

- node1:
o2net: accepted connection from node selfxl-5 (num 1) at
10.197.189.218:
o2net: no longer connected to node selfxl-5 (num 1) at
10.197.189.218:

- node2:
(1457,1):o2net_connect_expired:1656 ERROR: no connection established
with node 1 after 30.0 seconds, giving up and returning errors.

Note that once a mount is refused, for example on node3, if
I umount the FS on node1, for example, then I can mount it
on node3.
Note also that when the mount is refused, for example on node3,
I've checked that node3 "pings" both other nodes successfully
on the IP addresses given in cluster.conf.

Alain
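
A successful ping only shows that ICMP gets through; it says nothing about whether the TCP port used by o2net is reachable, and a firewall can allow one while blocking the other. A minimal sketch of a TCP-level check follows. The function names are hypothetical, and the port to test is whatever `ip_port` is set in cluster.conf (7777 is assumed in the usage note below).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, port, timeout=3.0):
    """Check every peer in a {name: ip} mapping; returns {name: bool}."""
    return {name: can_connect(ip, port, timeout) for name, ip in peers.items()}
```

For example, running `check_peers({"selfxl-4": "10.197.189.204", "selfxl-5": "10.197.189.218"}, 7777)` on the node whose mount is refused, while the other nodes have the volume mounted, would show whether each peer accepts TCP connections on that port. A `False` next to a host that answers ping points at a firewall or filtering problem rather than at the DLM.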




Re: [Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

2010-11-08 Thread Tao Ma
Hi Alain,

On 11/08/2010 11:08 PM, Alain.Moulle wrote:
> Hi,
>
> I have a problem on Fedora 13 with these releases:
> ocfs2  1.4.3-5.fc13.x86_64
> dlm_tool 3.0.17
>
> With a 3-node ocfs2 cluster, I can't mount the FS on all three nodes at the
> same time, but only on two of the 3 nodes, whichever two they are.
>
> The errors are:
> (1475,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (2175,0):dlm_request_join:1035 ERROR: status = -107
> (2175,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (2175,0):dlm_join_domain:1487 ERROR: status = -107
> (2175,0):dlm_register_domain:1753 ERROR: status = -107
> (2175,0):o2cb_cluster_connect:313 ERROR: status = -107
> (2175,0):ocfs2_dlm_init:2995 ERROR: status = -107
> (2175,0):ocfs2_mount_volume:1789 ERROR: status = -107
> ocfs2: Unmounting device (8,16) on (node 0)
> o2net: no longer connected to node selfxl-4 (num 0) at
> 10.197.189.204:
> o2net: connected to node selfxl-4 (num 0) at 10.197.189.204:
>
> It seems to be a lock management problem.
> Is this an already known issue?
> Is there a patch available?
It doesn't look like a dlm problem, but a network problem. ;)
Your first error is o2net_connect_expired, so it seems that the 3rd node
can't connect to node 2.
Could you please check the error messages on node 2?

btw, I assume that cluster.conf is the same on all 3 nodes, and that you
can connect to the address (the one used by ocfs2) of node 2 from
node 3.

Regards,
Tao
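
One detail in the trace backs up the network diagnosis: the repeated `status = -107` is a negated Linux errno, and 107 is ENOTCONN. A minimal sketch that decodes such values follows. The table is hard-coded with Linux errno numbers so the result does not depend on the machine running the script, and `decode_status` is a hypothetical helper name, not part of ocfs2-tools.

```python
# Linux errno values for common network failures.
LINUX_NET_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_status(status):
    """Map a kernel-style negative status (e.g. -107) to (name, message)."""
    return LINUX_NET_ERRNO.get(abs(status), ("UNKNOWN", "not in this table"))


if __name__ == "__main__":
    name, message = decode_status(-107)
    print(f"status = -107 -> {name}: {message}")
```

Every dlm and o2cb line in the trace carries the same -107, so they all appear to propagate the one failed o2net connection rather than report separate lock-manager faults.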
