Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

2011-12-22 Thread Marek Królikowski
OK, I reconfigured the server and am running the test again; hopefully it will die again tomorrow, because I can see in the log that it crashed after 10 hours of running with no problems.
Thanks


-Original Message- 
From: srinivas eeda
Sent: Thursday, December 22, 2011 9:12 PM
To: Marek Królikowski
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

We need to know what happened to node 2. Was the node rebooted because
of a network timeout or a kernel panic? Can you please configure
netconsole and a serial console and rerun the test?


Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

2011-12-22 Thread srinivas eeda
We need to know what happened to node 2. Was the node rebooted because 
of a network timeout or a kernel panic? Can you please configure 
netconsole and a serial console and rerun the test?
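Netconsole forwards kernel messages over UDP, so panic output from the dying node can be captured on another machine even when the local log is lost to the reboot. A minimal sketch, assuming placeholder addresses and interface names (substitute your own):

```shell
# On the crashing node (e.g. TEST-MAIL2): send kernel messages from
# local UDP port 6665 on eth0 to a log host at 172.17.1.1, UDP port 6666.
# All addresses, the interface name, and the MAC below are placeholders.
modprobe netconsole netconsole=6665@172.17.1.252/eth0,6666@172.17.1.1/00:11:22:33:44:55

# On the log host: listen for the messages and save them, e.g. with netcat.
nc -l -u 6666 | tee netconsole.log
```

The syslog target's MAC address is required because netconsole transmits below the ARP layer; it can be read from the log host with `ip link show`.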


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] One node, two clusters?

2011-12-22 Thread Sunil Mushran
On 12/22/2011 10:39 AM, Kushnir, Michael (NIH/NLM/LHC) [C] wrote:
> Is there a separate DLM instance for each ocfs2 volume?
>
> Dec 22 09:15:42 lhce-imed-web1 kernel: 
> (updatedb,1832,1):dlm_get_lock_resource:898 
> 042F68B6AF134E5C9A9EDF4D7BD7BE99:O0013d2ef94: at least 
> one node (11) to recover before lock mastery can begin
>

You should add ocfs2 to PRUNEFS in /etc/updatedb.conf. updatedb generates
a lot of I/O and network traffic, and it will run at around the same time on
all nodes.

Yes, each volume has a different dlm domain (instance).
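For example, on many distributions /etc/updatedb.conf carries a PRUNEFS line listing filesystem types updatedb should skip; appending ocfs2 to the existing list would look something like the sketch below (the other entries are just typical defaults, not a prescription):

```shell
# /etc/updatedb.conf -- filesystem types updatedb must not index.
# Adding ocfs2 keeps the nightly scan off the shared cluster volumes.
PRUNEFS="NFS nfs nfs4 afs proc smbfs autofs iso9660 udf tmpfs sysfs ocfs2"
```
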



Re: [Ocfs2-users] One node, two clusters?

2011-12-22 Thread Kushnir, Michael (NIH/NLM/LHC) [C]
Is there a separate DLM instance for each ocfs2 volume?

I have two "sub-clusters" in the same cluster: a 10-node Hadoop cluster 
sharing a SATA RAID10, and a two-node web server cluster sharing an SSD RAID0. 
One server mounts both volumes to move data between them as necessary. 

This morning I got the following error (see end of message), and all nodes lost 
access to all storage. I'm trying to mitigate the risk of this happening again. 

My Hadoop nodes are used to generate search engine indexes, so they can go 
down. But my web servers provide the search engine service, so I need them 
not to be tied to my Hadoop nodes. I just feel safer that way. At the same time, 
I need a "bridge" node to move data between the two. I can do it via NFS or 
SCP, but I figured it'd be worthwhile to ask whether one node can be in two 
different clusters. 

Dec 22 09:15:42 lhce-imed-web1 kernel: 
(updatedb,1832,1):dlm_get_lock_resource:898 
042F68B6AF134E5C9A9EDF4D7BD7BE99:O0013d2ef94: at least one 
node (11) to recover before lock mastery can begin

Thanks,
Mike





Re: [Ocfs2-users] One node, two clusters?

2011-12-22 Thread Sunil Mushran
You don't need two clusters for this. It can be accomplished with one
cluster using the default local heartbeat.

Create one cluster.conf with all the nodes. All nodes, except the one
machine, will mount from just one SAN. The common node will mount from
both SANs.

If you look at the cluster membership, other than the common node,
all nodes will be interacting (network connection, etc.) only with nodes
that they can see on the SAN.
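A single /etc/ocfs2/cluster.conf covering nodes from both SANs might look roughly like the sketch below. The cluster name, node names, and IP addresses are made up for illustration; the o2cb format uses one `cluster:` stanza plus one `node:` stanza per node, with tab-indented `key = value` lines:

```shell
cluster:
	node_count = 3
	name = webhadoop

node:
	ip_port = 7777
	ip_address = 172.17.1.10
	number = 0
	name = hadoop1
	cluster = webhadoop

node:
	ip_port = 7777
	ip_address = 172.17.1.20
	number = 1
	name = web1
	cluster = webhadoop

node:
	ip_port = 7777
	ip_address = 172.17.1.30
	number = 2
	name = bridge1
	cluster = webhadoop
```

Here bridge1 would mount volumes from both SANs, while hadoop1 and web1 each mount only the volumes on the SAN they can reach.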

On 12/22/2011 09:40 AM, Werner Flamme wrote:
> Kushnir, Michael (NIH/NLM/LHC) [C] [22.12.2011 18:20]:
>> Is it possible to have one machine be part of two different ocfs2
>> clusters with two different sans? Kind of to serve as a bridge for
>> moving data between two clusters but without actually fully
>> combining the two clusters?
>>
>> Thanks, Michael




Re: [Ocfs2-users] One node, two clusters?

2011-12-22 Thread Werner Flamme
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Kushnir, Michael (NIH/NLM/LHC) [C] [22.12.2011 18:20]:
> Is it possible to have one machine be part of two different ocfs2
> clusters with two different sans? Kind of to serve as a bridge for
> moving data between two clusters but without actually fully
> combining the two clusters?
> 
> Thanks, Michael

Michael,

I asked this two years ago and the answer was no.

When I look at /etc/ocfs2/cluster.conf, I do not see a possibility to
configure a second cluster. Though the nodes must be assigned to a
cluster (and exactly one cluster, at that), there is only one
"cluster:" entry in the file, and so there is no way to define a second one.

We synced via rsync :-(

HTH
Werner

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7za4EACgkQk33Krq8b42MvSwCfQAXzqVQRPyhOdFrKM8PCPqbf
g0cAn20CV4rjzXNrTa/YGaUeNlO3+rmc
=CBmQ
-END PGP SIGNATURE-



[Ocfs2-users] One node, two clusters?

2011-12-22 Thread Kushnir, Michael (NIH/NLM/LHC) [C]
Is it possible to have one machine be part of two different ocfs2 clusters with 
two different SANs? It would serve as a kind of bridge for moving data between 
two clusters, but without actually fully combining them.

Thanks,
Michael



Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

2011-12-22 Thread Marek Królikowski
Hello,
After 24 hours I saw TEST-MAIL2 reboot (possibly a kernel panic), but 
TEST-MAIL1 got this in dmesg:
TEST-MAIL1 ~ #dmesg
[cut]
o2net: accepted connection from node TEST-MAIL2 (num 1) at 172.17.1.252:
o2dlm: Node 1 joins domain B24C4493BBC74FEAA3371E2534BB3611
o2dlm: Nodes in domain B24C4493BBC74FEAA3371E2534BB3611: 0 1
o2net: connection to node TEST-MAIL2 (num 1) at 172.17.1.252: has been 
idle for 60.0 seconds, shutting it down.
(swapper,0,0):o2net_idle_timer:1562 Here are some times that might help 
debug the situation: (Timer: 33127732045, Now 33187808090, DataReady 
33127732039, Advance 33127732051-33127732051, Key 0xebb9cd47, Func 506, 
FuncTime 33127732045-33127732048)
o2net: no longer connected to node TEST-MAIL2 (num 1) at 172.17.1.252:
(du,5099,12):dlm_do_master_request:1324 ERROR: link to 1 went down!
(du,5099,12):dlm_get_lock_resource:907 ERROR: status = -112
(dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: 
B24C4493BBC74FEAA3371E2534BB3611: res M0cf023ef70, 
error -112 send AST to node 1
(dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -112
(dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: 
B24C4493BBC74FEAA3371E2534BB3611: res P00, 
error -107 send AST to node 1
(dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -107
(kworker/u:3,5071,0):o2net_connect_expired:1724 ERROR: no connection 
established with node 1 after 60.0 seconds, giving up and returning errors.
(o2hb-B24C4493BB,14310,0):o2dlm_eviction_cb:267 o2dlm has evicted node 1 
from group B24C4493BBC74FEAA3371E2534BB3611
(ocfs2rec,5504,6):dlm_get_lock_resource:834 
B24C4493BBC74FEAA3371E2534BB3611:M15f023ef70: at least 
one node (1) to recover before lock mastery can begin
(ocfs2rec,5504,6):dlm_get_lock_resource:888 
B24C4493BBC74FEAA3371E2534BB3611:M15f023ef70: at least 
one node (1) to recover before lock mastery can begin
(du,5099,12):dlm_restart_lock_mastery:1213 ERROR: node down! 1
(du,5099,12):dlm_wait_for_lock_mastery:1030 ERROR: status = -11
(du,5099,12):dlm_get_lock_resource:888 
B24C4493BBC74FEAA3371E2534BB3611:N0020924f: at least one node (1) to 
recover before lock mastery can begin
(dlm_reco_thread,14322,0):dlm_get_lock_resource:834 
B24C4493BBC74FEAA3371E2534BB3611:$RECOVERY: at least one node (1) to recover 
before lock mastery can begin
(dlm_reco_thread,14322,0):dlm_get_lock_resource:868 
B24C4493BBC74FEAA3371E2534BB3611: recovery map is not empty, but must master 
$RECOVERY lock now
(dlm_reco_thread,14322,0):dlm_do_recovery:523 (14322) Node 0 is the Recovery 
Master for the Dead Node 1 for Domain B24C4493BBC74FEAA3371E2534BB3611
(ocfs2rec,5504,6):ocfs2_replay_journal:1549 Recovering node 1 from slot 1 on 
device (253,0)
(ocfs2rec,5504,6):ocfs2_begin_quota_recovery:407 Beginning quota recovery in 
slot 1
(kworker/u:0,2909,0):ocfs2_finish_quota_recovery:599 Finishing quota 
recovery in slot 1

And I tried these commands:
debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory
debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory

But they are not working.


-Original Message- 
From: Srinivas Eeda
Sent: Wednesday, December 21, 2011 8:43 PM
To: Marek Królikowski
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Those numbers look good. Basically, with the fixes backed out and another
fix I gave, you are not seeing that many orphans hanging around, and
hence not seeing the stuck-process kernel stacks. You can run the test
longer, or, if you are satisfied, please enable quotas and re-run the test
with the modified kernel. You might see a deadlock that needs to be
fixed (I was not able to reproduce it yet). If the system hangs, please
capture the following and provide me with the output:

1. echo t > /proc/sysrq-trigger
2. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP 
allow
3. wait for 10 minutes
4. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP 
off
5. echo t > /proc/sysrq-trigger

