On Thu, Jun 21, 2012 at 03:46:32AM +0000, Guozhonghua wrote:
> The first problem is as below:
> One issue is that files copied to the device can't be listed on node2
> with ls -al on the mounted directory.
> But using debugfs.ocfs2 on node2, the copied files are listed fine. After
> remounting the device on node2, the files can be listed.
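
For anyone following along, a quick way to compare the two views described above is something like this (the mount point /mnt/target100 is only an illustration; the device name /dev/sdb is taken from the mkfs command quoted later):

    # what the mounted filesystem on node2 currently shows
    ls -al /mnt/target100

    # what is actually on disk, read directly from the device
    # (-R runs a single debugfs.ocfs2 command non-interactively)
    debugfs.ocfs2 -R "ls -l /" /dev/sdb

If the second command shows the newly copied files and the first does not, node2 is serving stale cached directory contents rather than missing data on disk.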
This is the kind of thing you see when locking gets unhappy. You copy on
node1, it writes to the disk, but somehow node2 has not noticed. Thus, you
can see the data on disk (debugfs.ocfs2), but not via the filesystem.

What kind of storage is this? How are node1, node2, and node3 attached to
it? How do they talk to each other?

> The second problem is that:
> Node1 is in the ocfs2 cluster, but using debugfs.ocfs2 and the mounted.ocfs2 -f
> command, we cannot list the node1 info.
> Node2 and node3 are listed. And using debugfs.ocfs2 to list the slotmap
> information, node1 is not there.

This is very interesting.

Joel

> But the heartbeat information on disk is ok.
>
> And there are lots of "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" error
> messages in the log.
>
> We formatted the device with 32 node slots using the command:
> mkfs.ocfs2 -b 4k -C 1M -L target100 -T vmstore -N 32 /dev/sdb
>
> So we had to delete the ocfs2 cluster, reboot the nodes, and rebuild the ocfs2 cluster.
> After all nodes joined the cluster, we copied data again, and the
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages are still there.
>
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.006781] INFO: task cp:22285 blocked for more than 120 seconds.
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.016123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034724] cp              D ffffffff81806240     0 22285   5313 0x00000000
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034729]  ffff881b952658b0 0000000000000082 0000000000000000 0000000000000001
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034739]  ffff881b95265fd8 ffff881b95265fd8 ffff881b95265fd8 0000000000013780
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034751]  ffff880fc16044d0 ffff881fbe41ade0 ffff882027c13780 ffff881fbe41ade0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034762] Call Trace:
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034769]  [<ffffffff8165a55f>] schedule+0x3f/0x60
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034777]  [<ffffffff8165c35d>] rwsem_down_failed_common+0xcd/0x170
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034808]  [<ffffffffa059d399>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034815]  [<ffffffff8165c435>] rwsem_down_read_failed+0x15/0x17
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034826]  [<ffffffff813188d4>] call_rwsem_down_read_failed+0x14/0x30
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034833]  [<ffffffff8165b754>] ? down_read+0x24/0x2b
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034859]  [<ffffffffa0553b11>] ocfs2_start_trans+0xe1/0x1e0 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034878]  [<ffffffffa052ab35>] ocfs2_write_begin_nolock+0x945/0x1c40 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034903]  [<ffffffffa054cb90>] ? ocfs2_inode_is_valid_to_delete+0x1f0/0x1f0 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034927]  [<ffffffffa053fa9c>] ? ocfs2_inode_lock_full_nested+0x52c/0xa90 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034939]  [<ffffffff81647ae2>] ? balance_dirty_pages.isra.17+0x457/0x4ba
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034959]  [<ffffffffa052bf26>] ocfs2_write_begin+0xf6/0x210 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034968]  [<ffffffff8111752a>] generic_perform_write+0xca/0x210
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034991]  [<ffffffffa053d9b9>] ? ocfs2_inode_unlock+0xb9/0x130 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.034998]  [<ffffffff811176cd>] generic_file_buffered_write+0x5d/0x90
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035023]  [<ffffffffa054c601>] ocfs2_file_aio_write+0x821/0x870 [ocfs2]
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035032]  [<ffffffff81177342>] do_sync_write+0xd2/0x110
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035043]  [<ffffffff812d7448>] ? apparmor_file_permission+0x18/0x20
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035052]  [<ffffffff8129cc9c>] ? security_file_permission+0x2c/0xb0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035058]  [<ffffffff811778d1>] ? rw_verify_area+0x61/0xf0
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035064]  [<ffffffff81177c33>] vfs_write+0xb3/0x180
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035070]  [<ffffffff81177f5a>] sys_write+0x4a/0x90
> Jun 20 20:42:01 H3CRDS11-RD kernel: [17509.035077]  [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b
>
> Is there some better advice or practice? Or is there some bug?
>
> The OS information is as below; all four nodes are installed the same:
> 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> The host information is as below:
> # free
>              total       used       free     shared    buffers     cached
> Mem:     132028152  104355680   27672472          0     171496   69113032
> -/+ buffers/cache:   35071152   96957000
> Swap:     34523132          0   34523132
>
> CPU information:
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                24
> On-line CPU(s) list:   0-23
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 44
> Stepping:              2
> CPU MHz:               2532.792
> BogoMIPS:              5065.22
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              12288K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
>
> Thanks

-- 

"People with narrow minds usually have broad tongues."

			http://www.jlbec.org/
			jl...@evilplan.org