Dear All,

I have a strange problem, and I have reached the point where I have no idea what is happening.

The cluster has 2 meta servers (meta1 and meta2) and 6 nodes (node1-6). The meta servers run CentOS 5; the nodes run CentOS 4. Node1, node5 and node6 run kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp; the others run 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd. The nodes are set up as DRBD peers (node1-node2, and so on). The nodes have 8 SATA disks on Adaptec 2610S and 2620S RAID adapters, and 3 NICs (main network, LNET, DRBD).
These are the symptoms:

Parallel reads are OK, fast and quiet. A single write is OK. Parallel writes with a few clients (for example 3-4) are slow; with more than that, they get stuck. The load on one or two nodes is high and growing, and the kernel is in I/O wait. Usually these two nodes are node4 and node3 (with file striping): node4 has a load of, for example, 30-40-50, and node3 then has approximately half of that.

The problem is that this worked fine half a year ago.

Do you have any ideas or tips?

Thanks,
tamas