Dear All,

I have a strange problem, and I have reached the point where I have no
idea what is happening.
The cluster has 2 meta servers (meta1 and meta2) and 6 nodes (node1-6).
The meta servers run CentOS 5, the nodes run CentOS 4.
Node1, 5 and 6 run kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp, the others
run 2.6.9-42.0.10.EL_lustre-1.6.0.1custom-drbd.
The nodes are paired as drbd peers, e.g. node1-2 and so on.
The nodes have 8 SATA disks on Adaptec 2610S and 2620S RAID adapters,
and 3 NICs (main network, lnet, drbd).


These are the symptoms:

Parallel reads are OK, fast and quiet. A single write is OK.

Parallel writes from a few clients (for example 3-4) are slow; above
that they get stuck.
The load on one or two nodes is high and growing, and the kernel is in
io-wait. Usually these two nodes are node4 and node3 (with file
striping); node4 has a load of, for example, 30-40-50, and node3 has
approximately half of that.

The problem is that this worked fine half a year ago.

Do you have any ideas or tips?


Thanks,

tamas
