Hello everyone!

We chose Lustre for our large file storage system, but its performance is not
what we expected.
Our users should be able to download at 1 Mbit/s, which is their limit.
However, during a download the speed drops, sometimes all the way to 0; within
1-30 seconds it either recovers or the connection is dropped.
The servers now attached to Lustre were previously used separately as
standalone file storage servers, and they performed 4 times better.
On the Lustre clients, the load is between 100 and 300 because the FTP
processes are waiting for data from the OSSes. The RAID arrays in the OSSes
perform disk I/O of 30-40k/s, although outside of Lustre they perform disk
I/O of 100-140k/s.
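
One way to put numbers on the stalls described above is to read a large file
sequentially and record how many bytes arrive in each one-second window. The
following is only a minimal sketch: `sample_read_throughput` and `find_stalls`
are hypothetical helper names (not Lustre tools), and the path given on the
command line is whatever test file you choose.

```python
import sys
import time


def sample_read_throughput(path, chunk_size=1 << 20, interval=1.0,
                           clock=time.monotonic):
    """Read `path` sequentially; return bytes read per `interval` seconds."""
    samples = []
    window_bytes = 0
    window_start = clock()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            now = clock()
            # Close every window that elapsed while the read was blocked,
            # so a long stall shows up as one or more zero-byte windows.
            while now - window_start >= interval:
                samples.append(window_bytes)
                window_bytes = 0
                window_start += interval
            if not chunk:
                break
            window_bytes += len(chunk)
    samples.append(window_bytes)  # final, possibly partial, window
    return samples


def find_stalls(samples, threshold_bytes=0):
    """Return indices of windows with at most `threshold_bytes` read."""
    return [i for i, n in enumerate(samples) if n <= threshold_bytes]


if __name__ == "__main__" and len(sys.argv) > 1:
    result = sample_read_throughput(sys.argv[1])
    print("bytes per second:", result)
    print("stalled seconds:", find_stalls(result))
```

Running it once against a file on the Lustre mount and once against the same
file on a local array gives two series that can be compared directly. Note
that a second run of the same file may be served from the page cache.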

Our servers:

mgs/mdt:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
4 X 4GB 1067MHz Kingston,
the Lustre metadata is stored on a 4 X 500GB RAID10 array (used only for
this). The server has a 1Gbit connection to a Cisco 3650 switch (all clients
and OSSes are connected to this switch).
The OSS/OST servers are not all identical. We have 3x2 different servers,
meaning two of each storage type, so six OSS servers altogether.
They are set up as described below:

Server#1 dg0:
2 X Intel Xeon E5405 (12M Cache, 2.00 GHz, 1333 MHz FSB),
4 X 2GB 667MHz Kingston
2 X 3Ware 9650SE-24M8 RAID controllers, with 48 x 1TB disks. Each controller
has three RAID5 OSTs of 8 disks each, so this server has 6 x 6.3TB OSTs = 38TB
of storage.
The server has a 2x1Gbit (bond0) Ethernet connection to the switch.
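
The dg0 capacity figures can be cross-checked with a little arithmetic. This
is a rough sketch under two assumptions that the post does not state: each
"1TB" disk holds exactly 10^12 bytes, and the 6.3TB / 38TB figures are
reported in binary units (TiB). File system overhead is ignored.

```python
TB = 10**12   # decimal terabyte, as printed on disk labels (assumption)
TIB = 2**40   # binary tebibyte, as many tools report sizes (assumption)


def raid5_usable_bytes(disks, disk_bytes):
    """Usable capacity of one RAID5 array: one disk's worth holds parity."""
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks - 1) * disk_bytes


ost_bytes = raid5_usable_bytes(8, 1 * TB)   # one 8-disk OST on dg0
per_ost_tib = ost_bytes / TIB               # roughly 6.37
server_tib = 6 * ost_bytes / TIB            # roughly 38.2 for six OSTs

print(f"per OST: {per_ost_tib:.2f} TiB, dg0 total: {server_tib:.1f} TiB")
```

Under those assumptions the result is in line with the 6.3TB per OST and 38TB
per server quoted for dg0.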

Server#2 dg1:
Identical to server#1 dg0.

Server#3 dg2:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
3 X 4GB 1067MHz Kingston
1 X 3Ware 9650SE-16ML controller, with 16 x 1TB disks, 3x5 RAID5 OSTs,
altogether 22TB of storage.
The server has a 2x1Gbit (bond0) Ethernet connection to the switch.

Server#4 dg3:
2 X Intel Xeon E5530 (8M Cache, 2.40 GHz, 5.86 GT/s Intel QPI)
8 X 4GB 1067MHz Kingston
3 X 3Ware 9650SE-24M8 controllers with 20 disks each, so that's 60 x 500GB
disks. Each controller has two RAID5 OST arrays of ten disks. Storage is 25TB.
The server has a 3x1Gbit (bond0) Ethernet connection to the switch.

Server#5 is like dg2, and server#6 is like dg3.

Note: server#4 dg3 was previously part of another storage system, where it
served 500-800 users at 2-2.5Gbit/s of bandwidth, and could even handle 1000
users at 2.97Gbit/s.
The documentation says that even 10,000 users can be on Lustre, so even though
the servers are heterogeneous, we don't see why the system is so slow.
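
As a rough sanity check on the network side, the nominal OSS link capacity
can be summed from the figures listed above and compared with the 1 Mbit/s
per-user cap. This sketch assumes server#5 and server#6 have the same bonding
as dg2 and dg3, and it ignores protocol overhead, how the bonding mode spreads
traffic across links, and the client-side links, which are not listed here.

```python
# Bonded 1 Gbit/s links per OSS, taken from the server list above.
# The entries for server#5 and server#6 are assumed to match dg2 and dg3.
oss_links_gbit = {
    "dg0": 2, "dg1": 2,
    "dg2": 2, "server#5": 2,
    "dg3": 3, "server#6": 3,
}
PER_USER_MBIT = 1  # per-user download cap stated in the post


def nominal_user_ceiling(links_gbit, per_user_mbit):
    """Upper bound on users if every link ran at full nominal speed."""
    total_mbit = sum(links_gbit.values()) * 1000
    return total_mbit // per_user_mbit


total_gbit = sum(oss_links_gbit.values())
nominal_users = nominal_user_ceiling(oss_links_gbit, PER_USER_MBIT)

print(f"nominal OSS bandwidth: {total_gbit} Gbit/s")
print(f"nominal ceiling at {PER_USER_MBIT} Mbit/s per user: {nominal_users}")
```

This is only an upper bound on paper, but it suggests that raw OSS link
capacity alone would not explain the slowdown.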
The clients are Intel Xeon X3440 @ 2.53GHz CPUs with 3 x 2GB 1333MHz Kingston
RAM and hardware Xen support.
Each client has 3 virtual machines, so Lustre has 6 identical clients. Before
this we had 6 different Intel Xeon clients, and we experienced the same speed
problems as described.

Does anyone have an idea what could be causing the problem?

Thank you,
Vic