Hello everyone!
We chose Lustre for our large file storage system, but its performance is not
what we expected.
Our users should be able to download at 1 Mbit/s, which is their limit.
However, during a download the speed drops, sometimes all the way to 0; within
1-30 seconds it either recovers or the connection is dropped.
The servers now attached to Lustre were previously used separately as
standalone file storage servers, and they performed 4 times better that way.
On the Lustre clients the load is between 100 and 300, because the FTP
processes are waiting for data from the OSSs. The RAID arrays in the OSSs
deliver disk I/O of 30-40k/s, whereas outside Lustre they deliver
100-140k/s.
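As a rough illustration of how large that drop is, the two ranges above can be
turned into a slowdown factor. This is only a sketch of the arithmetic: the
function name is made up for the example, and the numbers are the approximate
figures quoted above, not new measurements.

```python
def slowdown_range(standalone, in_lustre):
    """Return (smallest, largest) slowdown factor for two (low, high) ranges."""
    low_standalone, high_standalone = standalone
    low_lustre, high_lustre = in_lustre
    # Smallest factor: worst standalone figure vs. best Lustre figure.
    # Largest factor: best standalone figure vs. worst Lustre figure.
    return low_standalone / high_lustre, high_standalone / low_lustre


# Units cancel out, so the "k/s" figures can be used as they are.
smallest, largest = slowdown_range(standalone=(100, 140), in_lustre=(30, 40))
print(f"disk I/O is {smallest:.1f}x to {largest:.1f}x slower inside Lustre")
```

That range brackets the roughly 4x difference mentioned above.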
Our servers:
MGS/MDT:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
4 X 4 GB 1067 MHz Kingston,
The Lustre metadata is stored on a 4 x 500 GB RAID10 array (used only for
this). The server has a 1 Gbit connection to a Cisco 3650 switch (all clients
and OSSs are connected to this switch).
The OSS/OST servers are not all identical. We have 3 x 2 different servers,
meaning two of each storage type, so that is 6 OSS servers altogether.
They are set up as follows:
Server#1: dg0:
2 X Intel Xeon E5405 (12M Cache, 2.00 GHz, 1333 MHz FSB),
4 X 2 GB 667 MHz Kingston
2 X 3Ware 9650SE-24M8 RAID controllers, with 48 x 1 TB disks. Each controller
has 3 RAID5 OSTs of 8 disks each, so this server has 6 x 6.3 TB OSTs = 38 TB
of storage.
The server has a 2 x 1 Gbit (bond0) Ethernet connection to the switch.
Server#2 dg1:
exactly as server#1 dg0
Server#3 dg2:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
3 X 4 GB 1067 MHz Kingston
1 X 3Ware 9650SE-16ML controller, with 16 x 1 TB disks, 3 RAID5 OSTs of 5
disks each, 22 TB of storage altogether.
The server has a 2 x 1 Gbit (bond0) Ethernet connection to the switch.
Server#4 dg3:
2 X Intel Xeon E5530 (8M Cache, 2.40 GHz, 5.86 GT/s Intel QPI)
8 X 4 GB 1067 MHz Kingston
3 X 3Ware 9650SE-24M8 controllers with 20 disks each, so that is 60 x 500 GB
disks. Each controller has two RAID5 OST arrays of ten disks each. Storage is
25 TB.
The server has a 3 x 1 Gbit (bond0) Ethernet connection to the switch.
Server#5 is like dg2, server#6 is like dg3
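The capacity figures for dg0 and dg3 can be cross-checked with a little
arithmetic. The sketch below assumes that RAID5 gives up one disk per array to
parity, that disk sizes are decimal terabytes, and that the quoted totals are
in binary units with filesystem overhead ignored. The helper name is
hypothetical.

```python
TB = 10**12   # decimal terabyte, as printed on disk labels (assumption)
TIB = 2**40   # binary tebibyte, as most tools report sizes (assumption)


def raid5_usable_tib(disks, disk_tb):
    """Usable size of one RAID5 array: one disk's worth is spent on parity."""
    return (disks - 1) * disk_tb * TB / TIB


# dg0: 2 controllers x 3 OSTs, each OST an 8-disk RAID5 of 1 TB disks.
dg0_tib = 6 * raid5_usable_tib(disks=8, disk_tb=1.0)
# dg3: 3 controllers x 2 OSTs, each OST a 10-disk RAID5 of 500 GB disks.
dg3_tib = 6 * raid5_usable_tib(disks=10, disk_tb=0.5)

print(f"dg0: {dg0_tib:.1f} TiB, dg3: {dg3_tib:.1f} TiB")
```

Under those assumptions the results land close to the 38 TB and 25 TB quoted
for these two servers.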
Note: server#4 dg3 was previously part of another storage system, where it
served 500-800 users at 2-2.5 Gbit/s of bandwidth, and it could even serve
1000 users at 2.97 Gbit/s.
The documentation says that Lustre can serve as many as 10,000 users. Even
though our servers are heterogeneous, we don't see why the system is so
slow.
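For a sense of scale, the network side can be estimated from the link counts
listed above. This is a back-of-the-envelope sketch that assumes each bond
delivers the full sum of its 1 Gbit links (which bonding does not guarantee
for a single flow) and ignores protocol overhead and the client-side links.
The labels server5 and server6 are placeholders for the two servers not named
above.

```python
# Bonded 1 Gbit links per OSS, taken from the server list above.
OSS_LINKS_GBIT = {
    "dg0": 2,
    "dg1": 2,
    "dg2": 2,
    "dg3": 3,
    "server5": 2,  # placeholder label, described as "like dg2"
    "server6": 3,  # placeholder label, described as "like dg3"
}
PER_USER_MBIT = 1  # per-user download limit


def max_users(links_gbit, per_user_mbit):
    """Upper bound on concurrent users if only OSS link capacity mattered."""
    total_mbit = sum(links_gbit.values()) * 1000
    return total_mbit // per_user_mbit


total_gbit = sum(OSS_LINKS_GBIT.values())
users = max_users(OSS_LINKS_GBIT, PER_USER_MBIT)
print(f"{total_gbit} Gbit/s aggregate, about {users} users at {PER_USER_MBIT} Mbit/s")
```

This is only a theoretical ceiling on the OSS side; it says nothing about disk
or metadata limits.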
The clients are Intel Xeon X3440 @ 2.53 GHz CPUs with 3 x 2 GB 1333 MHz
Kingston RAM and hardware Xen support.
Each client has 3 virtual machines, so Lustre sees 6 identical clients. We
previously had 6 different Intel Xeon clients, and we experienced the same
speed problems as described above.
Does anyone have an idea what could be causing the problem?
Thank you,
Vic
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss