Dear All, I have a question about partitioning an OSS with a ZFS backend, where the OSS has very large storage attached.
We have a Lustre file system with two OSSes. Each OSS has storage attached:

$ ssh fs2 df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        59G  8.4G   48G  16% /
tmpfs            13G  1.3M   13G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
devtmpfs         10M     0   10M   0% /dev
/dev/shm         63G     0   63G   0% /dev/shm
/dev/sda3       800G  197M  759G   1% /data1
chome_ost/ost   113T   41T   72T  37% /cfs/chome_ost1

$ ssh fs1 df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        14G  6.7G  6.5G  51% /
tmpfs           1.6G  672K  1.6G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
devtmpfs         10M     0   10M   0% /dev
/dev/shm        7.9G     0  7.9G   0% /dev/shm
/dev/sda3       113G  188M  107G   1% /data1
chome_ost1/ost  8.6T  121G  8.5T   2% /cfs/chome_ost1
chome_ost2/ost  8.6T  195G  8.5T   3% /cfs/chome_ost2
chome_ost3/ost  8.6T  187G  8.5T   3% /cfs/chome_ost3
chome_ost4/ost  8.6T  175G  8.5T   2% /cfs/chome_ost4

The OSS "fs2" was installed about six months earlier than "fs1", so fs2 already holds a lot of data, while fs1 has only just started to receive data. The differences between the two OSSes are:

1. fs2 has a single OST of size 113T, formatted as ZFS.
2. fs1's storage was partitioned equally into 4 partitions, each formatted as an independent ZFS OST.
3. The host server of fs1 is about 8 years older than that of fs2:
   fs2: Xeon Silver 4214 2.2GHz (24 cores total) + 128GB RAM
   fs1: Xeon E5530 2.4GHz (8 cores total) + 16GB RAM

We have noticed that the average load on fs2 is much heavier than on fs1: the load average on fs2 is usually around 30.0, while on fs1 it is usually around 1.0. The heavy load on fs2 often leads to the following error message in dmesg:

LNet: Service thread pid 19566 completed after 275.03s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

We are wondering whether this is normal or not. If we want to lower the load on fs2, would it help to re-partition the storage of fs2 into, say, 4 partitions and set up 4 OSTs on fs2, just like fs1?
Our Lustre version is 2.10.7. The whole Lustre file system serves a computing cluster with more than 400 CPU cores; about 70% of the CPU cores are usually busy with computations involving heavy I/O.

Any suggestions are very much appreciated.

Best Regards,
T.H.Hsieh
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org