Hello,

I recently ran QEMU with drive files accessed via libnfs and found a performance problem, along with an idea for an improvement.

I started QEMU with six drive parameters of the form nfs://127.0.0.1/dir/vm-disk-x.qcow2, pointing at a local NFS server, and then used iometer in the guest to measure 4K random read and random write performance. I found that as the IO depth increased, IOPS hit a bottleneck. Looking into the cause, I found that the QEMU main thread was using 100% CPU, and the perf data showed that the hot spots were the send/recv calls in libnfs. From reading the libnfs source and QEMU's block/nfs.c, libnfs only supports a single worker thread, and all of the network events of the NFS driver in QEMU are registered in the main thread's epoll. That is why the main thread saturates one CPU.

Based on this analysis, I came up with the following improvement: start one thread per drive when libnfs opens the drive file, and create an epoll instance in each drive thread to handle all of that drive's network events. I made a demo modification along these lines in block/nfs.c and reran iometer in the guest; performance increased a lot. Random read IOPS increased by almost 100%, and random write IOPS by about 68%.
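To make the idea concrete, here is a minimal sketch of what such a per-drive event-loop thread could look like. It uses the standard libnfs polling interface (nfs_get_fd() / nfs_which_events() / nfs_service()), but for brevity it calls poll() on the single libnfs fd rather than setting up a dedicated epoll instance as my demo does. The NFSDriveThread type and the function names are illustrative, not my actual patch:

/*
 * Hypothetical sketch: one event-loop thread per NFS-backed drive.
 * Instead of registering the libnfs fd in the QEMU main loop, each
 * drive gets its own thread that polls its nfs_context fd and calls
 * nfs_service() when events arrive.
 */
#include <poll.h>
#include <pthread.h>
#include <stdbool.h>
#include <nfsc/libnfs.h>

typedef struct NFSDriveThread {
    struct nfs_context *nfs;   /* one libnfs context per drive */
    pthread_t thread;
    bool quit;                 /* set by the drive's close path */
} NFSDriveThread;

static void *nfs_drive_loop(void *opaque)
{
    NFSDriveThread *t = opaque;

    while (!t->quit) {
        struct pollfd pfd = {
            .fd     = nfs_get_fd(t->nfs),
            /* libnfs reports whether it currently needs POLLIN,
             * POLLOUT, or both */
            .events = nfs_which_events(t->nfs),
        };

        /* 100ms timeout so the loop can notice t->quit */
        if (poll(&pfd, 1, 100) < 0) {
            break;
        }
        if (pfd.revents && nfs_service(t->nfs, pfd.revents) < 0) {
            break;             /* fatal transport error */
        }
    }
    return NULL;
}

static int nfs_drive_thread_start(NFSDriveThread *t)
{
    /* called from the drive-open path, after the mount succeeds */
    return pthread_create(&t->thread, NULL, nfs_drive_loop, t);
}

One thing the sketch glosses over is completion handling: once nfs_service() runs in the drive thread, the libnfs callbacks also fire there, so the results have to be handed back to the proper AioContext (for example via a bottom half) before completing the guest request.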
Test model details:

VM configuration: 6 vdisks in 1 VM
Test tool and parameters: iometer, 4K random read and 4K random write
Backend physical drives: 2 SSDs; the 6 vdisks are split across the 2 SSDs

Before the modification (IOPS):

IO Depth       1      2      4      8      16     32
4K randread    16659  28387  42932  46868  52108  55760
4K randwrite   12212  19456  30447  30574  35788  39015

After the modification (IOPS):

IO Depth       1      2      4      8      16     32
4K randread    17661  33115  57138  82016  99369  109410
4K randwrite   12669  21492  36017  51532  61475  65577

I can post a patch that is up to the coding standard later. For now I would like some advice about this modification: is this a reasonable way to improve performance on NFS shares, or is there a better approach? Any suggestions would be great, and please feel free to ask questions.

--
Best regards,
Jaden Liang