Samba Performance testing
=========================

1.0 Architecture:
-----------------

Server:
    CPU:          Intel(R) Pentium(R) III CPU family, 1266 MHz
    Memory:       1 GB
    Kernel:       Linux 2.4.18
    File system:  xfs-1.1
    Samba:        3.0-alpha19
    Network:      1 Gb/s point-to-point
Client:
    CPU:          1.6 GHz Pentium
    Memory:       512 MB

1.1 Introduction:
-----------------

We have been trying to measure Samba performance. The following are our
observations.

1.2 Is it Samba?
-----------------

We wanted to find out for sure whether Samba was the bottleneck, so we
ran the following experiments:

1. dbench (to measure disk throughput)
2. tbench (to measure TCP/IP throughput)
3. dbench+tbench: here we wanted to find out whether the system, not
   Samba, was the limitation. For each number of clients, dbench and
   tbench were started simultaneously.
4. nbench with the clients_oplocks.txt trace (to measure Samba
   throughput)

The results (throughput in MB/s) are as follows:

Num      dbench    tbench    dbench    tbench    min(1,2)  nbench
clients  alone     alone     (simul    (simul
                             tbench)   dbench)
                             (1)       (2)
-------  --------  --------  --------  --------  --------  --------
  1       77.152   20.915    77.1373   19.7312   19.7312   11.5006
  4      106.174   40.6007   71.2576   33.9155   33.9155   19.3349
  8       93.378   56.4977   63.2581   43.745    43.745    19.8468
 12       81.908   60.8616   59.0883   43.675    43.675    19.2888
 16       56.834   63.6999   52.1449   41.5259   41.5259   19.3474
 20       63.398   64.9676   50.9493   41.776    41.776    19.1162
 24       61.818   66.6186   50.223    41.8949   41.8949   18.9119
 28       55.442   67.3411   49.1058   41.5549   41.5549   19.0702
 32       54.318   69.2981   47.8511   41.9139   41.9139   18.8018
 36       54.986   70.1524   45.6686   41.3715   41.3715   18.3617
 40       46.994   70.8444   45.2621   41.459    41.459    18.2381
 44       41.702   69.8389   42.6287   41.0206   41.0206   18.1785
 48       45.988   69.8389   40.4743   40.3336   40.3336   18.1683

The nbench experiment measures Samba performance with the same workload
trace used for the other experiments. As can be seen, nbench throughput
is much smaller than min(1,2), which implies that Samba is the
performance bottleneck. (The disk configuration for the above experiment
was an 11-drive RAID 5 with LVM.)

1.3 Where in Samba, and what is the limitation?
-----------------------------------------------

We observe that our system is severely CPU limited.
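As a quick sanity check of the bottleneck argument in section 1.2, the
min(1,2) comparison can be reproduced from the table. This is an
illustrative script (numbers copied from three rows of the table above):

```python
# Bottleneck check: nbench (Samba) throughput vs. the system "ceiling",
# min of the simultaneous dbench and tbench columns. Values in MB/s,
# copied from the table in section 1.2.
rows = [
    # clients, dbench (simul), tbench (simul), nbench
    (1,  77.1373, 19.7312, 11.5006),
    (16, 52.1449, 41.5259, 19.3474),
    (48, 40.4743, 40.3336, 18.1683),
]

for clients, dbench_s, tbench_s, nbench in rows:
    ceiling = min(dbench_s, tbench_s)   # min(1,2)
    print(f"{clients:2d} clients: nbench {nbench:7.4f} MB/s vs "
          f"ceiling {ceiling:7.4f} MB/s "
          f"({100 * nbench / ceiling:.0f}% of ceiling)")
```

At every client count nbench stays well below the ceiling, so the system
(disk + network) is not what limits Samba here.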
Here is a summary of a `top -d 1` trace of CPU usage during the period
when 16 nbench clients were active (2-drive RAID 0 + LVM):

         User          System        Total
Mean     34.60447761   64.14477612   98.74925373
Median   35.2          63.7          99.9
Stdev    0.070189292   0.076303659   0.06342686

So it seems that most of the CPU time is spent in the system. Is this
compatible with what we saw in earlier Samba versions?

We then used Samba's built-in profiling facility to get some information
about performance-intensive code paths. We discovered that the time
spent on stat calls was excessive: more than the time spent on read or
write calls!

Here are the time-consuming system calls:

Name              num calls   time(us)   Min(us)   Max(us)
----              ---------   --------   -------   -------
syscall_opendir     189841    36913656         0    396806
syscall_readdir    2329741    40225042         0    312880
syscall_open        194256   150164226         0   1245872
syscall_close       133504    41983747         0    475361
syscall_read        320496    88093084         0    350440
syscall_write       149776    90665926         0    382059
syscall_stat       1335959   145079345         0    336839
syscall_unlink       33520   101113573         0   1132776

Here are the time-consuming Trans2 calls:

Trans2_findfirst     57184   201725472         0    430785
Trans2_qpathinfo    147536   255836025         0    412576

and the time-consuming SMB calls:

SMBntcreateX        175984    95263531         0    346844
SMBdskattr           27344    63275572         0    351798
SMBreadX            320496    90593419         0    350444
SMBwriteX           149776    92584721         0    382067
SMBunlink            33520   101522665         0   1132787
SMBclose            133696    66140491         0    475414

and the cache statistics are:

************************ Statcache *******************************
lookups: 398768
misses: 41
hits: 398727

************************ Writecache ******************************
read_hits: 0
abutted_writes: 0
total_writes: 149776
non_oplock_writes: 149776
direct_writes: 149776
init_writes: 0
flushed_writes[SEEK]: 0
flushed_writes[READ]: 0
flushed_writes[WRITE]: 0
flushed_writes[READRAW]: 0
flushed_writes[OPLOCK_RELEASE]: 0
flushed_writes[CLOSE]: 0
flushed_writes[SYNC]: 0
flushed_writes[SIZECHANGE]: 0
num_perfect_writes: 0
num_write_caches: 0
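The profile table can be summarized numerically: stat's total time
exceeds that of read and write even though each individual stat is
cheap, because stat is called roughly four times as often. An
illustrative script (totals copied from the syscall table above; the
implied averages include any profiling/wrapper overhead):

```python
# (calls, total time in microseconds) copied from the syscall profile.
profile = {
    "syscall_stat":  (1335959, 145079345),
    "syscall_read":  (320496,   88093084),
    "syscall_write": (149776,   90665926),
}

for name, (calls, total_us) in profile.items():
    print(f"{name:14s} {calls:8d} calls, "
          f"avg {total_us / calls:6.1f} us, "
          f"total {total_us / 1e6:6.1f} s")
```

The averages are on the order of a few hundred microseconds or less; it
is the call volume, not the per-call cost, that makes stat dominant.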
allocated_write_caches: 0

For the above experiment (16 nbench clients, 2-drive RAID 0 + LVM) I am
getting about ~21 MB/s.

Then we removed the FIND_FIRST and QUERY_PATH_INFORMATION calls from the
clients_oplocks.txt file. Performance improves by about 6-8 MB/s for 16
clients:

Name              num calls   time(us)   Min(us)   Max(us)
----              ---------   --------   -------   -------
syscall_opendir      83009    18155570         0    306736
syscall_readdir     938078    15806346         0    314394
syscall_open        194256   163721233         0   1682098
syscall_close       133504    50548558         0    905587
syscall_read        320496    91373880         0    319341
syscall_write       149776    94024793         0    345850
syscall_stat        597492    69316075         0    312443
syscall_unlink       33520   101812395         0   1369880

As can be seen, there is a substantial reduction in the stat, readdir
and opendir system call times. However, the CPU user/system time
distribution is identical to the previous case.

To dissect the impact of stat, we measured the kernel dcache hit/miss
statistics. We see a very high hit rate at the dcache.
shrink_dcache_memory was not called, indicating that the kernel mm did
not run short of pages.

To analyze the FIND_FIRST operation, we put further traces in the
call_trans2findfirst call path. We found that more than 60% of the time
is spent in the get_lanman2_dir_entry() call, and that inside
get_lanman2_dir_entry the majority of the time is spent in the vfs_stat
call (~46%), with ~28% of the time spent in the mask_match and
exact_match calls.

We also did a kernel profile of a 60-client netbench run and found that
link_path_walk, d_lookup and kmem_cache_alloc are the functions most
often seen when the timer interrupt occurs, all in the sys_stat call
path.

Conclusion:
-----------

We think Samba needs to optimize caching of the stat calls. Individual
stat calls (average = 49us) are not the concern; the sheer number of
stat calls is. Significant bandwidth can also be gained by optimizing
the opendir and readdir calls (directory stream).

Has anybody done this sort of profiling before?
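To make the conclusion concrete, here is a minimal sketch of the kind of
stat caching suggested above. This is illustrative Python, not Samba's
actual C implementation; the invalidation policy (drop the entry on any
mutating operation) is an assumption, and a real cache would also have
to handle renames, external changes and oplock breaks:

```python
import os


class StatCache:
    """Cache os.stat() results per path; one real syscall per miss.

    Illustrative sketch only. Samba would implement this in C inside
    its VFS layer, keyed on its internal path representation.
    """

    def __init__(self):
        self._cache = {}

    def stat(self, path):
        st = self._cache.get(path)
        if st is None:
            st = os.stat(path)          # real syscall only on a miss
            self._cache[path] = st
        return st

    def invalidate(self, path):
        # Call after write/unlink/rename so stale sizes and
        # timestamps are not served from the cache.
        self._cache.pop(path, None)
```

With a trace like clients_oplocks.txt, where the same paths are stat'ed
repeatedly between modifications, a cache like this would collapse most
of the 1.3 million stat calls into in-memory lookups.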
Are these results compatible? Are there any ongoing attempts to cache
stat information? Any insights in this regard would be much appreciated.
I am hoping to track down why the open call is so expensive in a future
exercise.

Thank you,
Ravi

-----
Ravi Wijayaratne