Hello,

I have a problem with very slow Windows Explorer browsing when there are a large number of directories/files. In this case, the top-level folder has almost 6000 directories, which is admittedly large, but listing it was almost instantaneous when a Windows Server share was being used.
After migrating to a Samba/GlusterFS share, there is almost a 20 second delay while the Explorer window populates the list. This leaves a bad impression of the storage performance. The systems are otherwise idle.

To isolate the cause, I've ruled out everything else, including networking and Windows, and have narrowed it down to GlusterFS being the cause of most of the directory lag.

I was optimistic about using the GlusterFS VFS libgfapi instead of FUSE with Samba, and it does help performance dramatically in some cases, but it does not help (and sometimes hurts) when compared to the CIFS FUSE mount for directory listings.

NFS seems to be better for directory listings and small I/Os, but I cannot use NFS, since I need CIFS for Windows clients, ACLs, Active Directory, etc.

Versions:

CentOS release 6.5 (Final)
# glusterd -V
glusterfs 3.4.2 built on Jan 6 2014 14:31:51
# smbd -V
Version 4.1.4

For testing, I've got a single GlusterFS volume, with a single ext4 brick, being accessed locally:

# gluster volume info nas-cbs-0005
Volume Name: nas-cbs-0005
Type: Distribute
Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
Options Reconfigured:
server.allow-insecure: on
nfs.rpc-auth-allow: *
nfs.disable: off
nfs.addr-namelookup: off

The Samba share options are:

[nas-cbs-0005]
    path = /samba/nas-cbs-0005/cifs_share
    admin users = "localadmin"
    valid users = "localadmin"
    invalid users =
    read list =
    write list = "localadmin"
    guest ok = yes
    read only = no
    hide unreadable = yes
    hide dot files = yes
    available = yes

[nas-cbs-0005-vfs]
    path = /
    vfs objects = glusterfs
    glusterfs:volume = nas-cbs-0005
    kernel share modes = No
    use sendfile = false
    admin users = "localadmin"
    valid users = "localadmin"
    invalid users =
    read list =
    write list = "localadmin"
    guest ok = yes
    read only = no
    hide unreadable = yes
    hide dot files = yes
    available = yes

I've locally mounted the volume three ways: with NFS, with Samba CIFS through a GlusterFS FUSE mount, and with a VFS libgfapi mount:

# mount
/dev/sdr on /exports/nas-segment-0004 type ext4 (rw,noatime,auto_da_alloc,barrier,nodelalloc,journal_checksum,acl,user_xattr)
/var/lib/glusterd/vols/nas-cbs-0005/nas-cbs-0005-fuse.vol on /samba/nas-cbs-0005 type fuse.glusterfs (rw,allow_other,max_read=131072)
//10.10.200.181/nas-cbs-0005 on /mnt/nas-cbs-0005-cifs type cifs (rw,username=localadmin,password=localadmin)
10.10.200.181:/nas-cbs-0005 on /mnt/nas-cbs-0005 type nfs (rw,addr=10.10.200.181)
//10.10.200.181/nas-cbs-0005-vfs on /mnt/nas-cbs-0005-cifs-vfs type cifs (rw,username=localadmin,password=localadmin)

Directory listing 6000 empty directories benchmark results:

Directory listing the ext4 mount directly is almost instantaneous of course.
Directory listing the NFS mount is also very fast, less than a second.
Directory listing the CIFS FUSE mount is very slow, almost 16 seconds!
Directory listing the CIFS VFS libgfapi mount is about twice as fast as FUSE, but still slow at 8 seconds.

Unfortunately, due to:

Bug 1004327 - New files are not inheriting ACL from parent directory unless "stat-prefetch" is off for the respective gluster volume
https://bugzilla.redhat.com/show_bug.cgi?id=1004327

I need to have 'stat-prefetch' off, so I retested with this setting.

Directory listing 6000 empty directories benchmark results ('stat-prefetch' is off):

Accessing the ext4 mount directly is almost instantaneous of course.
Accessing the NFS mount is still very fast, less than a second.
Accessing the CIFS FUSE mount is slow, almost 14 seconds, but slightly faster than when 'stat-prefetch' was on?
Accessing the CIFS VFS libgfapi mount is now about twice as slow as FUSE, at almost 26 seconds, I guess due to 'stat-prefetch' being off!

To see if the directory listing problem was due to file system metadata handling or to small I/Os, I ran some simple small block file I/O benchmarks with the same configuration.
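Before the I/O numbers, for anyone who wants to reproduce the listing test: the following is a minimal sketch of how a comparable tree of empty entries could be built and timed. It assumes bash (for the `time` keyword); the function names, the `entry_N` naming, and the example path in the comments are hypothetical, and this is not necessarily how the original 'manydirs' tree was created.

```shell
#!/bin/bash
# make_entries DIR COUNT [dir|file]
# Populate DIR with COUNT empty directories (default) or empty files.
# Hypothetical helper; entry names are arbitrary.
make_entries() {
    target=$1 count=$2 kind=${3:-dir}
    mkdir -p "$target" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "$kind" = file ]; then
            : > "$target/entry_$i"      # create an empty file
        else
            mkdir "$target/entry_$i"    # create an empty directory
        fi
        i=$((i + 1))
    done
}

# time_listing DIR
# Time a long listing of DIR, discarding the listing itself.
time_listing() {
    time ls -l "$1" > /dev/null
}

# Example (path is a placeholder; dropping caches requires root):
#   make_entries /mnt/some-mount/manydirs 6000 dir
#   sync; echo 3 > /proc/sys/vm/drop_caches
#   time_listing /mnt/some-mount/manydirs   # cold run, discard
#   time_listing /mnt/some-mount/manydirs   # warm run
```

Running the same two functions against each mount point (brick, NFS, CIFS FUSE, CIFS VFS) should give numbers that are comparable with one another, though absolute times will depend on the hardware.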
64KB Sequential Writes:

NFS small block writes seem slow at about 50 MB/sec.
CIFS FUSE small block writes are more than twice as fast as NFS, at about 118 MB/sec.
CIFS VFS libgfapi small block writes are very fast, about twice as fast as CIFS FUSE, at about 232 MB/sec.

64KB Sequential Reads:

NFS small block reads are very fast, at about 334 MB/sec.
CIFS FUSE small block reads are less than half as fast as NFS, at about 124 MB/sec.
CIFS VFS libgfapi small block reads are about the same as CIFS FUSE, at about 127 MB/sec.

4KB Sequential Writes:

NFS very small block writes are very slow at about 4 MB/sec.
CIFS FUSE very small block writes are faster, at about 11 MB/sec.
CIFS VFS libgfapi very small block writes are twice as fast as CIFS FUSE, at about 22 MB/sec.

4KB Sequential Reads:

NFS very small block reads are very fast at about 346 MB/sec.
CIFS FUSE very small block reads are less than half as fast as NFS, at about 143 MB/sec.
CIFS VFS libgfapi very small block reads are slightly slower than CIFS FUSE, at about 137 MB/sec.

I'm not quite sure how to interpret these results. Write caching is playing a part for sure, but I would think it should apply equally to both NFS and CIFS. With small I/Os, NFS is better at reading than CIFS, and CIFS VFS is twice as good at writing as CIFS FUSE. Sadly, CIFS VFS is about the same as CIFS FUSE at reading.

Regarding the directory listing lag problem, I've tried most of the GlusterFS volume options that seemed like they might help, but nothing really did. Having 'stat-prefetch' on helps, but it has to be off because of the bug.

BTW: I've repeated some tests with empty files instead of directories, and the results were similar. The issue is not specific to directories.

I know that small file reads and file-system metadata handling are not GlusterFS's strong suit, but is there *anything* that can be done to help it out? Any ideas? Should I expect GlusterFS 3.5.x to improve this at all?

Raw data is below.

Any advice is appreciated.
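To put the CIFS listing times in per-entry terms, each warm-cache time can be divided by the 6000 entries. The sketch below does only that arithmetic, using the rounded summary times quoted earlier (16, 8, 14 and 26 seconds); `per_entry_ms` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# per_entry_ms SECONDS ENTRIES
# Average listing cost per directory entry, in milliseconds.
per_entry_ms() {
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.2f\n", secs * 1000 / n }'
}

# Rounded warm-cache listing times (seconds) from the summaries above.
for row in \
    "CIFS-FUSE,stat-prefetch=on:16" \
    "CIFS-VFS,stat-prefetch=on:8" \
    "CIFS-FUSE,stat-prefetch=off:14" \
    "CIFS-VFS,stat-prefetch=off:26"
do
    name=${row%:*}     # text before the last colon
    secs=${row##*:}    # text after the last colon
    printf '%-30s %s ms/entry\n' "$name" "$(per_entry_ms "$secs" 6000)"
done
```

This works out to roughly 1.3 to 4.3 ms per entry on the CIFS mounts, which may be a more useful figure than total seconds when comparing against other directory sizes.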
Thanks.

~ Jeff Byers ~

##########################

Directory listing of 6000 empty directories ('stat-prefetch' is on):

Directory listing the ext4 mount directly is almost instantaneous of course.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m41.235s (Throw away first time for ext4 FS cache population?)
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.110s
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.109s

Directory listing the NFS mount is also very fast.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m44.352s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.471s
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.114s

Directory listing the CIFS FUSE mount is so slow, almost 16 seconds!

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m56.573s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m16.101s
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m15.986s

Directory listing the CIFS VFS libgfapi mount is about twice as fast as FUSE, but still slow at 8 seconds.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 0m48.839s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 0m8.197s
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 0m8.450s

####################

Retesting directory listing with Gluster default settings, except with 'stat-prefetch' off, due to:

Bug 1004327 - New files are not inheriting ACL from parent directory unless "stat-prefetch" is off for the respective gluster volume
https://bugzilla.redhat.com/show_bug.cgi?id=1004327

# gluster volume info nas-cbs-0005
Volume Name: nas-cbs-0005
Type: Distribute
Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
Options Reconfigured:
performance.stat-prefetch: off
server.allow-insecure: on
nfs.rpc-auth-allow: *
nfs.disable: off
nfs.addr-namelookup: off

Directory listing of 6000 empty directories ('stat-prefetch' is off):

Accessing the ext4 mount directly is almost instantaneous of course.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m39.483s (Throw away first time for ext4 FS cache population?)
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.136s
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.109s

Accessing the NFS mount is also very fast.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m43.819s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.342s
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real 0m0.200s

Accessing the CIFS FUSE mount is slow, almost 14 seconds!
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m55.759s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m13.458s
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real 0m13.665s

Accessing the CIFS VFS libgfapi mount is now about twice as slow as FUSE, at almost 26 seconds due to 'stat-prefetch' being off!

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 1m2.821s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 0m25.563s
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real 0m26.949s

####################

64KB Writes:

NFS small block writes seem slow at about 50 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 27.249756 secs, 49.25 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 25.893526 secs, 51.83 MB/sec

CIFS FUSE small block writes are more than twice as fast as NFS, at about 118 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 11.509077 secs, 116.62 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 11.223902 secs, 119.58 MB/sec

CIFS VFS libgfapi small block writes are very fast, about twice as fast as CIFS FUSE, at about 232 MB/sec.
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 5.704753 secs, 235.27 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 5.862486 secs, 228.94 MB/sec

64KB Reads:

NFS small block reads are very fast, at about 334 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 3.972426 secs, 337.87 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 4.066978 secs, 330.02 MB/sec

CIFS FUSE small block reads are less than half as fast as NFS, at about 124 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 10.837072 secs, 123.85 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 10.716980 secs, 125.24 MB/sec

CIFS VFS libgfapi small block reads are about the same as CIFS FUSE, at about 127 MB/sec.
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 10.397888 secs, 129.08 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 10.696802 secs, 125.47 MB/sec

4KB Writes:

NFS very small block writes are very slow at about 4 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 20.450521 secs, 4.10 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 19.669923 secs, 4.26 MB/sec

CIFS FUSE very small block writes are faster, at about 11 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 7.247578 secs, 11.57 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 7.422002 secs, 11.30 MB/sec

CIFS VFS libgfapi very small block writes are twice as fast as CIFS FUSE, at about 22 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 3.766179 secs, 22.27 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 3.761176 secs, 22.30 MB/sec

4KB Reads:

NFS very small block reads are very fast at about 346 MB/sec.
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 0.244960 secs, 342.45 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 0.240472 secs, 348.84 MB/sec

CIFS FUSE very small block reads are less than half as fast as NFS, at about 143 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 0.606534 secs, 138.30 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 0.576185 secs, 145.59 MB/sec

CIFS VFS libgfapi very small block reads are slightly slower than CIFS FUSE, at about 137 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 0.611328 secs, 137.22 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 0.615834 secs, 136.22 MB/sec

EOM