Hello,

I have a problem with very slow Windows Explorer browsing
when there are a large number of directories/files.
In this case, the top-level folder has almost 6000 directories.
That is admittedly large, but browsing was almost instantaneous
when a Windows Server share was being used.

After migrating to a Samba/GlusterFS share, there is a delay of
almost 20 seconds while the Explorer window populates the list.
This leaves a bad impression of the storage performance. The
systems are otherwise idle.
To isolate the cause, I've ruled out networking and Windows,
and have narrowed it down to GlusterFS as the cause of most
of the directory lag.

I was optimistic about using the GlusterFS VFS libgfapi module
instead of FUSE with Samba. It does help performance
dramatically in some cases, but for directory listings it does
not help (and sometimes hurts) compared to the CIFS FUSE
mount.

NFS seems to be better for directory listings and small I/Os,
but I cannot use NFS: I need CIFS for Windows clients, ACLs,
Active Directory, etc.

Versions:
    CentOS release 6.5 (Final)
    # glusterd -V
    glusterfs 3.4.2 built on Jan  6 2014 14:31:51
    # smbd -V
    Version 4.1.4

For testing, I've got a single GlusterFS volume, with a
single ext4 brick, being accessed locally:

# gluster volume info nas-cbs-0005
Volume Name: nas-cbs-0005
Type: Distribute
Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
Options Reconfigured:
server.allow-insecure: on
nfs.rpc-auth-allow: *
nfs.disable: off
nfs.addr-namelookup: off

The Samba share options are:

[nas-cbs-0005]
    path = /samba/nas-cbs-0005/cifs_share
    admin users = "localadmin"
    valid users = "localadmin"
    invalid users =
    read list =
    write list = "localadmin"
    guest ok = yes
    read only = no
    hide unreadable = yes
    hide dot files = yes
    available = yes

[nas-cbs-0005-vfs]
    path = /
    vfs objects = glusterfs
    glusterfs:volume = nas-cbs-0005
    kernel share modes = No
    use sendfile = false
    admin users = "localadmin"
    valid users = "localadmin"
    invalid users =
    read list =
    write list = "localadmin"
    guest ok = yes
    read only = no
    hide unreadable = yes
    hide dot files = yes
    available = yes

I've locally mounted the volume three ways: with NFS, with Samba
CIFS through a GlusterFS FUSE mount, and with Samba CIFS through
the VFS libgfapi module:

# mount
/dev/sdr on /exports/nas-segment-0004 type ext4 (rw,noatime,auto_da_alloc,barrier,nodelalloc,journal_checksum,acl,user_xattr)
/var/lib/glusterd/vols/nas-cbs-0005/nas-cbs-0005-fuse.vol on /samba/nas-cbs-0005 type fuse.glusterfs (rw,allow_other,max_read=131072)
//10.10.200.181/nas-cbs-0005 on /mnt/nas-cbs-0005-cifs type cifs (rw,username=localadmin,password=localadmin)
10.10.200.181:/nas-cbs-0005 on /mnt/nas-cbs-0005 type nfs (rw,addr=10.10.200.181)
//10.10.200.181/nas-cbs-0005-vfs on /mnt/nas-cbs-0005-cifs-vfs type cifs (rw,username=localadmin,password=localadmin)

Benchmark results for listing 6000 empty directories:

    Directory listing the ext4 mount directly is almost
    instantaneous of course.

    Directory listing the NFS mount is also very fast, less than a second.

    Directory listing the CIFS FUSE mount is so slow, almost 16
    seconds!

    Directory listing the CIFS VFS libgfapi mount is about twice
    as fast as FUSE, but still slow at 8 seconds.

Unfortunately, due to:

    Bug 1004327 - New files are not inheriting ACL from parent
                  directory unless "stat-prefetch" is off for
                  the respective gluster volume
    https://bugzilla.redhat.com/show_bug.cgi?id=1004327

I need to have 'stat-prefetch' off. Retesting with this
setting.
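
For reference, the option can be switched between runs with
'gluster volume set'. This is only a sketch: the wrapper name
set_stat_prefetch and the DRY_RUN variable are hypothetical, and
by default the command is just printed rather than executed.

```shell
# Toggle performance.stat-prefetch on a volume between benchmark runs.
# DRY_RUN defaults to 1, so the gluster command is only printed;
# set DRY_RUN=0 to actually run it (requires the gluster CLI).
set_stat_prefetch() {
    volume=$1
    state=$2
    case "$state" in
        on|off) ;;
        *) echo "usage: set_stat_prefetch VOLUME on|off" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "gluster volume set $volume performance.stat-prefetch $state"
    else
        gluster volume set "$volume" performance.stat-prefetch "$state"
    fi
}
```

For the volume used here that would be
'set_stat_prefetch nas-cbs-0005 off', followed by
'gluster volume info nas-cbs-0005' to confirm the setting.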

Benchmark results for listing 6000 empty directories
('stat-prefetch' is off):

    Accessing the ext4 mount directly is almost
    instantaneous of course.

    Accessing the NFS mount is still very fast, less than a second.

    Accessing the CIFS FUSE mount is slow, almost 14
    seconds, but slightly faster than when 'stat-prefetch' was
    on?

    Accessing the CIFS VFS libgfapi mount is now about twice
    as slow as FUSE, at almost 26 seconds, I guess due
    to 'stat-prefetch' being off!
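
To put these listing times in per-entry terms, each timing can
be divided by the 6000 entries. A small sketch follows; the
function name per_entry_us is hypothetical, and it uses integer
shell arithmetic, so results are truncated to whole microseconds.

```shell
# per_entry_us TOTAL_MS COUNT
# Prints the average time per directory entry in microseconds,
# given the total listing time in milliseconds.
per_entry_us() {
    total_ms=$1
    count=$2
    echo $(( total_ms * 1000 / count ))
}

# Timings taken from the raw data at the end of this mail:
per_entry_us 109 6000      # ext4 directly
per_entry_us 114 6000      # NFS
per_entry_us 16101 6000    # CIFS FUSE, stat-prefetch on
per_entry_us 8197 6000     # CIFS VFS,  stat-prefetch on
per_entry_us 13458 6000    # CIFS FUSE, stat-prefetch off
per_entry_us 25563 6000    # CIFS VFS,  stat-prefetch off
```

The CIFS cases come out at a few milliseconds per entry, against
tens of microseconds for ext4 and NFS.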

To see if the directory listing problem was due to file
system metadata handling or to small I/Os, I did some simple
small block file I/O benchmarks with the same configuration.

    64KB Sequential Writes:

    NFS small block writes seem slow at about 50 MB/sec.

    CIFS FUSE small block writes are more than twice as fast as
    NFS, at about 118 MB/sec.

    CIFS VFS libgfapi small block writes are very fast, about
    twice as fast as CIFS FUSE, at about 232 MB/sec.

    64KB Sequential Reads:

    NFS small block reads are very fast, at about 334 MB/sec.

    CIFS FUSE small block reads are half of NFS, at about 124
    MB/sec.

    CIFS VFS libgfapi small block reads are about the same as
    CIFS FUSE, at about 127 MB/sec.

    4KB Sequential Writes:

    NFS very small block writes are very slow at about 4 MB/sec.

    CIFS FUSE very small block writes are faster, at about 11
    MB/sec.

    CIFS VFS libgfapi very small block writes are twice as fast
    as CIFS FUSE, at about 22 MB/sec.

    4KB Sequential Reads:

    NFS very small block reads are very fast at about 346
    MB/sec.

    CIFS FUSE very small block reads are less than half as fast
    as NFS, at about 143 MB/sec.

    CIFS VFS libgfapi very small block reads are slightly slower
    than CIFS FUSE, at about 137 MB/sec.

I'm not quite sure how to interpret these results. Write
caching is playing a part for sure, but I would think it
applies equally to both NFS and CIFS. With small block
I/Os, NFS is better at reading than CIFS, and CIFS VFS is
twice as good at writing as CIFS FUSE. Sadly, CIFS VFS is
about the same as CIFS FUSE at reading.
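
One way to look at the same numbers is as operations per second
rather than MB/sec. The sketch below recomputes both from the
block count, block size and elapsed time that sgp_dd reports;
'count=20k' is taken to mean 20480 blocks, which reproduces the
reported MB/sec figures. The function name rate is hypothetical.

```shell
# rate BLOCKS BLOCK_BYTES ELAPSED_US
# Prints two integers: throughput in hundredths of MB/sec
# (decimal megabytes) and operations per second.
# Bytes per microsecond is numerically equal to MB/sec.
rate() {
    blocks=$1
    bs=$2
    us=$3
    echo "$(( blocks * bs * 100 / us )) $(( blocks * 1000000 / us ))"
}

# NFS 64KB write, first run: 27.249756 secs
rate 20480 65536 27249756
# NFS 4KB write, first run: 20.450521 secs
rate 20480 4096 20450521
```

For the NFS write runs this gives roughly 750 ops/sec at 64KB
and roughly 1000 ops/sec at 4KB. One possible reading is that
those runs are limited by per-operation latency rather than by
bandwidth, but that is an interpretation and was not measured
directly.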

Regarding the directory listing lag problem, I've tried most
of the GlusterFS volume options that seemed like they
might help, but nothing really did.

Having 'stat-prefetch' on in Gluster helps, but it has to be
off because of the bug above.

BTW: I've repeated some tests with empty files instead of
directories, and the results were similar. The issue is not
specific to directories.
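
For anyone wanting to reproduce this, the following sketch
populates a test directory with empty directories or files and
then compares a names-only listing with a long listing. The
function names (make_entries, elapsed_ms, compare_listing) and
the entry naming are hypothetical. It assumes GNU ls and GNU date
('ls -U' for unsorted output, 'date +%s%N' for nanoseconds).

```shell
# make_entries DIR COUNT [dir|file]
# Create COUNT empty directories (default) or empty files in DIR.
make_entries() {
    dir=$1
    count=$2
    kind=${3:-dir}
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "$kind" = "file" ]; then
            : > "$dir/entry$i"
        else
            mkdir "$dir/entry$i"
        fi
        i=$((i + 1))
    done
}

# elapsed_ms CMD [ARGS...]
# Run a command, discard its output, print wall-clock milliseconds.
elapsed_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# compare_listing DIR
# 'ls -1U' mostly needs only the entry names, while 'ls -lU'
# also has to stat every entry.
compare_listing() {
    echo "names only:      $(elapsed_ms ls -1U "$1") ms"
    echo "with attributes: $(elapsed_ms ls -lU "$1") ms"
}
```

Running 'compare_listing /samba/nas-cbs-0005/cifs_share/manydirs'
on the FUSE mount itself, without Samba in the path, would show
how much of the time goes to per-entry attribute lookups. On the
CIFS mounts the comparison is only indicative, since the SMB
client may fetch attributes together with the names.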

I know that small file reads and file system metadata
handling are not GlusterFS's strong suit, but is there
*anything* that can be done to help it out? Any ideas?
Should I expect GlusterFS 3.5.x to improve this at all?

Raw data is below.

Any advice is appreciated. Thanks.

~ Jeff Byers ~

##########################

Directory listing of 6000 empty directories ('stat-prefetch'
is on):

Directory listing the ext4 mount directly is almost
instantaneous of course.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m41.235s (Throw away first time for ext4 FS cache population?)
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.110s
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.109s

Directory listing the NFS mount is also very fast.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m44.352s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.471s
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.114s

Directory listing the CIFS FUSE mount is so slow, almost 16
seconds!

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m56.573s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m16.101s
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m15.986s

Directory listing the CIFS VFS libgfapi mount is about twice
as fast as FUSE, but still slow at 8 seconds.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    0m48.839s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    0m8.197s
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    0m8.450s

####################

Retesting the directory listing with the Gluster default
settings, except for 'stat-prefetch', which is off due to:

    Bug 1004327 - New files are not inheriting ACL from parent directory
                  unless "stat-prefetch" is off for the respective gluster
                  volume
    https://bugzilla.redhat.com/show_bug.cgi?id=1004327

# gluster volume info nas-cbs-0005

Volume Name: nas-cbs-0005
Type: Distribute
Volume ID: 5068e9a5-d60f-439c-b319-befbf9a73a50
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 192.168.5.181:/exports/nas-segment-0004/nas-cbs-0005
Options Reconfigured:
performance.stat-prefetch: off
server.allow-insecure: on
nfs.rpc-auth-allow: *
nfs.disable: off
nfs.addr-namelookup: off

Directory listing of 6000 empty directories ('stat-prefetch'
is off):

Accessing the ext4 mount directly is almost instantaneous of
course.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m39.483s (Throw away first time for ext4 FS cache population?)
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.136s
# time ls -l /exports/nas-segment-0004/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.109s

Accessing the NFS mount is also very fast.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m43.819s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.342s
# time ls -l /mnt/nas-cbs-0005/cifs_share/manydirs/ >/dev/null
real    0m0.200s

Accessing the CIFS FUSE mount is slow, almost 14 seconds!

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m55.759s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m13.458s
# time ls -l /mnt/nas-cbs-0005-cifs/manydirs/ >/dev/null
real    0m13.665s

Accessing the CIFS VFS libgfapi mount is now about twice as
slow as FUSE, at almost 26 seconds due to 'stat-prefetch'
being off!

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    1m2.821s (Throw away first time for ext4 FS cache population?)
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    0m25.563s
# time ls -l /mnt/nas-cbs-0005-cifs-vfs/cifs_share/manydirs/ >/dev/null
real    0m26.949s

####################

64KB Writes:

NFS small block writes seem slow at about 50 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 27.249756 secs, 49.25 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 25.893526 secs, 51.83 MB/sec

CIFS FUSE small block writes are more than twice as fast as NFS, at about
118 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 11.509077 secs, 116.62 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 11.223902 secs, 119.58 MB/sec

CIFS VFS libgfapi small block writes are very fast, about
twice as fast as CIFS FUSE, at about 232 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 5.704753 secs, 235.27 MB/sec
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 5.862486 secs, 228.94 MB/sec

64KB Reads:

NFS small block reads are very fast, at about 334 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 3.972426 secs, 337.87 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 4.066978 secs, 330.02 MB/sec

CIFS FUSE small block reads are half of NFS, at about 124
MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 10.837072 secs, 123.85 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 10.716980 secs, 125.24 MB/sec

CIFS VFS libgfapi small block reads are about the same as
CIFS FUSE, at about 127 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 10.397888 secs, 129.08 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=64k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 10.696802 secs, 125.47 MB/sec

4KB Writes:

NFS very small block writes are very slow at about 4 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 20.450521 secs, 4.10 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 19.669923 secs, 4.26 MB/sec

CIFS FUSE very small block writes are faster, at about 11
MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 7.247578 secs, 11.57 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 7.422002 secs, 11.30 MB/sec

CIFS VFS libgfapi very small block writes are twice as fast
as CIFS FUSE, at about 22 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 3.766179 secs, 22.27 MB/sec
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync if=/dev/zero of=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 3.761176 secs, 22.30 MB/sec

4KB Reads:

NFS very small block reads are very fast at about 346
MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 0.244960 secs, 342.45 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005/cifs_share/testfile count=20k
time to transfer data was 0.240472 secs, 348.84 MB/sec

CIFS FUSE very small block reads are less than half as fast
as NFS, at about 143 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 0.606534 secs, 138.30 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs/testfile count=20k
time to transfer data was 0.576185 secs, 145.59 MB/sec

CIFS VFS libgfapi very small block reads are slightly slower
than CIFS FUSE, at about 137 MB/sec.

# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 0.611328 secs, 137.22 MB/sec
# sync;sync; echo '3' > /proc/sys/vm/drop_caches
# sgp_dd time=1 thr=4 bs=4k bpt=1 iflag=dsync oflag=dsync of=/dev/null if=/mnt/nas-cbs-0005-cifs-vfs/cifs_share/testfile count=20k
time to transfer data was 0.615834 secs, 136.22 MB/sec
EOM