Re: [Gluster-users] Event threads effect
Just a quick follow-up - I must have somehow missed the updated thread count, because fio (client side) and the brick do show the appropriate number of threads if I raise the count above 16. It seems a minimal thread count is always spawned, at least on the server, which probably confused me. Unfortunately the connection count to the brick is still the same (2), and the read performance is still the same. strace on the fio threads shows reads (same on the server side), so the workload is somehow distributed between them, but performance is very different from running multiple jobs (numjobs > 1). Does anyone know how fio, or rather its gfapi ioengine, uses multiple threads? The 200MB/s I get on the client vs. the 1.5GB/s when the same test is run locally on the server so far seems to be caused by network latency, but could the client be advised to open multiple connections (maybe one per thread)? netstat reports 2 connections per fio process, and raising numjobs results in more connections and maxes out brick utilization. My goal is to max out the IO performance of QEMU/KVM guests. So far only approximately 200MB/s are possible for certain block sizes. -ps On Mon, Nov 7, 2016 at 4:47 PM, Pavel Szalbot wrote: > Hi everybody, > > I am trying to benchmark my cluster with fio's gfapi ioengine and evaluate the > effect of various volume options on performance. I have so far observed the > following: > > 1) *thread* options do not affect performance or thread count - htop > always shows 2 threads on the client, and there are always 16 glusterfsd threads on the > server > 2) running the same test locally (on the brick) shows up to 5x better > throughput than over 10GbE (MTU 9000, iperfed, pinged with DF set, no drops > on switches or cards, tcpdumped to check for network issues) > 3) performance.cache-size value has no effect on performance (32MB or 1GB) > > I would expect raising the client thread count to lead to more TCP > connections, higher disk utilization and throughput. 
If I run multiple fio > jobs (numjobs=8), I am able to saturate the network link. > > Is this normal, or am I missing something really badly? > > fio config: > > [global] > name=gfapi test > create_on_open=1 > volume=test3-vol > brick=gfs-3.san > ioengine=gfapi > direct=1 > bs=256k > rw=read > iodepth=1 > numjobs=1 > size=8192m > loops=1 > refill_buffers=1 > [job1] > > reconfigured volume options: > performance.client-io-threads: on > performance.cache-size: 1GB > performance.read-ahead: off > server.outstanding-rpc-limit: 128 > performance.io-thread-count: 16 > server.event-threads: 16 > client.event-threads: 16 > nfs.disable: on > transport.address-family: inet > performance.readdir-ahead: on > > -ps > ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
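For reference, the multi-job run that saturates the link differs from the quoted config only in numjobs - a sketch, reusing the same illustrative volume/brick names from the mail (not tuned values):

```ini
[global]
name=gfapi test parallel
create_on_open=1
volume=test3-vol
brick=gfs-3.san
ioengine=gfapi
direct=1
bs=256k
rw=read
iodepth=1
numjobs=8
size=8192m
loops=1
refill_buffers=1
[job1]
```

Each of the 8 jobs is a separate fio process, so each opens its own gfapi connections - which matches the observation that raising numjobs (rather than threads) is what increases the connection count.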
Re: [Gluster-users] [Nfs-ganesha-devel] Understandings of ganesha-ha.sh
Hi, On 11/05/2016 12:29 AM, ML Wong wrote: I'd like to ask for some recommendations here. 1) For /usr/libexec/ganesha/ganesha-ha.sh: as we have been taking advantage of pacemaker+corosync for some other services, we always run into the issue of losing other resources we set up in the cluster when we run ganesha-ha.sh add/delete. I'd just like to know if that is expected. I have already tried 2 different setups and they both give me the same result; I'd like to better understand if I need to find a workaround for our environment. Do you mean you are using pacemaker+corosync to manage services other than nfs-ganesha and you lose those resources when a ganesha-ha.sh add/delete operation is performed? Could you please provide details about the resources affected? 2) Has anyone of you run into a scenario with a demand to only add capacity to the Gluster volume, but not necessarily add more nodes to the cluster? Meaning, continue adding bricks to the existing Gluster volume, but without doing the ganesha-ha.sh --add node process? Is it a bad idea? Or, between Ganesha and Gluster, does "HA_CLUSTER_NODES" have to match the number of Gluster peers? It's perfectly acceptable. The Ganesha cluster is a subset of the Gluster trusted storage pool, so you can increase the capacity of the gluster volume without needing to alter the nfs-ganesha cluster. 3) Has anyone on the list tried adding nodes into Ganesha without using ganesha-ha.sh, just by using "pcs"? Ganesha team, am I missing any other resources and constraints for new nodes? a) nfs_setup/mon/grace-clone b) [node]-cluster_ip-1, c) location constraint for each member-node with score priority d) location-nfs-grace-clone e) order constraints I am not sure if anyone has tried that out. We strongly recommend using the ganesha-ha.sh script as there could be additions/fixes in that script w.r.t. cluster configuration. 
You may have to double-check the resources and constraints every time you try to configure them manually. Do you see any difference between using the script and the manual configuration? Thanks, Soumya Thanks all, Melvin For gluster-users, I am not sure if this is the right list to post to. Sorry for the spam. ___ Nfs-ganesha-devel mailing list nfs-ganesha-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
[Gluster-users] Unsynced gfid's that don't exist
Hi all, Last Friday we had some network issues; the built-in gluster heal mechanism resolved almost all unsynced entries except 16. The command gluster volume heal public info displays: Brick gfs06a-gs:/mnt/public/brick1 Status: Connected Number of entries: 16 The strange thing is that I cannot find these gfids within the .glusterfs directory, and trying to resolve them with the script https://gist.github.com/semiosis/4392640 doesn't show anything either. Any suggestions on how to get rid of these unsynced-entries messages? Regards Davy
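In case it helps others debug similar leftovers: each gfid should correspond to a hard link under the brick's .glusterfs directory, at a path derived from the first four hex digits of the gfid. A small sketch of that mapping (plain Python; the brick path and gfid below are made up for illustration):

```python
import os

def gfid_backend_path(brick_root, gfid):
    """Path of the gfid hard link inside a brick.

    GlusterFS keeps one entry per gfid under
    .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<full-gfid>.
    """
    g = gfid.lower()
    return os.path.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)

# Made-up gfid for illustration; on the real brick you would check
# os.path.exists(path) and, for regular files, locate the named copy
# with something like `find <brick> -samefile <path>`.
path = gfid_backend_path("/mnt/public/brick1",
                         "6ba7b810-9dad-11d1-80b4-00c04fd430c8")
print(path)
# -> /mnt/public/brick1/.glusterfs/6b/a7/6ba7b810-9dad-11d1-80b4-00c04fd430c8
```

If the path does not exist on any brick, the heal entry may be stale metadata rather than real pending data.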
Re: [Gluster-users] Feedback on DHT option "cluster.readdir-optimize"
- Original Message - > From: "Raghavendra Gowdappa" > To: "Gluster Devel" , "gluster-users" > > Sent: Tuesday, November 8, 2016 10:37:56 AM > Subject: Feedback on DHT option "cluster.readdir-optimize" > > Hi all, > > We have an option called "cluster.readdir-optimize" which alters the > behavior of readdirp in DHT. This value affects how storage/posix treats > dentries corresponding to directories (not files). > > When this value is on, > * DHT asks only one subvol/brick to return dentries corresponding to > directories. > * Other subvols/bricks filter out dentries corresponding to directories and send > only dentries corresponding to files. > > When this value is off (the default), > * All subvols return all dentries stored on them. IOW, bricks don't filter > any dentries. > * Since a directory has one dentry representing it on each subvol, dht > (loaded on the client) picks up the dentry only from the hashed subvol. > > Note that irrespective of the value of this option, _all_ subvols return dentries > corresponding to the files stored on them. > > This option was introduced to boost the performance of readdir: when set on, > filtering of dentries happens on the bricks and hence there is reduced: > 1. network traffic (by filtering out the redundant dentry information) > 2. number of readdir calls between client and server for the same number of > dentries returned to the application (if filtering happens on the client, there are fewer > dentries in the result and hence more readdir calls. IOW, the > result buffer is not filled to maximum capacity). > > We want to hear from you whether you've used this option and, if yes: > 1. Did it really boost readdir performance? > 2. Do you have any performance data to find out what was the percentage of > improvement (or deterioration)? > 3. Data set you had (number of files, directories and organisation of > directories). 4. Volume information. IOW, how many subvols did dht have? 
> > If we find out that this option is really helping you, we can spend our > energies on fixing issues that will arise when this option is set to on. One > common issue with turning this option on is that > some directories might not show up in the directory listing [1]. The reason for > this is that: > 1. If a directory can be created on the hashed subvol, mkdir (the result to the > application) will be successful, irrespective of the result of mkdir on the rest of > the subvols. > 2. So, any subvol we pick to give us dentries for directories need > not contain all the directories, and we might miss those directories in the > listing. > > Your feedback is important for us and will help us prioritize and improve > things. > > [1] https://www.gluster.org/pipermail/gluster-users/2016-October/028703.html > > regards, > Raghavendra
[Gluster-users] Feedback on DHT option "cluster.readdir-optimize"
Hi all, We have an option called "cluster.readdir-optimize" which alters the behavior of readdirp in DHT. This value affects how storage/posix treats dentries corresponding to directories (not files). When this value is on, * DHT asks only one subvol/brick to return dentries corresponding to directories. * Other subvols/bricks filter out dentries corresponding to directories and send only dentries corresponding to files. When this value is off (the default), * All subvols return all dentries stored on them. IOW, bricks don't filter any dentries. * Since a directory has one dentry representing it on each subvol, dht (loaded on the client) picks up the dentry only from the hashed subvol. Note that irrespective of the value of this option, _all_ subvols return dentries corresponding to the files stored on them. This option was introduced to boost the performance of readdir: when set on, filtering of dentries happens on the bricks and hence there is reduced: 1. network traffic (by filtering out the redundant dentry information) 2. number of readdir calls between client and server for the same number of dentries returned to the application (if filtering happens on the client, there are fewer dentries in the result and hence more readdir calls. IOW, the result buffer is not filled to maximum capacity). We want to hear from you whether you've used this option and, if yes: 1. Did it really boost readdir performance? 2. Do you have any performance data to find out what was the percentage of improvement (or deterioration)? 3. Data set you had (number of files, directories and organisation of directories). If we find out that this option is really helping you, we can spend our energies on fixing issues that will arise when this option is set to on. One common issue with turning this option on is that some directories might not show up in the directory listing [1]. The reason for this is that: 1. 
If a directory can be created on the hashed subvol, mkdir (the result to the application) will be successful, irrespective of the result of mkdir on the rest of the subvols. 2. So, any subvol we pick to give us dentries for directories need not contain all the directories, and we might miss those directories in the listing. Your feedback is important for us and will help us prioritize and improve things. [1] https://www.gluster.org/pipermail/gluster-users/2016-October/028703.html regards, Raghavendra
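To make the two modes concrete, here is a toy model of the brick-side behavior described above (plain Python; the names are invented for illustration and this is not Gluster code):

```python
def brick_readdirp(dentries, returns_dirs, optimize):
    """Model of what one brick returns to a readdirp request.

    dentries: list of (name, is_dir) stored on this brick.
    returns_dirs: True for the one subvol DHT designates to return
    directory dentries when the option is on.
    With optimize off, bricks do no filtering at all; the client-side
    DHT then keeps a directory's dentry only from its hashed subvol.
    """
    if not optimize:
        return list(dentries)
    return [(name, is_dir) for name, is_dir in dentries
            if not is_dir or returns_dirs]

# A directory has a dentry on every subvol; each file lives on one.
subvol1 = [("dir1", True), ("fileA", False)]
subvol2 = [("dir1", True), ("fileB", False)]

# Option on: only subvol1 sends the directory dentry over the wire,
# so the redundant copy of dir1 never crosses the network.
listing = (brick_readdirp(subvol1, True, True)
           + brick_readdirp(subvol2, False, True))
print(listing)  # -> [('dir1', True), ('fileA', False), ('fileB', False)]
```

The missing-directory bug follows directly: if dir1's mkdir had failed on subvol1 but succeeded on the hashed subvol, the listing above would silently omit it.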
Re: [Gluster-users] mounting gluster volumes issue
On Tue, Nov 8, 2016 at 5:21 AM, Thing wrote: > Hi, > > I have a 3 node raid 1 gluster setup on centos7.2. Each node can mount the > 2 gluster volumes fine. > > > [root@glusterp1 volume1]# df -h > Filesystem Size Used Avail Use% > Mounted on > /dev/mapper/centos-root 20G 1.9G 19G 10% / > devtmpfs 3.8G 0 3.8G 0% /dev > tmpfs 3.8G 4.0K 3.8G 1% > /dev/shm > tmpfs 3.8G 17M 3.8G 1% /run > tmpfs 3.8G 0 3.8G 0% > /sys/fs/cgroup > /dev/sda1 969M 197M 723M 22% /boot > /dev/mapper/centos-tmp 3.9G 33M 3.9G 1% /tmp > /dev/mapper/centos-home 50G 33M 50G 1% /home > /dev/mapper/centos-data1 120G 33M 120G 1% > /data1 > /dev/mapper/centos-var 20G 529M 20G 3% /var > /dev/mapper/centos00-var_lib 9.4G 180M 9.2G 2% > /var/lib > tmpfs 771M 0 771M 0% > /run/user/1000 > /dev/mapper/vg--gluster--prod1-gluster--prod1 932G 35M 932G 1% > /gluster-bricks/gluster-prod1 > glusterp1.ods.graywitch.co.nz:volume2 112G 33M 112G 1% > /volume2 > glusterp1.ods.graywitch.co.nz:gluster-prod-1 932G 34M 932G 1% > /volume1 > [root@glusterp1 volume1]# > === > > > However my raspberry pi2 running raspbian cannot, it fails silently. > > Logs on the pi2 shows, > > == > 8><-- > root@warlocke:/# tail /var/log/glusterfs/gluster-prod-2.log > [2016-11-08 12:22:17.776599] I [glusterfsd.c:1493:main] > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7 > [2016-11-08 12:22:17.805706] W [rpc-common.c:64:xdr_to_generic] 0-xdr: > XDR decoding failed > [2016-11-08 12:22:17.805916] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] > 0-glusterfs: XDR decoding error > [2016-11-08 12:22:17.806021] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] > 0-mgmt: failed to fetch volume file (key:volume2) > [2016-11-08 12:22:17.806118] W [glusterfsd.c:727:cleanup_and_exit] 0-: > received signum (0), shutting down > [2016-11-08 12:22:17.806241] I [fuse-bridge.c:3849:fini] 0-fuse: > Unmounting '/gluster-prod-2'. 
> root@warlocke:/# tail /var/log/glusterfs/gluster-prod-1.log > [2016-11-08 12:18:51.538029] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] > 0-glusterfs: XDR decoding error > [2016-11-08 12:18:51.538122] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] > 0-mgmt: failed to fetch volume file (key:/gluster-prod1) > [2016-11-08 12:18:51.538212] W [glusterfsd.c:727:cleanup_and_exit] 0-: > received signum (0), shutting down > [2016-11-08 12:18:51.538381] I [fuse-bridge.c:3849:fini] 0-fuse: > Unmounting '/gluster-prod-1'. > [2016-11-08 12:19:52.628029] I [glusterfsd.c:1493:main] > 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7 > You are using a pretty old client version (3.2.7). The minimum supported version is 3.6.x and that's going to reach EOL once 3.9 is out. > [2016-11-08 12:19:52.649443] W [rpc-common.c:64:xdr_to_generic] 0-xdr: > XDR decoding failed > [2016-11-08 12:19:52.649664] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] > 0-glusterfs: XDR decoding error > This looks to be a compatibility issue to me. Are you using the latest glusterfs-server package? What's the output of glusterd --version? I'd recommend upgrading your clients (and server, if not done already) to the latest supported versions. [2016-11-08 12:19:52.649749] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] > 0-mgmt: failed to fetch volume file (key:gluster-prod1) > [2016-11-08 12:19:52.649850] W [glusterfsd.c:727:cleanup_and_exit] 0-: > received signum (0), shutting down > [2016-11-08 12:19:52.649969] I [fuse-bridge.c:3849:fini] 0-fuse: > Unmounting '/gluster-prod-1'. 
> === > > Centos7.2's gluster version, > > == > [root@glusterp1 volume1]# rpm -qa |grep gluster > glusterfs-cli-3.8.5-1.el7.x86_64 > glusterfs-libs-3.8.5-1.el7.x86_64 > vdsm-gluster-4.18.13-1.el7.centos.noarch > centos-release-gluster38-1.0-1.el7.centos.noarch > glusterfs-fuse-3.8.5-1.el7.x86_64 > glusterfs-client-xlators-3.8.5-1.el7.x86_64 > glusterfs-server-3.8.5-1.el7.x86_64 > glusterfs-3.8.5-1.el7.x86_64 > glusterfs-geo-replication-3.8.5-1.el7.x86_64 > glusterfs-api-3.8.5-1.el7.x86_64 > [root@glusterp1 volume1]# > == > > pi's version, > > == > root@warlocke:/# apt-cache search gluster > glusterfs-client - clustered file-system (client package) > glusterfs-common - GlusterFS common libraries and translator modules > glusterfs-dbg - GlusterFS debugging symbols > glusterfs-examples - example files for the glusterfs server and client > glusterfs-server - clustered file-system (server package) > root@warlocke:/# apt-get install glusterfs-client > Reading package lists... D
[Gluster-users] Gluster clients losing acl mount option.
I'm having an issue with some clients seeming to lose their ability to use/set ACLs on gluster-mounted filesystems. On booting, or mounting by hand, the mount looks like this: server1:/input on /data/input type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072) After some time it becomes: server1:/common on /data/common type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) And all our ACL setting/using fails. There are no messages in the logs on either client or server. The only way to fix it is to unmount/mount. This is on CentOS 7, with the latest Gluster packages (both server and clients): [root@client ~]# rpm -qa |grep gluster glusterfs-fuse-3.8.5-1.el7.x86_64 glusterfs-libs-3.8.5-1.el7.x86_64 glusterfs-3.8.5-1.el7.x86_64 glusterfs-client-xlators-3.8.5-1.el7.x86_64 It sounds a bit like this issue: https://www.gluster.org/pipermail/gluster-users/2016-June/027066.html But that was fixed in a newer version, which we should be running. Any ideas where to look or what might be causing this? Thanks.
[Gluster-users] mounting gluster volumes issue
Hi, I have a 3 node raid 1 gluster setup on centos7.2. Each node can mount the 2 gluster volumes fine. [root@glusterp1 volume1]# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/centos-root 20G 1.9G 19G 10% / devtmpfs 3.8G 0 3.8G 0% /dev tmpfs 3.8G 4.0K 3.8G 1% /dev/shm tmpfs 3.8G 17M 3.8G 1% /run tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup /dev/sda1 969M 197M 723M 22% /boot /dev/mapper/centos-tmp 3.9G 33M 3.9G 1% /tmp /dev/mapper/centos-home 50G 33M 50G 1% /home /dev/mapper/centos-data1 120G 33M 120G 1% /data1 /dev/mapper/centos-var 20G 529M 20G 3% /var /dev/mapper/centos00-var_lib 9.4G 180M 9.2G 2% /var/lib tmpfs 771M 0 771M 0% /run/user/1000 /dev/mapper/vg--gluster--prod1-gluster--prod1 932G 35M 932G 1% /gluster-bricks/gluster-prod1 glusterp1.ods.graywitch.co.nz:volume2 112G 33M 112G 1% /volume2 glusterp1.ods.graywitch.co.nz:gluster-prod-1 932G 34M 932G 1% /volume1 [root@glusterp1 volume1]# === However my raspberry pi2 running raspbian cannot, it fails silently. Logs on the pi2 shows, == 8><-- root@warlocke:/# tail /var/log/glusterfs/gluster-prod-2.log [2016-11-08 12:22:17.776599] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7 [2016-11-08 12:22:17.805706] W [rpc-common.c:64:xdr_to_generic] 0-xdr: XDR decoding failed [2016-11-08 12:22:17.805916] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] 0-glusterfs: XDR decoding error [2016-11-08 12:22:17.806021] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:volume2) [2016-11-08 12:22:17.806118] W [glusterfsd.c:727:cleanup_and_exit] 0-: received signum (0), shutting down [2016-11-08 12:22:17.806241] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting '/gluster-prod-2'. 
root@warlocke:/# tail /var/log/glusterfs/gluster-prod-1.log [2016-11-08 12:18:51.538029] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] 0-glusterfs: XDR decoding error [2016-11-08 12:18:51.538122] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/gluster-prod1) [2016-11-08 12:18:51.538212] W [glusterfsd.c:727:cleanup_and_exit] 0-: received signum (0), shutting down [2016-11-08 12:18:51.538381] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting '/gluster-prod-1'. [2016-11-08 12:19:52.628029] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7 [2016-11-08 12:19:52.649443] W [rpc-common.c:64:xdr_to_generic] 0-xdr: XDR decoding failed [2016-11-08 12:19:52.649664] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk] 0-glusterfs: XDR decoding error [2016-11-08 12:19:52.649749] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:gluster-prod1) [2016-11-08 12:19:52.649850] W [glusterfsd.c:727:cleanup_and_exit] 0-: received signum (0), shutting down [2016-11-08 12:19:52.649969] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting '/gluster-prod-1'. 
=== Centos7.2's gluster version, == [root@glusterp1 volume1]# rpm -qa |grep gluster glusterfs-cli-3.8.5-1.el7.x86_64 glusterfs-libs-3.8.5-1.el7.x86_64 vdsm-gluster-4.18.13-1.el7.centos.noarch centos-release-gluster38-1.0-1.el7.centos.noarch glusterfs-fuse-3.8.5-1.el7.x86_64 glusterfs-client-xlators-3.8.5-1.el7.x86_64 glusterfs-server-3.8.5-1.el7.x86_64 glusterfs-3.8.5-1.el7.x86_64 glusterfs-geo-replication-3.8.5-1.el7.x86_64 glusterfs-api-3.8.5-1.el7.x86_64 [root@glusterp1 volume1]# == pi's version, == root@warlocke:/# apt-cache search gluster glusterfs-client - clustered file-system (client package) glusterfs-common - GlusterFS common libraries and translator modules glusterfs-dbg - GlusterFS debugging symbols glusterfs-examples - example files for the glusterfs server and client glusterfs-server - clustered file-system (server package) root@warlocke:/# apt-get install glusterfs-client Reading package lists... Done Building dependency tree Reading state information... Done glusterfs-client is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded. root@warlocke:/# dpkg -l glusterfs-client Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Architecture Description +++-=-===-===- ii glusterfs-client
[Gluster-users] Event threads effect
Hi everybody, I am trying to benchmark my cluster with fio's gfapi ioengine and evaluate the effect of various volume options on performance. I have so far observed the following: 1) *thread* options do not affect performance or thread count - htop always shows 2 threads on the client, and there are always 16 glusterfsd threads on the server 2) running the same test locally (on the brick) shows up to 5x better throughput than over 10GbE (MTU 9000, iperfed, pinged with DF set, no drops on switches or cards, tcpdumped to check for network issues) 3) performance.cache-size value has no effect on performance (32MB or 1GB) I would expect raising the client thread count to lead to more TCP connections, higher disk utilization and throughput. If I run multiple fio jobs (numjobs=8), I am able to saturate the network link. Is this normal, or am I missing something really badly? fio config: [global] name=gfapi test create_on_open=1 volume=test3-vol brick=gfs-3.san ioengine=gfapi direct=1 bs=256k rw=read iodepth=1 numjobs=1 size=8192m loops=1 refill_buffers=1 [job1] reconfigured volume options: performance.client-io-threads: on performance.cache-size: 1GB performance.read-ahead: off server.outstanding-rpc-limit: 128 performance.io-thread-count: 16 server.event-threads: 16 client.event-threads: 16 nfs.disable: on transport.address-family: inet performance.readdir-ahead: on -ps
[Gluster-users] GlusterFS-3.7.17 released
Hi all and apologies for the late announcement. GlusterFS-3.7.17 has been released. The release-notes for this release can be viewed at [1]. 3.7.17 packages for Fedora (24-26), Debian (wheezy, jessie, stretch), and RHEL/CentOS (5, 6, 7) are available on download.gluster.org. 3.7.17 packages for Ubuntu (trusty, wily, xenial) are in the PPA. 3.7.17 packages for SuSE (SLES, OpenSuSE, Leap42.1) will be in the SuSE Build System soon. Thanks and Regards, Samikshan [1] https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.17.md
Re: [Gluster-users] [Gluster-devel] A question of GlusterFS dentries!
On Wed, Nov 2, 2016 at 9:54 AM, Serkan Çoban wrote: > +1 for a "no-rewinddir-support" option in DHT. > We are seeing very slow directory listings, especially with a 1500+ brick > volume; 'ls' takes 20+ seconds with 1000+ files. > If it's not clear, I would like to point out that serialized readdir is not the sole issue causing slowness. If directories are _HUGE_ then I don't expect too much benefit from parallelizing. Also, as others have been pointing out (in various in-person discussions), there are other scalability limits, like the number of messages, memory consumed etc., when winding calls in parallel. I'll probably do a rough POC in the next couple of months to see whether this idea has any substance or not and post the results. > On Wed, Nov 2, 2016 at 7:08 AM, Raghavendra Gowdappa > wrote: > > > > > > - Original Message - > >> From: "Keiviw" > >> To: gluster-de...@gluster.org > >> Sent: Tuesday, November 1, 2016 12:41:02 PM > >> Subject: [Gluster-devel] A question of GlusterFS dentries! > >> > >> Hi, > >> In GlusterFS distributed volumes, listing a non-empty directory was > slow. > >> Then I read the dht code and found the reasons. But I was confused that > >> GlusterFS dht traversed all the bricks (in the volume) sequentially - why > not > >> use multiple threads to read dentries from multiple bricks simultaneously? > >> That's a question that's always puzzled me. Could you please tell me > >> something about this? > > > > readdir across subvols is sequential mostly because we have to support > rewinddir(3). We need to maintain the mapping of offset and dentry across > multiple invocations of readdir. In other words, if someone did a rewinddir > to an offset corresponding to an earlier dentry, subsequent readdirs should > return the same set of dentries that the earlier invocation of readdir > returned. For example, in a hypothetical scenario, readdir returned the > following dentries: > > > > 1. a, off=10 > > 2. b, off=2 > > 3. c, off=5 > > 4. d, off=15 > > 5. e, off=17 > > 6. 
f, off=13 > > > > Now if we did rewinddir to off 5 and issued readdir again, we should get the > following dentries: > > (c, off=5), (d, off=15), (e, off=17), (f, off=13) > > > > Within a subvol, the backend filesystem provides the rewinddir guarantee for the > dentries present on that subvol. However, across subvols it is the > responsibility of DHT to provide the above guarantee. This means we > should have some well-defined order in which we send readdir calls (note that > the order is not well defined if we do a parallel readdir across all subvols). > So, DHT does a sequential readdir, which gives a well-defined order of reading > dentries. > > > > To give an example, if we have another subvol - subvol2 - (in addition > to the subvol above - say subvol1) with the following listing: > > 1. g, off=16 > > 2. h, off=20 > > 3. i, off=3 > > 4. j, off=19 > > > > With parallel readdir we can have many orderings, like (a, b, g, h, i, > c, d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc. Now suppose we do (with > readdir done in parallel): > > > > 1. A complete listing of the directory (which can be any one of C(10,4) = > 210 interleavings - I hope the math is correct here). > > 2. A rewinddir(20) > > > > We cannot predict the set of dentries that come _after_ offset > 20. However, if we do a readdir sequentially across subvols there is only > one possible directory listing, i.e. (a, b, c, d, e, f, g, h, i, j). So, it's easier > to support rewinddir. > > > > If there were no POSIX requirement for rewinddir support, a > parallel readdir could easily be implemented (which would improve performance > too). But unfortunately rewinddir is still a POSIX requirement. This also > opens up the possibility of a "no-rewinddir-support" option in DHT, > which, if enabled, results in parallel readdirs across subvols. What I am not > sure of is how many users still use rewinddir. If there is a critical mass > which wants performance with a tradeoff of no rewinddir support, this can be > a good feature. 
> > > > +gluster-users to get an opinion on this. > > > > regards, > > Raghavendra > ___ > Gluster-devel mailing list > gluster-de...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > -- Raghavendra G
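The rewinddir argument above can be sketched in a few lines (plain Python, using the example offsets from this mail; illustrative only, not Gluster code):

```python
# Dentries as (name, d_off). d_off values are opaque cookies, not
# sorted positions -- exactly as in the example above.
subvol1 = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
subvol2 = [("g", 16), ("h", 20), ("i", 3), ("j", 19)]

def sequential_listing(subvols):
    """DHT-style sequential readdir: subvols are read one after the
    other, giving a single well-defined global order."""
    return [dentry for sv in subvols for dentry in sv]

def rewinddir_from(listing, off):
    """Resume the listing from the dentry whose cookie equals off."""
    idx = next(i for i, (_name, o) in enumerate(listing) if o == off)
    return listing[idx:]

full = sequential_listing([subvol1, subvol2])
# Rewinding to off=5 deterministically resumes at c -- possible only
# because the merge order is fixed; a parallel merge could interleave
# subvol2's dentries anywhere before or after offset 5.
print([name for name, _off in rewinddir_from(full, 5)])
# -> ['c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

With a parallel (non-deterministic) merge there is no single listing for `rewinddir_from` to index into, which is exactly why DHT keeps readdir sequential.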
Re: [Gluster-users] Block storage with Qemu-Tcmu
[Top posting] I am planning to write a short blog to answer a few similar questions that I received after posting this blog. Is the iSCSI stack obligatory for block store? The answer is no. It basically depends on the use case and choice. If we can run/manage target emulation on the client side, we don't have to bring the iSCSI stack into the picture. We simply export the LUN using a loopback device, i.e. after creating the back end with the qemu-tcmu storage module, we can directly export the target via loopback instead of iSCSI. So in this case we don't see the overhead of the iSCSI layers, though IMO the iSCSI overhead can be very minimal; maybe I need performance numbers to prove it (will spin up a benchmark soon). I have done some basic benchmarking taking a Fuse mount as the baseline and an iSCSI target exposed via tcmu-runner as the target; you can find it at [1] You can find more benchmarks at [2]; the commit messages should explain the configurations. Hope that answers most of your questions :) [1] https://htmlpreview.github.io/?https://github.com/pkalever/iozone_results_gluster/blob/master/block-store/iscsi-fuse-1/html_out/index.html [2] https://github.com/pkalever/iozone_results_gluster/blob/master/block-store/ -- Prasanna On Mon, Nov 7, 2016 at 2:23 PM, Gandalf Corvotempesta wrote: > On 07 Nov 2016 09:23, "Lindsay Mathieson" > wrote: >> >> From a quick scan, there doesn't seem to be any particular advantage >> over qemu using gfapi directly? Is this more aimed at apps that can't >> use gfapi such as vmware or as a replacement for NFS? >> > > Dumb question: why should I use block storage replacing NFS? > Nfs-ganesha makes use of libgfapi; block storage does the same but also needs > the whole iSCSI stack, so performance could be lower. > > If I don't need direct access to a block device on the client (for example > for creating custom FS or LVM and so on), nfs-ganesha should be a better > approach, right? > > Has anyone compared performance between: > > 1. Fuse mount > 2. Nfs > 3. 
Nfs ganesha > 4. Qemu direct access via gfapi > 5. Iscsi > > ?
Re: [Gluster-users] Block storage with Qemu-Tcmu
On Mon, Nov 7, 2016 at 1:53 PM, Lindsay Mathieson wrote: > On 7 November 2016 at 17:01, Prasanna Kalever wrote: >> Yet another approach to achieve Gluster Block Storage is with Qemu-Tcmu. > > > Thanks Prasanna, interesting reading. > > From a quick scan, there doesn't seem to be any particular advantage > over qemu using gfapi directly? Is this more aimed at apps that can't > use gfapi such as vmware or as a replacement for NFS? > As mentioned in the conclusion of the blog, the advantage here is easy snapshots. Qemu-tcmu will come with a '--snapshot' option (work still in progress), much like qemu-img. Supporting this within gluster would need additional maintenance of the qemu-block xlator, which is a clone of the qcow2 spec implementation - that could be more work. Also, the qemu gluster protocol driver (accessing gfapi) is more mature and tested. -- Prasanna > -- > Lindsay
Re: [Gluster-users] Block storage with Qemu-Tcmu
On 07 Nov 2016 09:23, "Lindsay Mathieson" wrote: > > From a quick scan, there doesn't seem to be any particular advantage > over qemu using gfapi directly? Is this more aimed at apps that can't > use gfapi such as vmware or as a replacement for NFS? > Dumb question: why should I use block storage replacing NFS? Nfs-ganesha makes use of libgfapi; block storage does the same but also needs the whole iSCSI stack, so performance could be lower. If I don't need direct access to a block device on the client (for example for creating custom FS or LVM and so on), nfs-ganesha should be a better approach, right? Has anyone compared performance between: 1. Fuse mount 2. Nfs 3. Nfs ganesha 4. Qemu direct access via gfapi 5. Iscsi ?
Re: [Gluster-users] Block storage with Qemu-Tcmu
On 7 November 2016 at 17:01, Prasanna Kalever wrote: > Yet another approach to achieve Gluster Block Storage is with Qemu-Tcmu. Thanks Prasanna, interesting reading. From a quick scan, there doesn't seem to be any particular advantage over qemu using gfapi directly? Is this more aimed at apps that can't use gfapi such as vmware or as a replacement for NFS? -- Lindsay