Re: [Gluster-users] Event threads effect

2016-11-07 Thread Pavel Szalbot
Just a quick follow-up - I must have somehow missed the updated thread count,
because fio (client side) and the brick do show the appropriate number of
threads if I raise the count above 16. It seems a minimal thread count is
always spawned, at least on the server, which probably confused me.
Unfortunately the connection count to the brick is still the same (2).

However, the read performance is still the same. strace on the fio threads
shows reads (same on the server side), so the workload is somehow distributed
between them, but performance is very different from running multiple jobs
(numjobs > 1). Does anyone know how fio, or rather its gfapi ioengine, uses
multiple threads?

The 200MB/s I get on the client, versus the 1.5GB/s I get when the same test
is run locally on the server, so far seems to be caused by network latency -
but could the client be instructed to open multiple connections (maybe one per
thread)? netstat reports 2 connections per fio process, and raising numjobs
results in more connections and maxes out brick utilization.
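
For anyone reproducing this, the per-process connection count can be checked
with something like the commands below; the brick port is only an example
(gluster volume status lists the real ones):

# connections opened by fio processes
netstat -tnp | grep fio
# established connections to one brick port
netstat -tn | grep ':49152 ' | grep -c ESTABLISHED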

My goal is to max out IO performance of QEMU/KVM guests. So far only
approximately 200MB/s is achievable, and only for certain block sizes.


-ps

On Mon, Nov 7, 2016 at 4:47 PM, Pavel Szalbot wrote:

> Hi everybody,
>
> I am trying to benchmark my cluster with fio's gfapi ioengine and evaluate
> effect of various volume options on performance. I have so far observed
> following:
>
> 1) *thread* options do not affect performance or thread count - htop
> always shows 2 threads on the client, and there are always 16 glusterfsd
> threads on the server
> 2) running the same test locally (on the brick) shows up to 5x better
> throughput than over 10GBe (MTU 9000, iperfed, pinged with DF set, no drops
> on switches or cards, tcpdumped to check network issues)
> 3) performance.cache-size value has no effect on performance (32MB or 1GB)
>
> I would expect raising the client thread count to lead to more TCP
> connections, higher disk utilization and higher throughput. If I run multiple fio
> jobs (numjobs=8), I am able to saturate the network link.
>
> Is this normal, or am I missing something really badly?
>
> fio config:
>
> [global]
> name=gfapi test
> create_on_open=1
> volume=test3-vol
> brick=gfs-3.san
> ioengine=gfapi
> direct=1
> bs=256k
> rw=read
> iodepth=1
> numjobs=1
> size=8192m
> loops=1
> refill_buffers=1
> [job1]
>
> reconfigured volume options:
> performance.client-io-threads: on
> performance.cache-size: 1GB
> performance.read-ahead: off
> server.outstanding-rpc-limit: 128
> performance.io-thread-count: 16
> server.event-threads: 16
> client.event-threads: 16
> nfs.disable: on
> transport.address-family: inet
> performance.readdir-ahead: on
>
> -ps
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Nfs-ganesha-devel] Understandings of ganesha-ha.sh

2016-11-07 Thread Soumya Koduri

Hi,

On 11/05/2016 12:29 AM, ML Wong wrote:

I would like to ask for some recommendations here.

1) For /usr/libexec/ganesha/ganesha-ha.sh: we have been taking advantage of
pacemaker+corosync for some other services, but we always run into the issue
of losing the other resources we set up in the cluster when we run
ganesha-ha.sh add/delete. Is that what is expected? I have already tried 2
different setups and they both give me the same result; I'd just like to
better understand whether I need to find a workaround for our environment.


Do you mean you are using pacemaker+corosync to manage services other 
than nfs-ganesha, and you lose those resources when a ganesha-ha.sh 
add/delete operation is performed? Could you please provide details 
about the resources affected?




2) Has anyone of you run into a scenario where there is a demand only for
adding capacity to the Gluster volume, without necessarily adding more nodes
to the cluster? Meaning, continuing to add bricks to the existing Gluster
volume, but without doing the ganesha-ha.sh --add node process? Is it a bad
idea? Or, between Ganesha and Gluster, do they have to have the same number of
"HA_CLUSTER_NODES" as the number of Gluster peers?


It's perfectly acceptable. The Ganesha cluster is a subset of the Gluster 
trusted storage pool, so you can increase the capacity of the gluster volume 
without needing to alter the nfs-ganesha cluster.
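
For example, on the Gluster side something like the following is all that is
needed (volume and brick paths are placeholders; depending on the volume type,
bricks may need to be added in multiples of the replica count):

gluster volume add-brick myvol server4:/bricks/brick2/myvol
gluster volume rebalance myvol start
gluster volume rebalance myvol status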




3) Has anyone on the list tried adding nodes into Ganesha without using
ganesha-ha.sh, just by using "pcs"? Ganesha team, am I missing any
other resources or constraints for the new nodes?
a) nfs_setup/mon/grace-clone
b) [node]-cluster_ip-1,
c) location constraint for each member-node with score priority
d) location-nfs-grace-clone
e) order constraints


I am not sure if anyone has tried that out. We strongly recommend using 
the ganesha-ha.sh script, as there could be additions/fixes that went 
into that script w.r.t. cluster configuration. You would have to double-check 
the resources and constraints every time you configure them 
manually. Do you see any difference between using the script and the 
manual configuration?
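
For example, one quick way to compare a script-created cluster with a manually
built one is to dump the resources and constraints on a node and diff the two
(plain pcs commands, nothing ganesha-specific):

pcs status                                # resources, clones and their state
pcs constraint                            # location/order/colocation constraints
pcs cluster cib > /tmp/ganesha-ha-cib.xml # full CIB, handy for diffing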


Thanks,
Soumya



Thanks all,
Melvin
For gluster-users: I am not sure if this is the right list to post to. Sorry
for the spam.





___
Nfs-ganesha-devel mailing list
nfs-ganesha-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Unsynced gfid's that don't exist

2016-11-07 Thread Davy Croonen

Hi all

Last Friday we had some network issues. The built-in gluster heal mechanism 
resolved almost all unsynced entries, except for 16 entries.

The command gluster volume heal public info displays:

Brick gfs06a-gs:/mnt/public/brick1

<16 gfid entries - not rendered in the archived mail>

Status: Connected
Number of entries: 16

The strange thing is that I cannot find these gfids within the .glusterfs 
directory; trying to resolve these gfids with the script 
https://gist.github.com/semiosis/4392640 doesn't show anything either.
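
In case it helps others doing the same check: on the brick, a gfid maps to a
fixed path under .glusterfs, and a path can be mapped back to its gfid with
getfattr (the gfid and file path below are made-up examples):

# .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
GFID=0057e24e-8d14-4a25-9e97-4c2cc0f4f2f5
ls -l /mnt/public/brick1/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

# reverse direction: which gfid does a given file have?
getfattr -n trusted.gfid -e hex /mnt/public/brick1/some/dir/file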

Any suggestions on how to get rid of these unsynced-entries messages?

Regards
Davy
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Feedback on DHT option "cluster.readdir-optimize"

2016-11-07 Thread Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra Gowdappa" 
> To: "Gluster Devel" , "gluster-users" 
> 
> Sent: Tuesday, November 8, 2016 10:37:56 AM
> Subject: Feedback on DHT option "cluster.readdir-optimize"
> 
> Hi all,
> 
> We have an option called "cluster.readdir-optimize" which alters the
> behavior of readdirp in DHT. This value affects how storage/posix treats
> dentries corresponding to directories (not for files).
> 
> When this value is on,
> * DHT asks only one subvol/brick to return dentries corresponding to
> directories.
> * Other subvols/bricks filter dentries corresponding to directories and send
> only dentries corresponding to files.
> 
> When this value is off (this is the default value),
> * All subvols return all dentries stored on them. IOW, bricks don't filter
> any dentries.
> * Since a directory has one dentry representing it on each subvol, dht
> (loaded on client) picks up dentry only from hashed subvol.
> 
> Note that irrespective of value of this option, _all_ subvols return dentries
> corresponding to files which are stored on them.
> 
> This option was introduced to boost readdir performance: when set on,
> filtering of dentries happens on the bricks, which reduces:
> 1. network traffic (all the redundant dentry information is filtered out)
> 2. the number of readdir calls between client and server for the same number
> of dentries returned to the application (if filtering happens on the client,
> each result contains fewer dentries and hence more readdir calls are needed;
> IOW, the result buffer is not filled to maximum capacity).
> 
> We want to hear from you whether you've used this option, and if yes:
> 1. Did it really boost readdir performance?
> 2. Do you have any performance data showing the percentage of
> improvement (or deterioration)?
> 3. What data set did you have (number of files and directories, and how the
> directories were organised)?

4. Volume information. IOW, how many subvols did dht have?

> 
> If we find out that this option is really helping you, we can spend our
> energies on fixing the issues that arise when it is turned on. One common
> issue is that, with this option set, some directories might not show up in
> the directory listing [1]. The reason for this is:
> 1. If a directory can be created on its hashed subvol, mkdir (as reported to
> the application) will be successful, irrespective of the result of mkdir on
> the rest of the subvols.
> 2. So the single subvol we pick to give us the dentries for directories need
> not contain all the directories, and we might miss those directories in the
> listing.
> 
> Your feedback is important for us and will help us to prioritize and improve
> things.
> 
> [1] https://www.gluster.org/pipermail/gluster-users/2016-October/028703.html
> 
> regards,
> Raghavendra
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Feedback on DHT option "cluster.readdir-optimize"

2016-11-07 Thread Raghavendra Gowdappa
Hi all,

We have an option called "cluster.readdir-optimize" which alters the 
behavior of readdirp in DHT. This value affects how storage/posix treats 
dentries corresponding to directories (not for files).

When this value is on, 
* DHT asks only one subvol/brick to return dentries corresponding to 
directories.
* Other subvols/bricks filter dentries corresponding to directories and send 
only dentries corresponding to files.

When this value is off (this is the default value),
* All subvols return all dentries stored on them. IOW, bricks don't filter any 
dentries.
* Since a directory has one dentry representing it on each subvol, dht (loaded 
on client) picks up dentry only from hashed subvol.

Note that irrespective of value of this option, _all_ subvols return dentries 
corresponding to files which are stored on them.

This option was introduced to boost readdir performance: when set on, 
filtering of dentries happens on the bricks, which reduces:
1. network traffic (all the redundant dentry information is filtered out)
2. the number of readdir calls between client and server for the same number of 
dentries returned to the application (if filtering happens on the client, each 
result contains fewer dentries and hence more readdir calls are needed; IOW, 
the result buffer is not filled to maximum capacity).
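
For anyone who wants to try it out, the option is toggled per volume with the
standard CLI (the volume name below is a placeholder); once set it shows up
under "Options Reconfigured":

gluster volume set myvol cluster.readdir-optimize on
gluster volume info myvol | grep readdir-optimize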

We want to hear from you whether you've used this option, and if yes:
1. Did it really boost readdir performance?
2. Do you have any performance data showing the percentage of 
improvement (or deterioration)?
3. What data set did you have (number of files and directories, and how the 
directories were organised)?

If we find out that this option is really helping you, we can spend our 
energies on fixing the issues that arise when it is turned on. One common 
issue is that, with this option set, some directories might not show up in the 
directory listing [1]. The reason for this is:
1. If a directory can be created on its hashed subvol, mkdir (as reported to 
the application) will be successful, irrespective of the result of mkdir on the 
rest of the subvols.
2. So the single subvol we pick to give us the dentries for directories need 
not contain all the directories, and we might miss those directories in the 
listing.

Your feedback is important for us and will help us to prioritize and improve 
things.

[1] https://www.gluster.org/pipermail/gluster-users/2016-October/028703.html

regards,
Raghavendra
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] mounting gluster volumes issue

2016-11-07 Thread Atin Mukherjee
On Tue, Nov 8, 2016 at 5:21 AM, Thing  wrote:

> Hi,
>
> I have a 3 node raid 1 gluster setup on centos7.2. Each node can mount the
> 2 gluster volumes fine.
>
> 
> [root@glusterp1 volume1]# df -h
> Filesystem Size  Used Avail Use%
> Mounted on
> /dev/mapper/centos-root 20G  1.9G   19G  10% /
> devtmpfs   3.8G 0  3.8G   0% /dev
> tmpfs  3.8G  4.0K  3.8G   1%
> /dev/shm
> tmpfs  3.8G   17M  3.8G   1% /run
> tmpfs  3.8G 0  3.8G   0%
> /sys/fs/cgroup
> /dev/sda1  969M  197M  723M  22% /boot
> /dev/mapper/centos-tmp 3.9G   33M  3.9G   1% /tmp
> /dev/mapper/centos-home 50G   33M   50G   1% /home
> /dev/mapper/centos-data1   120G   33M  120G   1%
> /data1
> /dev/mapper/centos-var  20G  529M   20G   3% /var
> /dev/mapper/centos00-var_lib   9.4G  180M  9.2G   2%
> /var/lib
> tmpfs  771M 0  771M   0%
> /run/user/1000
> /dev/mapper/vg--gluster--prod1-gluster--prod1  932G   35M  932G   1%
> /gluster-bricks/gluster-prod1
> glusterp1.ods.graywitch.co.nz:volume2  112G   33M  112G   1%
> /volume2
> glusterp1.ods.graywitch.co.nz:gluster-prod-1   932G   34M  932G   1%
> /volume1
> [root@glusterp1 volume1]#
> ===
>
>
> However, my Raspberry Pi 2 running Raspbian cannot; it fails silently.
>
> Logs on the pi2 shows,
>
> ==
> 8><--
> root@warlocke:/# tail /var/log/glusterfs/gluster-prod-2.log
> [2016-11-08 12:22:17.776599] I [glusterfsd.c:1493:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7
> [2016-11-08 12:22:17.805706] W [rpc-common.c:64:xdr_to_generic]  0-xdr:
> XDR decoding failed
> [2016-11-08 12:22:17.805916] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
> 0-glusterfs: XDR decoding error
> [2016-11-08 12:22:17.806021] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:volume2)
> [2016-11-08 12:22:17.806118] W [glusterfsd.c:727:cleanup_and_exit]  0-:
> received signum (0), shutting down
> [2016-11-08 12:22:17.806241] I [fuse-bridge.c:3849:fini] 0-fuse:
> Unmounting '/gluster-prod-2'.
> root@warlocke:/# tail /var/log/glusterfs/gluster-prod-1.log
> [2016-11-08 12:18:51.538029] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
> 0-glusterfs: XDR decoding error
> [2016-11-08 12:18:51.538122] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:/gluster-prod1)
> [2016-11-08 12:18:51.538212] W [glusterfsd.c:727:cleanup_and_exit]  0-:
> received signum (0), shutting down
> [2016-11-08 12:18:51.538381] I [fuse-bridge.c:3849:fini] 0-fuse:
> Unmounting '/gluster-prod-1'.
> [2016-11-08 12:19:52.628029] I [glusterfsd.c:1493:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7
>

You are using a pretty old client version (3.2.7). The minimum supported
version is 3.6.x and that's going to reach EOL once 3.9 is out.


> [2016-11-08 12:19:52.649443] W [rpc-common.c:64:xdr_to_generic]  0-xdr:
> XDR decoding failed
> [2016-11-08 12:19:52.649664] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
> 0-glusterfs: XDR decoding error
>

This looks like a compatibility issue to me. Are you using the latest
glusterfs-server package? What's the output of glusterd --version? I'd
recommend upgrading your clients (and servers, if not already done) to
the latest supported versions.
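
For example, a quick way to compare the two ends:

# on the CentOS server
glusterd --version
# on the Raspberry Pi client
glusterfs --version
dpkg -l glusterfs-client glusterfs-common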

> [2016-11-08 12:19:52.649749] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:gluster-prod1)
> [2016-11-08 12:19:52.649850] W [glusterfsd.c:727:cleanup_and_exit]  0-:
> received signum (0), shutting down
> [2016-11-08 12:19:52.649969] I [fuse-bridge.c:3849:fini] 0-fuse:
> Unmounting '/gluster-prod-1'.
> ===
>
> Centos7.2's gluster version,
>
> ==
> [root@glusterp1 volume1]# rpm -qa |grep gluster
> glusterfs-cli-3.8.5-1.el7.x86_64
> glusterfs-libs-3.8.5-1.el7.x86_64
> vdsm-gluster-4.18.13-1.el7.centos.noarch
> centos-release-gluster38-1.0-1.el7.centos.noarch
> glusterfs-fuse-3.8.5-1.el7.x86_64
> glusterfs-client-xlators-3.8.5-1.el7.x86_64
> glusterfs-server-3.8.5-1.el7.x86_64
> glusterfs-3.8.5-1.el7.x86_64
> glusterfs-geo-replication-3.8.5-1.el7.x86_64
> glusterfs-api-3.8.5-1.el7.x86_64
> [root@glusterp1 volume1]#
> ==
>
> pi's version,
>
> ==
> root@warlocke:/# apt-cache search gluster
> glusterfs-client - clustered file-system (client package)
> glusterfs-common - GlusterFS common libraries and translator modules
> glusterfs-dbg - GlusterFS debugging symbols
> glusterfs-examples - example files for the glusterfs server and client
> glusterfs-server - clustered file-system (server package)
> root@warlocke:/# apt-get install glusterfs-client
> Reading package lists... Done

[Gluster-users] Gluster clients losing acl mount option.

2016-11-07 Thread Kamal Shaker
I'm having an issue where some clients seem to lose their ability to use/set
ACLs on gluster-mounted filesystems. On boot, and when mounting by hand, the
mount looks like this:

server1:/input on /data/input type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

after some time it becomes:

server1:/common on /data/common type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


And all our attempts to set or use ACLs fail. There are no messages in the logs
on either the client or the server. The only way to fix it is to unmount and remount.
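
When it happens, this is roughly how the state can be confirmed and worked
around by remounting (server/paths are taken from the output above; the acl
mount option, test user and file name are assumptions on my part):

# remount with ACL support and check what the kernel sees
umount /data/input
mount -t glusterfs -o acl server1:/input /data/input
grep ' /data/input ' /proc/mounts

# quick functional check
touch /data/input/aclprobe
setfacl -m u:alice:rw /data/input/aclprobe && getfacl /data/input/aclprobe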

This is on CentOS 7, with the latest Gluster packages (both server and
clients):

[root@client ~]# rpm -qa |grep gluster
glusterfs-fuse-3.8.5-1.el7.x86_64
glusterfs-libs-3.8.5-1.el7.x86_64
glusterfs-3.8.5-1.el7.x86_64
glusterfs-client-xlators-3.8.5-1.el7.x86_64

It sounds a bit like this issue:
https://www.gluster.org/pipermail/gluster-users/2016-June/027066.html

But that was fixed with a newer version, which we should be running.

Any ideas where to look or what might be causing this?

Thanks.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] mounting gluster volumes issue

2016-11-07 Thread Thing
Hi,

I have a 3 node raid 1 gluster setup on centos7.2. Each node can mount the
2 gluster volumes fine.


[root@glusterp1 volume1]# df -h
Filesystem Size  Used Avail Use%
Mounted on
/dev/mapper/centos-root 20G  1.9G   19G  10% /
devtmpfs   3.8G 0  3.8G   0% /dev
tmpfs  3.8G  4.0K  3.8G   1%
/dev/shm
tmpfs  3.8G   17M  3.8G   1% /run
tmpfs  3.8G 0  3.8G   0%
/sys/fs/cgroup
/dev/sda1  969M  197M  723M  22% /boot
/dev/mapper/centos-tmp 3.9G   33M  3.9G   1% /tmp
/dev/mapper/centos-home 50G   33M   50G   1% /home
/dev/mapper/centos-data1   120G   33M  120G   1% /data1
/dev/mapper/centos-var  20G  529M   20G   3% /var
/dev/mapper/centos00-var_lib   9.4G  180M  9.2G   2%
/var/lib
tmpfs  771M 0  771M   0%
/run/user/1000
/dev/mapper/vg--gluster--prod1-gluster--prod1  932G   35M  932G   1%
/gluster-bricks/gluster-prod1
glusterp1.ods.graywitch.co.nz:volume2  112G   33M  112G   1%
/volume2
glusterp1.ods.graywitch.co.nz:gluster-prod-1   932G   34M  932G   1%
/volume1
[root@glusterp1 volume1]#
===


However, my Raspberry Pi 2 running Raspbian cannot; it fails silently.

Logs on the pi2 shows,

==
8><--
root@warlocke:/# tail /var/log/glusterfs/gluster-prod-2.log
[2016-11-08 12:22:17.776599] I [glusterfsd.c:1493:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7
[2016-11-08 12:22:17.805706] W [rpc-common.c:64:xdr_to_generic]  0-xdr: XDR
decoding failed
[2016-11-08 12:22:17.805916] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
0-glusterfs: XDR decoding error
[2016-11-08 12:22:17.806021] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:volume2)
[2016-11-08 12:22:17.806118] W [glusterfsd.c:727:cleanup_and_exit]  0-:
received signum (0), shutting down
[2016-11-08 12:22:17.806241] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting
'/gluster-prod-2'.
root@warlocke:/# tail /var/log/glusterfs/gluster-prod-1.log
[2016-11-08 12:18:51.538029] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
0-glusterfs: XDR decoding error
[2016-11-08 12:18:51.538122] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:/gluster-prod1)
[2016-11-08 12:18:51.538212] W [glusterfsd.c:727:cleanup_and_exit]  0-:
received signum (0), shutting down
[2016-11-08 12:18:51.538381] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting
'/gluster-prod-1'.
[2016-11-08 12:19:52.628029] I [glusterfsd.c:1493:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.2.7
[2016-11-08 12:19:52.649443] W [rpc-common.c:64:xdr_to_generic]  0-xdr: XDR
decoding failed
[2016-11-08 12:19:52.649664] E [glusterfsd-mgmt.c:621:mgmt_getspec_cbk]
0-glusterfs: XDR decoding error
[2016-11-08 12:19:52.649749] E [glusterfsd-mgmt.c:695:mgmt_getspec_cbk]
0-mgmt: failed to fetch volume file (key:gluster-prod1)
[2016-11-08 12:19:52.649850] W [glusterfsd.c:727:cleanup_and_exit]  0-:
received signum (0), shutting down
[2016-11-08 12:19:52.649969] I [fuse-bridge.c:3849:fini] 0-fuse: Unmounting
'/gluster-prod-1'.
===

Centos7.2's gluster version,

==
[root@glusterp1 volume1]# rpm -qa |grep gluster
glusterfs-cli-3.8.5-1.el7.x86_64
glusterfs-libs-3.8.5-1.el7.x86_64
vdsm-gluster-4.18.13-1.el7.centos.noarch
centos-release-gluster38-1.0-1.el7.centos.noarch
glusterfs-fuse-3.8.5-1.el7.x86_64
glusterfs-client-xlators-3.8.5-1.el7.x86_64
glusterfs-server-3.8.5-1.el7.x86_64
glusterfs-3.8.5-1.el7.x86_64
glusterfs-geo-replication-3.8.5-1.el7.x86_64
glusterfs-api-3.8.5-1.el7.x86_64
[root@glusterp1 volume1]#
==

pi's version,

==
root@warlocke:/# apt-cache search gluster
glusterfs-client - clustered file-system (client package)
glusterfs-common - GlusterFS common libraries and translator modules
glusterfs-dbg - GlusterFS debugging symbols
glusterfs-examples - example files for the glusterfs server and client
glusterfs-server - clustered file-system (server package)
root@warlocke:/# apt-get install glusterfs-client
Reading package lists... Done
Building dependency tree
Reading state information... Done
glusterfs-client is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
root@warlocke:/# dpkg -l glusterfs-client
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version         Architecture    Description
+++-=================-===============-===============-==========================
ii  glusterfs-client   

[Gluster-users] Event threads effect

2016-11-07 Thread Pavel Szalbot
Hi everybody,

I am trying to benchmark my cluster with fio's gfapi ioengine and evaluate
effect of various volume options on performance. I have so far observed
following:

1) *thread* options do not affect performance or thread count - htop always
shows 2 threads on the client, and there are always 16 glusterfsd threads on the server
2) running the same test locally (on the brick) shows up to 5x better
throughput than over 10GBe (MTU 9000, iperfed, pinged with DF set, no drops
on switches or cards, tcpdumped to check network issues)
3) performance.cache-size value has no effect on performance (32MB or 1GB)

I would expect raising the client thread count to lead to more TCP
connections, higher disk utilization and higher throughput. If I run multiple fio
jobs (numjobs=8), I am able to saturate the network link.

Is this normal, or am I missing something really badly?

fio config:

[global]
name=gfapi test
create_on_open=1
volume=test3-vol
brick=gfs-3.san
ioengine=gfapi
direct=1
bs=256k
rw=read
iodepth=1
numjobs=1
size=8192m
loops=1
refill_buffers=1
[job1]

reconfigured volume options:
performance.client-io-threads: on
performance.cache-size: 1GB
performance.read-ahead: off
server.outstanding-rpc-limit: 128
performance.io-thread-count: 16
server.event-threads: 16
client.event-threads: 16
nfs.disable: on
transport.address-family: inet
performance.readdir-ahead: on
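
The options above are applied per volume with the standard gluster CLI, e.g.:

gluster volume set test3-vol client.event-threads 16
gluster volume set test3-vol server.event-threads 16
gluster volume set test3-vol performance.io-thread-count 16
gluster volume set test3-vol performance.client-io-threads on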

-ps
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] GlusterFS-3.7.17 released

2016-11-07 Thread Samikshan Bairagya

Hi all and apologies for the late announcement.

GlusterFS-3.7.17 has been released. The release-notes for this release
can be viewed at [1].

3.7.17 packages for Fedora (24-26), Debian (wheezy, jessie, stretch), 
and RHEL/CentOS (5, 6, 7) are available on download.gluster.org. 3.7.17 
packages for Ubuntu (trusty, wily, xenial)  are in the PPA. 3.7.17 
packages for SuSE (SLES, OpenSuSE, Leap42.1) will be in the SuSE Build 
System soon.


Thanks and Regards,
Samikshan

[1] 
https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.17.md


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] A question of GlusterFS dentries!

2016-11-07 Thread Raghavendra G
On Wed, Nov 2, 2016 at 9:54 AM, Serkan Çoban  wrote:

> +1 for "no-rewinddir-support" option in DHT.
> We are seeing very slow directory listings, especially with a 1500+ brick
> volume; 'ls' takes 20+ seconds with 1000+ files.
>

If it's not clear, I would like to point out that serialized readdir is not
the sole issue causing the slowness. If directories are _HUGE_ then I
don't expect too much benefit from parallelizing. Also, as others have
been pointing out (in various in-person discussions), there are other
scalability limits, like the number of messages and the memory consumed, to
winding calls in parallel. I'll probably do a rough POC in the next couple of
months to see whether this idea has any substance or not and post the results.



> On Wed, Nov 2, 2016 at 7:08 AM, Raghavendra Gowdappa
>  wrote:
> >
> >
> > - Original Message -
> >> From: "Keiviw" 
> >> To: gluster-de...@gluster.org
> >> Sent: Tuesday, November 1, 2016 12:41:02 PM
> >> Subject: [Gluster-devel] A question of GlusterFS dentries!
> >>
> >> Hi,
> >> In GlusterFS distributed volumes, listing a non-empty directory was slow.
> >> Then I read the dht code and found the reasons. But I was confused that
> >> GlusterFS dht traversed all the bricks (in the volume) sequentially; why not
> >> use multiple threads to read dentries from multiple bricks simultaneously?
> >> That's a question that has always puzzled me. Could you please tell me
> >> something about this?
> >
> > readdir across subvols is sequential mostly because we have to support
> > rewinddir(3). We need to maintain the mapping of offset and dentry across
> > multiple invocations of readdir. In other words, if someone did a rewinddir
> > to an offset corresponding to an earlier dentry, subsequent readdirs should
> > return the same set of dentries that the earlier invocation of readdir
> > returned. For example, in a hypothetical scenario, readdir returned the
> > following dentries:
> >
> > 1. a, off=10
> > 2. b, off=2
> > 3. c, off=5
> > 4. d, off=15
> > 5. e, off=17
> > 6. f, off=13
> >
> > Now if we did rewinddir to off 5 and issued readdir again, we should get the
> > following dentries:
> > (c, off=5), (d, off=15), (e, off=17), (f, off=13)
> >
> > Within a subvol, the backend filesystem provides the rewinddir guarantee for
> > the dentries present on that subvol. However, across subvols it is the
> > responsibility of DHT to provide the above guarantee. Which means we should
> > have some well-defined order in which we send readdir calls (note that the
> > order is not well defined if we do a parallel readdir across all subvols).
> > So, DHT does a sequential readdir, which gives a well-defined order of
> > reading dentries.
> >
> > To give an example, if we have another subvol - subvol2, in addition to the
> > subvol above, say subvol1 - with the following listing:
> > 1. g, off=16
> > 2. h, off=20
> > 3. i, off=3
> > 4. j, off=19
> >
> > With parallel readdir we can have many orderings, like (a, b, g, h, i,
> > c, d, e, f, j), (g, h, a, b, c, i, j, d, e, f), etc. Now if we do (with
> > readdir done in parallel):
> >
> > 1. A complete listing of the directory (which can be any one of C(10,4) =
> > 210 possible interleavings, since the order within each subvol is preserved).
> > 2. A rewinddir(20).
> >
> > We cannot predict which set of dentries comes _after_ offset 20. However,
> > if we do readdir sequentially across subvols there is only one possible
> > directory listing, i.e. (a, b, c, d, e, f, g, h, i, j), so it's easier
> > to support rewinddir.
> >
> > If there were no POSIX requirement for rewinddir support, I think a
> > parallel readdir could easily be implemented (which would improve performance
> > too). But unfortunately rewinddir is still a POSIX requirement. This also
> > opens up another possibility: a "no-rewinddir-support" option in DHT,
> > which, if enabled, results in parallel readdirs across subvols. What I am not
> > sure of is how many users still use rewinddir. If there is a critical mass
> > that wants performance with the tradeoff of no rewinddir support, this can be
> > a good feature.
> >
> > +gluster-users to get an opinion on this.
> >
> > regards,
> > Raghavendra
> >
> >>
> >>
> >>
> >>
> >>
> >>
> >> ___
> >> Gluster-devel mailing list
> >> gluster-de...@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Raghavendra G
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Block storage with Qemu-Tcmu

2016-11-07 Thread Prasanna Kalever
[Top posting]

I am planning to write a short blog post to answer a few similar questions
that I received after posting this blog.
Is the iSCSI stack obligatory for block storage?
The answer is no.

It basically depends on the use case and your choice. If we can run/manage
the target emulation on the client side, we don't have to bring the iSCSI
stack into the picture.
We simply export the LUN using a loopback device, i.e. after creating the
backend with the qemu-tcmu storage module, we can directly export the
target via loopback instead of iSCSI.
In this case we don't see the overhead of the iSCSI layers, but IMO the
overhead of iSCSI can be very minimal; maybe I need performance
numbers to prove that (I will spin up a benchmark soon).

I have done some basic benchmarking, taking a FUSE mount as the baseline and
an iSCSI target exposed via tcmu-runner as the target; you can find the
results at [1].
You can find more benchmarks at [2]; the commit messages should
explain the configurations.

Hope that answers most of your questions :)

[1] 
https://htmlpreview.github.io/?https://github.com/pkalever/iozone_results_gluster/blob/master/block-store/iscsi-fuse-1/html_out/index.html
[2] https://github.com/pkalever/iozone_results_gluster/blob/master/block-store/

--
Prasanna



On Mon, Nov 7, 2016 at 2:23 PM, Gandalf Corvotempesta wrote:
> On 07 Nov 2016 at 09:23, "Lindsay Mathieson" wrote:
>>
>> From a quick scan, there doesn't seem to be any particular advantage
>> over qemu using gfapi directly? Is this more aimed at apps that can't
>> use gfapi such as vmware or as a replacement for NFS?
>>
>
> Dumb question: why should I use block storage to replace NFS?
> NFS-Ganesha makes use of libgfapi; block storage does the same but also needs
> the whole iSCSI stack, so performance could be lower.
>
> If I don't need direct access to a block device on the client (for example
> for creating a custom FS or LVM and so on), NFS-Ganesha should be a better
> approach, right?
>
> Anyone compared performances between:
>
> 1. Fuse mount
> 2. Nfs
> 3. Nfs ganesha
> 4. Qemu direct access via gfapi
> 5. Iscsi
>
> ?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Block storage with Qemu-Tcmu

2016-11-07 Thread Prasanna Kalever
On Mon, Nov 7, 2016 at 1:53 PM, Lindsay Mathieson wrote:
> On 7 November 2016 at 17:01, Prasanna Kalever  wrote:
>> Yet another approach to achieve Gluster Block Storage is with Qemu-Tcmu.
>
>
> Thanks Prasanna, interesting reading.
>
> From a quick scan, there doesn't seem to be any particular advantage
> over qemu using gfapi directly? Is this more aimed at apps that can't
> use gfapi such as vmware or as a replacement for NFS?
>

As mentioned in the conclusion of the blog, the advantage here is
easy snapshots.
Qemu-tcmu will come with a '--snapshot' option (work still in
progress), much like qemu-img.
Supporting this within gluster itself would need additional maintenance of
the qemu-block xlator, which is a clone of the qcow2 spec implementation,
and that could be more work.

Also, the qemu gluster protocol driver (which accesses gfapi) is more mature
and better tested.
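
For comparison, the existing qemu-img flow over the gluster protocol driver
already looks like this (host, volume and image names are placeholders):

# create a qcow2 image directly on a gluster volume
qemu-img create -f qcow2 gluster://gfs-host/test-vol/vm1.qcow2 20G
# take and list internal snapshots
qemu-img snapshot -c before-upgrade gluster://gfs-host/test-vol/vm1.qcow2
qemu-img snapshot -l gluster://gfs-host/test-vol/vm1.qcow2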

--
Prasanna

> --
> Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Block storage with Qemu-Tcmu

2016-11-07 Thread Gandalf Corvotempesta
On 07 Nov 2016 at 09:23, "Lindsay Mathieson" wrote:
>
> From a quick scan, there doesn't seem to be any particular advantage
> over qemu using gfapi directly? Is this more aimed at apps that can't
> use gfapi such as vmware or as a replacement for NFS?
>

Dumb question: why should I use block storage to replace NFS?
NFS-Ganesha makes use of libgfapi; block storage does the same but also
needs the whole iSCSI stack, so performance could be lower.

If I don't need direct access to a block device on the client (for example
for creating a custom FS or LVM and so on), NFS-Ganesha should be a
better approach, right?

Anyone compared performances between:

1. Fuse mount
2. Nfs
3. Nfs ganesha
4. Qemu direct access via gfapi
5. Iscsi

?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Block storage with Qemu-Tcmu

2016-11-07 Thread Lindsay Mathieson
On 7 November 2016 at 17:01, Prasanna Kalever  wrote:
> Yet another approach to achieve Gluster Block Storage is with Qemu-Tcmu.


Thanks Prasanna, interesting reading.

From a quick scan, there doesn't seem to be any particular advantage
over qemu using gfapi directly? Is this more aimed at apps that can't
use gfapi such as vmware or as a replacement for NFS?

-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users