Oh, and I see that the op-version is slightly less than the max-op-version:

[root@gfs1 ~]# gluster volume get all cluster.max-op-version
Option                                  Value
------                                  -----
cluster.max-op-version                  50400

[root@gfs1 ~]# gluster volume get all cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      50000

On Fri, 27 Dec 2019 at 14:22, David Cunningham <dcunning...@voisonics.com> wrote:

> Hi Strahil,
>
> Our volume options are as below. Thanks for the suggestion to upgrade to
> version 6 or 7. We could do that by simply removing the current
> installation and installing the new one (since it's not live right now).
> We might have to convince the customer that it's likely to succeed though,
> as at the moment I think they believe that GFS is not going to work for
> them.
>
> Option                                  Value
> ------                                  -----
> cluster.lookup-unhashed                 on
> cluster.lookup-optimize                 on
> cluster.min-free-disk                   10%
> cluster.min-free-inodes                 5%
> cluster.rebalance-stats                 off
> cluster.subvols-per-directory           (null)
> cluster.readdir-optimize                off
> cluster.rsync-hash-regex                (null)
> cluster.extra-hash-regex                (null)
> cluster.dht-xattr-name                  trusted.glusterfs.dht
> cluster.randomize-hash-range-by-gfid    off
> cluster.rebal-throttle                  normal
> cluster.lock-migration                  off
> cluster.force-migration                 off
> cluster.local-volume-name               (null)
> cluster.weighted-rebalance              on
> cluster.switch-pattern                  (null)
> cluster.entry-change-log                on
> cluster.read-subvolume                  (null)
> cluster.read-subvolume-index            -1
> cluster.read-hash-mode                  1
> cluster.background-self-heal-count      8
> cluster.metadata-self-heal              on
> cluster.data-self-heal                  on
> cluster.entry-self-heal                 on
> cluster.self-heal-daemon                on
> cluster.heal-timeout                    600
> cluster.self-heal-window-size           1
> cluster.data-change-log                 on
> cluster.metadata-change-log             on
> cluster.data-self-heal-algorithm        (null)
> cluster.eager-lock                      on
> disperse.eager-lock                     on
> disperse.other-eager-lock               on
> disperse.eager-lock-timeout             1
> disperse.other-eager-lock-timeout       1
> cluster.quorum-type                     none
> cluster.quorum-count                    (null)
> cluster.choose-local                    true
> cluster.self-heal-readdir-size          1KB
> cluster.post-op-delay-secs              1
> cluster.ensure-durability               on
> cluster.consistent-metadata             no
> cluster.heal-wait-queue-length          128
> cluster.favorite-child-policy           none
> cluster.full-lock                       yes
> cluster.stripe-block-size               128KB
> cluster.stripe-coalesce                 true
> diagnostics.latency-measurement         off
> diagnostics.dump-fd-stats               off
> diagnostics.count-fop-hits              off
> diagnostics.brick-log-level             INFO
> diagnostics.client-log-level            INFO
> diagnostics.brick-sys-log-level         CRITICAL
> diagnostics.client-sys-log-level        CRITICAL
> diagnostics.brick-logger                (null)
> diagnostics.client-logger               (null)
> diagnostics.brick-log-format            (null)
> diagnostics.client-log-format           (null)
> diagnostics.brick-log-buf-size          5
> diagnostics.client-log-buf-size         5
> diagnostics.brick-log-flush-timeout     120
> diagnostics.client-log-flush-timeout    120
> diagnostics.stats-dump-interval         0
> diagnostics.fop-sample-interval         0
> diagnostics.stats-dump-format           json
> diagnostics.fop-sample-buf-size         65535
> diagnostics.stats-dnscache-ttl-sec      86400
> performance.cache-max-file-size         0
> performance.cache-min-file-size         0
> performance.cache-refresh-timeout       1
> performance.cache-priority
> performance.cache-size                  32MB
> performance.io-thread-count             16
> performance.high-prio-threads           16
> performance.normal-prio-threads         16
> performance.low-prio-threads            16
> performance.least-prio-threads          1
> performance.enable-least-priority       on
> performance.iot-watchdog-secs           (null)
> performance.iot-cleanup-disconnected-reqs off
> performance.iot-pass-through            false
> performance.io-cache-pass-through       false
> performance.cache-size                  128MB
> performance.qr-cache-timeout            1
> performance.cache-invalidation          false
> performance.ctime-invalidation          false
> performance.flush-behind                on
> performance.nfs.flush-behind            on
> performance.write-behind-window-size    1MB
> performance.resync-failed-syncs-after-fsync off
> performance.nfs.write-behind-window-size 1MB
> performance.strict-o-direct             off
> performance.nfs.strict-o-direct         off
> performance.strict-write-ordering       off
> performance.nfs.strict-write-ordering   off
> performance.write-behind-trickling-writes on
> performance.aggregate-size              128KB
> performance.nfs.write-behind-trickling-writes on
> performance.lazy-open                   yes
> performance.read-after-open             yes
> performance.open-behind-pass-through    false
> performance.read-ahead-page-count       4
> performance.read-ahead-pass-through     false
> performance.readdir-ahead-pass-through  false
> performance.md-cache-pass-through       false
> performance.md-cache-timeout            1
> performance.cache-swift-metadata        true
> performance.cache-samba-metadata        false
> performance.cache-capability-xattrs     true
> performance.cache-ima-xattrs            true
> performance.md-cache-statfs             off
> performance.xattr-cache-list
> performance.nl-cache-pass-through       false
> features.encryption                     off
> encryption.master-key                   (null)
> encryption.data-key-size                256
> encryption.block-size                   4096
> network.frame-timeout                   1800
> network.ping-timeout                    42
> network.tcp-window-size                 (null)
> network.remote-dio                      disable
> client.event-threads                    2
> client.tcp-user-timeout                 0
> client.keepalive-time                   20
> client.keepalive-interval               2
> client.keepalive-count                  9
> network.tcp-window-size                 (null)
> network.inode-lru-limit                 16384
> auth.allow                              *
> auth.reject                             (null)
> transport.keepalive                     1
> server.allow-insecure                   on
> server.root-squash                      off
> server.anonuid                          65534
> server.anongid                          65534
> server.statedump-path                   /var/run/gluster
> server.outstanding-rpc-limit            64
> server.ssl                              (null)
> auth.ssl-allow                          *
> server.manage-gids                      off
> server.dynamic-auth                     on
> client.send-gids                        on
> server.gid-timeout                      300
> server.own-thread                       (null)
> server.event-threads                    1
> server.tcp-user-timeout                 0
> server.keepalive-time                   20
> server.keepalive-interval               2
> server.keepalive-count                  9
> transport.listen-backlog                1024
> ssl.own-cert                            (null)
> ssl.private-key                         (null)
> ssl.ca-list                             (null)
> ssl.crl-path                            (null)
> ssl.certificate-depth                   (null)
> ssl.cipher-list                         (null)
> ssl.dh-param                            (null)
> ssl.ec-curve                            (null)
> transport.address-family                inet
> performance.write-behind                on
> performance.read-ahead                  on
> performance.readdir-ahead               on
> performance.io-cache                    on
> performance.quick-read                  on
> performance.open-behind                 on
> performance.nl-cache                    off
> performance.stat-prefetch               on
> performance.client-io-threads           off
> performance.nfs.write-behind            on
> performance.nfs.read-ahead              off
> performance.nfs.io-cache                off
> performance.nfs.quick-read              off
> performance.nfs.stat-prefetch           off
> performance.nfs.io-threads              off
> performance.force-readdirp              true
> performance.cache-invalidation          false
> features.uss                            off
> features.snapshot-directory             .snaps
> features.show-snapshot-directory        off
> features.tag-namespaces                 off
> network.compression                     off
> network.compression.window-size         -15
> network.compression.mem-level           8
> network.compression.min-size            0
> network.compression.compression-level   -1
> network.compression.debug               false
> features.default-soft-limit             80%
> features.soft-timeout                   60
> features.hard-timeout                   5
> features.alert-time                     86400
> features.quota-deem-statfs              off
> geo-replication.indexing                off
> geo-replication.indexing                off
> geo-replication.ignore-pid-check        off
> geo-replication.ignore-pid-check        off
> features.quota                          off
> features.inode-quota                    off
> features.bitrot                         disable
> debug.trace                             off
> debug.log-history                       no
> debug.log-file                          no
> debug.exclude-ops                       (null)
> debug.include-ops                       (null)
> debug.error-gen                         off
> debug.error-failure                     (null)
> debug.error-number                      (null)
> debug.random-failure                    off
> debug.error-fops                        (null)
> nfs.disable                             on
> features.read-only                      off
> features.worm                           off
> features.worm-file-level                off
> features.worm-files-deletable           on
> features.default-retention-period       120
> features.retention-mode                 relax
> features.auto-commit-period             180
> storage.linux-aio                       off
> storage.batch-fsync-mode                reverse-fsync
> storage.batch-fsync-delay-usec          0
> storage.owner-uid                       -1
> storage.owner-gid                       -1
> storage.node-uuid-pathinfo              off
> storage.health-check-interval           30
> storage.build-pgfid                     off
> storage.gfid2path                       on
> storage.gfid2path-separator             :
> storage.reserve                         1
> storage.health-check-timeout            10
> storage.fips-mode-rchecksum             off
> storage.force-create-mode               0000
> storage.force-directory-mode            0000
> storage.create-mask                     0777
> storage.create-directory-mask           0777
> storage.max-hardlinks                   100
> storage.ctime                           off
> storage.bd-aio                          off
> config.gfproxyd                         off
> cluster.server-quorum-type              off
> cluster.server-quorum-ratio             0
> changelog.changelog                     off
> changelog.changelog-dir                 {{ brick.path }}/.glusterfs/changelogs
> changelog.encoding                      ascii
> changelog.rollover-time                 15
> changelog.fsync-interval                5
> changelog.changelog-barrier-timeout     120
> changelog.capture-del-path              off
> features.barrier                        disable
> features.barrier-timeout                120
> features.trash                          off
> features.trash-dir                      .trashcan
> features.trash-eliminate-path           (null)
> features.trash-max-filesize             5MB
> features.trash-internal-op              off
> cluster.enable-shared-storage           disable
> cluster.write-freq-threshold            0
> cluster.read-freq-threshold             0
> cluster.tier-pause                      off
> cluster.tier-promote-frequency          120
> cluster.tier-demote-frequency           3600
> cluster.watermark-hi                    90
> cluster.watermark-low                   75
> cluster.tier-mode                       cache
> cluster.tier-max-promote-file-size      0
> cluster.tier-max-mb                     4000
> cluster.tier-max-files                  10000
> cluster.tier-query-limit                100
> cluster.tier-compact                    on
> cluster.tier-hot-compact-frequency      604800
> cluster.tier-cold-compact-frequency     604800
> features.ctr-enabled                    off
> features.record-counters                off
> features.ctr-record-metadata-heat       off
> features.ctr_link_consistency           off
> features.ctr_lookupheal_link_timeout    300
> features.ctr_lookupheal_inode_timeout   300
> features.ctr-sql-db-cachesize           12500
> features.ctr-sql-db-wal-autocheckpoint  25000
> features.selinux                        on
> locks.trace                             off
> locks.mandatory-locking                 off
> cluster.disperse-self-heal-daemon       enable
> cluster.quorum-reads                    no
> client.bind-insecure                    (null)
> features.shard                          off
> features.shard-block-size               64MB
> features.shard-lru-limit                16384
> features.shard-deletion-rate            100
> features.scrub-throttle                 lazy
> features.scrub-freq                     biweekly
> features.scrub                          false
> features.expiry-time                    120
> features.cache-invalidation             off
> features.cache-invalidation-timeout     60
> features.leases                         off
> features.lease-lock-recall-timeout      60
> disperse.background-heals               8
> disperse.heal-wait-qlength              128
> cluster.heal-timeout                    600
> dht.force-readdirp                      on
> disperse.read-policy                    gfid-hash
> cluster.shd-max-threads                 1
> cluster.shd-wait-qlength                1024
> cluster.locking-scheme                  full
> cluster.granular-entry-heal             no
> features.locks-revocation-secs          0
> features.locks-revocation-clear-all     false
> features.locks-revocation-max-blocked   0
> features.locks-monkey-unlocking         false
> features.locks-notify-contention        no
> features.locks-notify-contention-delay  5
> disperse.shd-max-threads                1
> disperse.shd-wait-qlength               1024
> disperse.cpu-extensions                 auto
> disperse.self-heal-window-size          1
> cluster.use-compound-fops               off
> performance.parallel-readdir            off
> performance.rda-request-size            131072
> performance.rda-low-wmark               4096
> performance.rda-high-wmark              128KB
> performance.rda-cache-limit             10MB
> performance.nl-cache-positive-entry     false
> performance.nl-cache-limit              10MB
> performance.nl-cache-timeout            60
> cluster.brick-multiplex                 off
> cluster.max-bricks-per-process          0
> disperse.optimistic-change-log          on
> disperse.stripe-cache                   4
> cluster.halo-enabled                    False
> cluster.halo-shd-max-latency            99999
> cluster.halo-nfsd-max-latency           5
> cluster.halo-max-latency                5
> cluster.halo-max-replicas               99999
> cluster.halo-min-replicas               2
> cluster.daemon-log-level                INFO
> debug.delay-gen                         off
> delay-gen.delay-percentage              10%
> delay-gen.delay-duration                100000
> delay-gen.enable
> disperse.parallel-writes                on
> features.sdfs                           on
> features.cloudsync                      off
> features.utime                          off
> ctime.noatime                           on
> feature.cloudsync-storetype             (null)
>
> Thanks again.
>
> On Wed, 25 Dec 2019 at 05:51, Strahil <hunter86...@yahoo.com> wrote:
>
>> Hi David,
>>
>> On Dec 24, 2019 02:47, David Cunningham <dcunning...@voisonics.com> wrote:
>> >
>> > Hello,
>> >
>> > In testing we found that actually the GFS client having access to all 3
>> > nodes made no difference to performance. Perhaps that's because the 3rd
>> > node that wasn't accessible from the client before was the arbiter node?
>>
>> It makes sense, as no data is being generated towards the arbiter.
>>
>> > Presumably we shouldn't have an arbiter node listed under
>> > backupvolfile-server when mounting the filesystem? Since it doesn't store
>> > all the data, surely it can't be used to serve the data.
>>
>> I have my arbiter defined as the last backup and have had no issues so far.
>> At least the admin can easily identify the bricks from the mount options.
>>
>> > We did have direct-io-mode=disable already as well, so that wasn't a
>> > factor in the performance problems.
>>
>> Have you checked that the client version is not too old?
>> Also, you can check the cluster's operation version:
>> # gluster volume get all cluster.max-op-version
>> # gluster volume get all cluster.op-version
>>
>> The cluster's op-version should be at max-op-version.
>>
>> Two options come to mind:
>> A) Upgrade to the latest Gluster v6 or even v7 (I know it won't be easy)
>> and then set the op-version to the highest possible.
>> # gluster volume get all cluster.max-op-version
>> # gluster volume get all cluster.op-version
>>
>> B) Deploy an NFS Ganesha server and connect the client over NFS v4.2 (and
>> control the parallel connections from Ganesha).
>>
>> Can you provide your Gluster volume's options?
>> 'gluster volume get <VOLNAME> all'
>>
>> > Thanks again for any advice.
>> >
>> > On Mon, 23 Dec 2019 at 13:09, David Cunningham <dcunning...@voisonics.com> wrote:
>> >>
>> >> Hi Strahil,
>> >>
>> >> Thanks for that. We do have one backup server specified, but will add
>> >> the second backup as well.
>> >>
>> >> On Sat, 21 Dec 2019 at 11:26, Strahil <hunter86...@yahoo.com> wrote:
>> >>>
>> >>> Hi David,
>> >>>
>> >>> Also consider using the mount option to specify backup servers via
>> >>> 'backupvolfile-server=server2:server3' (you can define more, but I
>> >>> don't think replica volumes greater than 3 are useful, except maybe in
>> >>> some special cases).
>> >>>
>> >>> In this way, when the primary is lost, your client can reach a backup
>> >>> server without disruption.
>> >>>
>> >>> P.S.: The client may 'hang' if the primary server was rebooted
>> >>> ungracefully, as the communication must time out before FUSE addresses
>> >>> the next server. There is a special script for killing gluster
>> >>> processes in '/usr/share/gluster/scripts' which can be used to set up a
>> >>> systemd service to do that for you on shutdown.
>> >>>
>> >>> Best Regards,
>> >>> Strahil Nikolov
>> >>>
>> >>> On Dec 20, 2019 23:49, David Cunningham <dcunning...@voisonics.com> wrote:
>> >>>>
>> >>>> Hi Strahil,
>> >>>>
>> >>>> Ah, that is an important point. One of the nodes is not accessible
>> >>>> from the client, and we assumed that it only needed to reach the GFS
>> >>>> node that was mounted, so didn't think anything of it.
>> >>>>
>> >>>> We will try making all nodes accessible, as well as
>> >>>> "direct-io-mode=disable".
>> >>>>
>> >>>> Thank you.
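Strahil's backupvolfile-server suggestion above can be sketched as an /etc/fstab entry. This is only a sketch: server1/server2/server3 are placeholder hostnames (as in his example) and gvol0 is the volume name used elsewhere in this thread; substitute your own.

```
# Sketch only: server1 is the primary, server2/server3 are placeholder backup nodes.
server1:/gvol0  /mnt/glusterfs  glusterfs  defaults,_netdev,backupvolfile-server=server2:server3  0  0
```

With this entry, if server1 is unreachable at mount time the client fetches the volfile from server2 or server3 instead.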
>> >>>>
>> >>>> On Sat, 21 Dec 2019 at 10:29, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>> >>>>>
>> >>>>> Actually, I haven't clarified myself. A FUSE mount on the client
>> >>>>> side connects directly to all of the bricks that make up the volume.
>> >>>>> If for some reason (bad routing, firewall blocking) the client can
>> >>>>> reach only 2 out of 3 bricks, this can constantly cause healing to
>> >>>>> happen (as one of the bricks is never updated), which will degrade
>> >>>>> performance and cause excessive network usage.
>> >>>>> As your attachment is from one of the gluster nodes, this could be
>> >>>>> the case.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> Strahil Nikolov
>> >>>>>
>> >>>>> On Friday, 20 December 2019, 01:49:56 GMT+2, David Cunningham
>> >>>>> <dcunning...@voisonics.com> wrote:
>> >>>>>
>> >>>>> Hi Strahil,
>> >>>>>
>> >>>>> The chart attached to my original email is taken from the GFS server.
>> >>>>>
>> >>>>> I'm not sure what you mean by accessing all bricks simultaneously.
>> >>>>> We've mounted it from the client like this:
>> >>>>> gfs1:/gvol0 /mnt/glusterfs/ glusterfs defaults,direct-io-mode=disable,_netdev,backupvolfile-server=gfs2,fetch-attempts=10 0 0
>> >>>>>
>> >>>>> Should we do something different to access all bricks simultaneously?
>> >>>>>
>> >>>>> Thanks for your help!
>> >>>>>
>> >>>>> On Fri, 20 Dec 2019 at 11:47, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>> >>>>>>
>> >>>>>> I'm not sure if you measured the traffic from the client side
>> >>>>>> (tcpdump on a client machine) or from the server side.
>> >>>>>>
>> >>>>>> In both cases, please verify that the client accesses all bricks
>> >>>>>> simultaneously, as this can cause unnecessary heals.
>> >>>>>>
>> >>>>>> Have you thought about upgrading to v6? There are some enhancements
>> >>>>>> in v6 which could be beneficial.
>> >>>>>>
>> >>>>>> Yet, it is indeed strange that so much traffic is generated with
>> >>>>>> FUSE.
>> >>>>>>
>> >>>>>> Another approach is to test with NFS-Ganesha, which supports pNFS
>> >>>>>> and can natively speak with Gluster. That can bring you closer to
>> >>>>>> the previous setup and also provide some extra performance.
>> >>>>>>
>> >>>>>> Best Regards,
>> >>>>>> Strahil Nikolov
>> >>
>> >> --
>> >> David Cunningham, Voisonics Limited
>> >> http://voisonics.com/
>> >> USA: +1 213 221 1092
>> >> New Zealand: +64 (0)28 2558 3782
>> >
>> > --
>> > David Cunningham, Voisonics Limited
>> > http://voisonics.com/
>> > USA: +1 213 221 1092
>> > New Zealand: +64 (0)28 2558 3782
>>
>> Best Regards,
>> Strahil Nikolov
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
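Since the thread closes on the op-version gap shown at the top (50000 active vs 50400 supported), a minimal sequence for checking and raising it might look like the following. This is a sketch: run the `set` only after every node in the cluster has been upgraded to the same GlusterFS version, and take the target value from the max-op-version your own cluster reports.

```shell
# Check the highest op-version the cluster supports, and the one in effect
gluster volume get all cluster.max-op-version
gluster volume get all cluster.op-version

# Raise the cluster op-version to match (50400 taken from max-op-version above)
gluster volume set all cluster.op-version 50400
```

The op-version only moves forward; once raised, older clients that do not support it will be unable to mount, so verify client versions first.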
________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users