Re: [Gluster-users] trashcan on dist. repl. volume with geo-replication
Hi Dietmar,

I am trying to understand the problem and have a few questions.

1. Is the trashcan enabled only on the master volume?
2. Is the 'rm -rf' done on the master volume synced to the slave?
3. If the trashcan is disabled, does the issue go away?

The geo-rep error just says that it failed to create the directory
"Oracle_VM_VirtualBox_Extension" on the slave. Usually this would be
because of a gfid mismatch, but I don't see that in your case. So I am a
little more interested in the present state of the geo-rep. Is it still
throwing the same errors and failing to sync the same directory? If so,
does the parent 'test1/b1' exist on the slave? Doing 'ls' on the
trashcan should not affect geo-rep. Is there an easy reproducer for this?

Thanks,
Kotresh HR

On Mon, Mar 12, 2018 at 10:13 PM, Dietmar Putz wrote:
> Hello,
>
> in regard to
> https://bugzilla.redhat.com/show_bug.cgi?id=1434066
> I have run into another issue when using the trashcan feature on a
> dist. repl. volume running a geo-replication (gfs 3.12.6 on Ubuntu
> 16.04.4), e.g. when removing an entire directory with subfolders:
>
> tron@gl-node1:/myvol-1/test1/b1$ rm -rf *
>
> Afterwards, listing files in the trashcan:
>
> tron@gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/
>
> leads to an outage of the geo-replication.
> Error on master-01 and master-02:
>
> [2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl]
> _GMaster: slave's time stime=(1520861818, 0)
> [2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures]
> _GMaster: ENTRY FAILED data=({'uid': 0, 'gfid':
> 'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry':
> '.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension',
> 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
> [2018-03-12 13:37:14.835911] E
> [syncdutils(/brick1/mvol1):299:log_raise_exception]
> : The above directory failed to sync. Please fix it to proceed further.
>
> Both gfids of the directories as shown in the log:
>
> brick1/mvol1/.trashcan/test1/b1  0x5531bd64ac50462b943ec0bf1c52f52c
> brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension
> 0xc38f75e3194a4d22909450ac8f8756e7
>
> The shown directory contains just one file, which is stored on gl-node3
> and gl-node4, while node1 and node2 are in geo-replication error.
> Since the file-size limitation of the trashcan is obsolete, I'm really
> interested in using the trashcan feature, but I'm concerned it will
> interrupt the geo-replication entirely.
> Has anybody else been faced with this situation... any hints,
> workarounds... ?
>
> best regards
> Dietmar Putz
>
> root@gl-node1:~/tmp# gluster volume info mvol1
>
> Volume Name: mvol1
> Type: Distributed-Replicate
> Volume ID: a1c74931-568c-4f40-8573-dd344553e557
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gl-node1-int:/brick1/mvol1
> Brick2: gl-node2-int:/brick1/mvol1
> Brick3: gl-node3-int:/brick1/mvol1
> Brick4: gl-node4-int:/brick1/mvol1
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> features.trash-max-filesize: 2GB
> features.trash: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> root@gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
> special_sync_mode: partial
> gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> change_detector: changelog
> use_meta_volume: true
> session_owner: a1c74931-568c-4f40-8573-dd344553e557
> state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
> gluster_params: aux-gfid-mount acl
> remote_gsyncd: /nonexistent/gsyncd
> working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
> state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
> gluster_command_dir: /usr/sbin/
> pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
> georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
> master.stime_xattr_name: trusted.glusterfs.a1c74931-568c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
> changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
> socketdir: /var/run/gluster
> volume_id: a1c74931-568c-4f40-8573-dd344553e557
> ignore_deletes: false
>
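The diagnostic questions above map onto a few standard commands. A sketch, assuming the volume/session names and brick paths shown in this thread, and that the slave uses the same brick layout as the master:

```shell
# On the master: is the trashcan enabled, and with what size limit?
gluster volume get mvol1 features.trash
gluster volume get mvol1 features.trash-max-filesize

# Present state of the geo-rep session (Faulty workers, crawl status).
gluster volume geo-replication mvol1 gl-node5-int::mvol1 status detail

# On the slave brick (path assumed to mirror the master's): does the
# parent 'test1/b1' exist, and does its gfid match the 0x5531bd64...
# value reported in the master log?
ls -la /brick1/mvol1/.trashcan/test1/b1
getfattr -n trusted.gfid -e hex /brick1/mvol1/.trashcan/test1/b1
```

Comparing the gfid on the slave brick against the one in the ENTRY FAILED log line would confirm or rule out a mismatch directly.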
[Gluster-users] trashcan on dist. repl. volume with geo-replication
Hello,

in regard to https://bugzilla.redhat.com/show_bug.cgi?id=1434066
I have run into another issue when using the trashcan feature on a
dist. repl. volume running a geo-replication (gfs 3.12.6 on Ubuntu
16.04.4), e.g. when removing an entire directory with subfolders:

tron@gl-node1:/myvol-1/test1/b1$ rm -rf *

Afterwards, listing files in the trashcan:

tron@gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/

leads to an outage of the geo-replication.
Error on master-01 and master-02:

[2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl]
_GMaster: slave's time stime=(1520861818, 0)
[2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures]
_GMaster: ENTRY FAILED data=({'uid': 0, 'gfid':
'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry':
'.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension',
'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
[2018-03-12 13:37:14.835911] E
[syncdutils(/brick1/mvol1):299:log_raise_exception]
: The above directory failed to sync. Please fix it to proceed further.

Both gfids of the directories as shown in the log:

brick1/mvol1/.trashcan/test1/b1  0x5531bd64ac50462b943ec0bf1c52f52c
brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension
0xc38f75e3194a4d22909450ac8f8756e7

The shown directory contains just one file, which is stored on gl-node3
and gl-node4, while node1 and node2 are in geo-replication error.
Since the file-size limitation of the trashcan is obsolete, I'm really
interested in using the trashcan feature, but I'm concerned it will
interrupt the geo-replication entirely.
Has anybody else been faced with this situation... any hints,
workarounds... ?
best regards
Dietmar Putz

root@gl-node1:~/tmp# gluster volume info mvol1

Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: a1c74931-568c-4f40-8573-dd344553e557
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-node1-int:/brick1/mvol1
Brick2: gl-node2-int:/brick1/mvol1
Brick3: gl-node3-int:/brick1/mvol1
Brick4: gl-node4-int:/brick1/mvol1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.trash-max-filesize: 2GB
features.trash: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

root@gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
special_sync_mode: partial
gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
change_detector: changelog
use_meta_volume: true
session_owner: a1c74931-568c-4f40-8573-dd344553e557
state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
gluster_params: aux-gfid-mount acl
remote_gsyncd: /nonexistent/gsyncd
working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name: trusted.glusterfs.a1c74931-568c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
socketdir: /var/run/gluster
volume_id: a1c74931-568c-4f40-8573-dd344553e557
ignore_deletes: false
state_socket_unencoded: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.socket
log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.log
access_mount: true
root@gl-node1:/myvol-1/test1#

--
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
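The failure sequence described in this post condenses into a short reproducer sketch. Mount point, volume, and session names are taken from the post; it assumes trash is enabled as shown and a healthy geo-rep session to a test slave:

```shell
# On a client mount of mvol1 (mounted at /myvol-1 in this post).
mkdir -p /myvol-1/test1/b1/Oracle_VM_VirtualBox_Extension
echo data > /myvol-1/test1/b1/Oracle_VM_VirtualBox_Extension/file1

# Delete the tree; the trash translator moves it under .trashcan.
rm -rf /myvol-1/test1/b1/*

# List the trashcan, then check whether the session turns Faulty.
ls -la /myvol-1/.trashcan/test1/b1/
gluster volume geo-replication mvol1 gl-node5-int::mvol1 status
```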
[Gluster-users] Can't heal a volume: "Please check if all brick processes are running."
Hello,

We have a very fresh gluster 3.10.10 installation. Our volume is created
as a distributed volume, 9 bricks, 96TB in total (87TB after 10% of
gluster disk space reservation).

For some reason I can't "heal" the volume:

# gluster volume heal gv0
Launching heal operation to perform index self heal on volume gv0 has
been unsuccessful on bricks that are down. Please check if all brick
processes are running.

Which processes should be run on every brick for the heal operation?

# gluster volume status
Status of volume: gv0
Gluster process                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick cn01-ib:/gfs/gv0/brick1/brick      0         49152      Y       70850
Brick cn02-ib:/gfs/gv0/brick1/brick      0         49152      Y       102951
Brick cn03-ib:/gfs/gv0/brick1/brick      0         49152      Y       57535
Brick cn04-ib:/gfs/gv0/brick1/brick      0         49152      Y       56676
Brick cn05-ib:/gfs/gv0/brick1/brick      0         49152      Y       56880
Brick cn06-ib:/gfs/gv0/brick1/brick      0         49152      Y       56889
Brick cn07-ib:/gfs/gv0/brick1/brick      0         49152      Y       56902
Brick cn08-ib:/gfs/gv0/brick1/brick      0         49152      Y       94920
Brick cn09-ib:/gfs/gv0/brick1/brick      0         49152      Y       56542

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

# gluster volume info gv0

Volume Name: gv0
Type: Distribute
Volume ID: 8becaf78-cf2d-4991-93bf-f2446688154f
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: rdma
Bricks:
Brick1: cn01-ib:/gfs/gv0/brick1/brick
Brick2: cn02-ib:/gfs/gv0/brick1/brick
Brick3: cn03-ib:/gfs/gv0/brick1/brick
Brick4: cn04-ib:/gfs/gv0/brick1/brick
Brick5: cn05-ib:/gfs/gv0/brick1/brick
Brick6: cn06-ib:/gfs/gv0/brick1/brick
Brick7: cn07-ib:/gfs/gv0/brick1/brick
Brick8: cn08-ib:/gfs/gv0/brick1/brick
Brick9: cn09-ib:/gfs/gv0/brick1/brick
Options Reconfigured:
client.event-threads: 8
performance.parallel-readdir: on
performance.readdir-ahead: on
cluster.nufa: on
nfs.disable: on

--
Best regards,
Anatoliy
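On the question of which processes should be running: each brick is served by its own glusterfsd process, and the glusterd management daemon runs on every node. A sketch of the checks, assuming default log locations (the brick log file name is the brick path with slashes replaced by dashes):

```shell
# One glusterfsd per brick should be running on each server.
pgrep -af glusterfsd

# The management daemon must also be up.
pgrep -af glusterd

# gluster's own view of the bricks; the Online column should show 'Y'.
gluster volume status gv0

# If a brick shows offline, its log usually says why.
tail -n 50 /var/log/glusterfs/bricks/gfs-gv0-brick1-brick.log
```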
Re: [Gluster-users] Expected performance for WORM scenario
Hi,

Can you send us the following details:
1. gluster volume info
2. Which client are you using to run this?

Thanks,
Nithya

On 12 March 2018 at 18:16, Andreas Ericsson wrote:
> Heya fellas.
>
> I've been struggling quite a lot to get glusterfs to perform even
> half-decently with a write-intensive workload. Test numbers are from
> gluster 3.10.7.
>
> We store a bunch of small files in a doubly-tiered sha1 hash fanout
> directory structure. The directories themselves aren't overly full. Most
> of the data we write to gluster is "write once, read probably never", so
> 99% of all operations are of the write variety.
>
> The network between servers is sound. 10gb network cards run over a 10gb
> (doh) switch. iperf reports 9.86Gbit/sec. ping reports a latency of
> 0.1 - 0.2 ms. There is no firewall, no packet inspection and no nothing
> between the servers, and the 10gb switch is the only path between the
> two machines, so traffic isn't going over some 2mbit wifi by accident.
>
> Our main storage has always been really slow (write speed of roughly
> 1.5MiB/s), but I had long attributed that to the extremely slow disks we
> use to back it, so now that we're expanding I set up a new gluster
> cluster with state-of-the-art NVMe SSD drives to boost performance.
> However, performance only hopped up to around 2.1MiB/s. Perplexed, I
> tried it first with a 3-node cluster using 2GB ramdrives, which got me
> up to 2.4MiB/s. My last resort was to use a single node running on
> ramdisk, just to 100% exclude any network shenanigans, but the write
> performance stayed at an absolutely abysmal 3MiB/s.
>
> Writing straight to (the same) ramdisk gives me "normal" ramdisk speed
> (I don't actually remember the numbers, but my test that took 2 minutes
> with gluster completed before I had time to blink). Writing straight to
> the backing SSD drives gives me a throughput of 96MiB/sec.
>
> The test itself writes 8494 files that I simply took randomly from our
> production environment, comprising a total of 63.4MiB (so the average
> file size is just under 8k; most are actually close to 4k, with the
> occasional 2-or-so MB file in there).
>
> I have googled and read a *lot* of performance-tuning guides, but the
> 3MiB/sec on a single-node ramdisk seems to be far beyond the crippling
> one can cause by misconfiguration of a single system.
>
> With this in mind: what sort of write performance can one reasonably
> hope to get with gluster? Assume a 3-node cluster running on top of
> (small) ramdisks on a fast and stable network. Is it just a bad fit for
> our workload?
>
> /Andreas
Re: [Gluster-users] Expected performance for WORM scenario
Hi,

Gluster will never perform well for small files. I believe there is
nothing you can do about this.

Ondrej

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Andreas Ericsson
Sent: Monday, March 12, 2018 1:47 PM
To: Gluster-users@gluster.org
Subject: [Gluster-users] Expected performance for WORM scenario

Heya fellas.

I've been struggling quite a lot to get glusterfs to perform even
half-decently with a write-intensive workload. Test numbers are from
gluster 3.10.7.

We store a bunch of small files in a doubly-tiered sha1 hash fanout
directory structure. The directories themselves aren't overly full. Most
of the data we write to gluster is "write once, read probably never", so
99% of all operations are of the write variety.

The network between servers is sound. 10gb network cards run over a 10gb
(doh) switch. iperf reports 9.86Gbit/sec. ping reports a latency of
0.1 - 0.2 ms. There is no firewall, no packet inspection and no nothing
between the servers, and the 10gb switch is the only path between the
two machines, so traffic isn't going over some 2mbit wifi by accident.

Our main storage has always been really slow (write speed of roughly
1.5MiB/s), but I had long attributed that to the extremely slow disks we
use to back it, so now that we're expanding I set up a new gluster
cluster with state-of-the-art NVMe SSD drives to boost performance.
However, performance only hopped up to around 2.1MiB/s. Perplexed, I
tried it first with a 3-node cluster using 2GB ramdrives, which got me
up to 2.4MiB/s. My last resort was to use a single node running on
ramdisk, just to 100% exclude any network shenanigans, but the write
performance stayed at an absolutely abysmal 3MiB/s.

Writing straight to (the same) ramdisk gives me "normal" ramdisk speed
(I don't actually remember the numbers, but my test that took 2 minutes
with gluster completed before I had time to blink).
Writing straight to the backing SSD drives gives me a throughput of
96MiB/sec.

The test itself writes 8494 files that I simply took randomly from our
production environment, comprising a total of 63.4MiB (so the average
file size is just under 8k; most are actually close to 4k, with the
occasional 2-or-so MB file in there).

I have googled and read a *lot* of performance-tuning guides, but the
3MiB/sec on a single-node ramdisk seems to be far beyond the crippling
one can cause by misconfiguration of a single system.

With this in mind: what sort of write performance can one reasonably
hope to get with gluster? Assume a 3-node cluster running on top of
(small) ramdisks on a fast and stable network. Is it just a bad fit for
our workload?

/Andreas
[Gluster-users] Expected performance for WORM scenario
Heya fellas.

I've been struggling quite a lot to get glusterfs to perform even
half-decently with a write-intensive workload. Test numbers are from
gluster 3.10.7.

We store a bunch of small files in a doubly-tiered sha1 hash fanout
directory structure. The directories themselves aren't overly full. Most
of the data we write to gluster is "write once, read probably never", so
99% of all operations are of the write variety.

The network between servers is sound. 10gb network cards run over a 10gb
(doh) switch. iperf reports 9.86Gbit/sec. ping reports a latency of
0.1 - 0.2 ms. There is no firewall, no packet inspection and no nothing
between the servers, and the 10gb switch is the only path between the
two machines, so traffic isn't going over some 2mbit wifi by accident.

Our main storage has always been really slow (write speed of roughly
1.5MiB/s), but I had long attributed that to the extremely slow disks we
use to back it, so now that we're expanding I set up a new gluster
cluster with state-of-the-art NVMe SSD drives to boost performance.
However, performance only hopped up to around 2.1MiB/s. Perplexed, I
tried it first with a 3-node cluster using 2GB ramdrives, which got me
up to 2.4MiB/s. My last resort was to use a single node running on
ramdisk, just to 100% exclude any network shenanigans, but the write
performance stayed at an absolutely abysmal 3MiB/s.

Writing straight to (the same) ramdisk gives me "normal" ramdisk speed
(I don't actually remember the numbers, but my test that took 2 minutes
with gluster completed before I had time to blink). Writing straight to
the backing SSD drives gives me a throughput of 96MiB/sec.

The test itself writes 8494 files that I simply took randomly from our
production environment, comprising a total of 63.4MiB (so the average
file size is just under 8k; most are actually close to 4k, with the
occasional 2-or-so MB file in there).
I have googled and read a *lot* of performance-tuning guides, but the
3MiB/sec on a single-node ramdisk seems to be far beyond the crippling
one can cause by misconfiguration of a single system.

With this in mind: what sort of write performance can one reasonably
hope to get with gluster? Assume a 3-node cluster running on top of
(small) ramdisks on a fast and stable network. Is it just a bad fit for
our workload?

/Andreas
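For comparing numbers like these across setups, the workload described (thousands of ~4-8k files) can be approximated with a small portable script. `TARGET` and `COUNT` are placeholders; point `TARGET` at a gluster mount, then at a local directory for a baseline:

```shell
# Write COUNT small files into TARGET and report rough throughput.
# TARGET and COUNT are placeholders, not paths from this thread.
TARGET="${TARGET:-/tmp/smallfile-test}"
COUNT="${COUNT:-200}"
SIZE_KB=8

mkdir -p "$TARGET"
start=$(date +%s)
i=0
while [ "$i" -lt "$COUNT" ]; do
    # One file per iteration, mimicking many small independent writes.
    dd if=/dev/zero of="$TARGET/f$i" bs=1024 count="$SIZE_KB" 2>/dev/null
    i=$((i + 1))
done
end=$(date +%s)
elapsed=$((end - start))
if [ "$elapsed" -eq 0 ]; then elapsed=1; fi
echo "wrote $COUNT files of ${SIZE_KB}KiB in ${elapsed}s (~$((COUNT * SIZE_KB / elapsed)) KiB/s)"
```

Running the same script against the gluster mount and the backing ramdisk/SSD directly gives an apples-to-apples view of the per-file overhead, which is usually dominated by per-operation round trips rather than bandwidth.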