Re: [Gluster-users] Stale locks on shards
Pranith Kumar Karampuri wrote on 29.01.2018 07:32:
> On 29 Jan 2018 10:50 am, "Samuli Heinonen" wrote:
>> Hi! Yes, thank you for asking. I found this line in the production environment:
>>
>> lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32", "glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM (Operation not permitted)
>
> I was expecting .kall instead of .kblocked. Did you change the CLI to kind blocked?

Yes, I was testing this with different commands. Basically it seems that the name of the attribute is glusterfs.clrlk.t{posix,inode,entry}.k{all,blocked,granted}, am I correct? Is it necessary to set any value, or is it enough to request the attribute with getfattr?

>> And this one in the test environment (with posix locks):
>>
>> lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked", "box1:/gluster/1/export/: posix blocked locks=1 granted locks=0", 4096) = 77
>>
>> In the test environment I tried running the following command, which seemed to release the Gluster locks:
>>
>> getfattr -n glusterfs.clrlk.tposix.kblocked file38
>>
>> So I think it would go like this in the production environment with locks on shards (using the aux-gfid-mount mount option):
>>
>> getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32
>>
>> I haven't been able to try this out in the production environment yet. Is there perhaps something else to take into account? Would you be able to tell more about bricks crashing after locks are released? Under what circumstances does that happen? Is it only the process exporting the brick that crashes, or is there a possibility of data corruption?
>
> No data corruption. The brick process where you did clear-locks may crash.

Best regards,
Samuli Heinonen

> Pranith Kumar Karampuri wrote:
>> Hi, did you find the command from strace?
>>
>> On 25 Jan 2018 1:52 pm, "Pranith Kumar Karampuri" <pkara...@redhat.com> wrote:
>> On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen <samp...@neutraali.net> wrote:
>>> I have been testing this in the test environment with the command:
>>>
>>> gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode
>>
>> Could you do an strace of glusterd when this happens? It will have a getxattr with "glusterfs.clrlk" in the key. You need to execute that on the aux-gfid mount.
>>
>> [earlier quoted history trimmed]
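The attribute naming pattern discussed above can be tied to the strace output with a small shell sketch. The helper names here are mine (not part of Gluster), and the xattr pattern is taken only from the traces quoted in this thread; treat it as an illustration, not an official interface:

```shell
# Build the clear-locks virtual xattr name from a lock type and kind,
# following the glusterfs.clrlk.t{posix,inode,entry}.k{all,blocked,granted}
# pattern seen in the strace output above. (Helper names are hypothetical.)
clrlk_xattr() {
    echo "glusterfs.clrlk.t${1}.k${2}"
}

# Recover the xattr key from a captured strace line of glusterd, so the
# equivalent getfattr command can be reconstructed from the trace.
extract_clrlk_key() {
    sed -n 's/.*"\(glusterfs\.clrlk\.[^"]*\)".*/\1/p'
}

clrlk_xattr inode all
# -> glusterfs.clrlk.tinode.kall

echo 'lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked", 0x0, 4096) = 77' |
    extract_clrlk_key
# -> glusterfs.clrlk.tposix.kblocked
```

With the helper, the getfattr call sketched in the thread would look like `getfattr -n "$(clrlk_xattr inode all)" <file-on-aux-gfid-mount>` (untested here).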
Re: [Gluster-users] Stale locks on shards
Pranith Kumar Karampuri wrote on 25.01.2018 07:09:
> On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen wrote:
>> Hi! Thank you very much for your help so far. Could you please give an example command of how to use aux-gfid-mount to remove locks? "gluster vol clear-locks" seems to mount the volume by itself.
>
> You are correct. Sorry, this was implemented around 7 years back and I forgot that bit about it :-(. Essentially it becomes a getxattr syscall on the file. Could you give me the clear-locks command you were trying to execute, and I can probably convert it to the getfattr command?

I have been testing this in the test environment with the command:

gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode

Best regards,
Samuli Heinonen

> Pranith Kumar Karampuri <pkara...@redhat.com> wrote on 23 January 2018 at 10.30:
>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samp...@neutraali.net> wrote:
>>> Pranith Kumar Karampuri wrote on 23.01.2018 09:34:
>>>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen wrote:
>>>>> Hi again, here is more information regarding the issue described earlier. It looks like self-healing is stuck. According to "heal statistics" the crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it's around Sun Jan 21 20:30 as I write this). However, glustershd.log says that the last heal was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours without printing any information.
>>>>>
>>>>> In the statedump I can see that the storage nodes hold locks on files and some of those are blocked. For example, here again it says that ovirt8z2 is holding an active lock even though ovirt8z2 crashed after the lock was granted:
>>>>>
>>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>>>> mandatory=0
>>>>> inodelk-count=3
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
>>>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
>>>>>
>>>>> I'd also like to add that the volume had an arbiter brick before the crash happened. We decided to remove it because we thought it was causing issues. However, now I think that was unnecessary. After the crash the arbiter logs had lots of messages like this:
>>>>>
>>>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]
>>>>>
>>>>> Is there any way to force self-heal to stop? Any help would be very much appreciated :)
>>>>
>>>> Exposing .shard to a normal mount is opening a can of worms. You should probably look at mounting the volume with the gfid aux-mount, where you can access a file via /.gfid/<gfid-string> to clear locks on it. Mount command:
>>>>
>>>> mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>>>>
>>>> A gfid string will have some hyphens, like: 8443-1894-4273-9340-4b212fa1c0e4
>>>>
>>>> That said: the next disconnect on the brick where you successfully did the clear-locks will crash the brick. There was a bug in the 3.8.x series with clear-locks which was fixed in 3.9.0 with a feature. The self-heal deadlocks that you witnessed are also fixed in the 3.10 release.
>>>
>>> Thank you for the answer. Could you please tell me more about the crash? What will actually happen, and is there a bug report about it? We just want to make sure that we can do everything to secure the data on the bricks. We will look into an upgrade, but we have to make sure that the new version works for us, and of course get self-healing working, before doing anything :)
>>
>> The locks xlator/module maintains a list of locks that are granted to a client. clear-locks had an issue where it forgot to remove the lock from this list, so the connection's list ends up pointing to data that has already been freed by the clear-locks operation. When a disconnect happens, all locks granted to that client need to be unlocked, so the process starts traversing the list, and when it tries to access the freed data it crashes. [...]
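Since the aux-gfid mount exposes files under /.gfid/, the path handed to getfattr can be assembled from the mount point and the gfid. A trivial sketch; the helper name is mine, and the mount point is the example one from this thread:

```shell
# Turn a bare gfid into the virtual path exposed by an aux-gfid mount
# (mounted with: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol).
gfid_path() {
    mnt=$1; gfid=$2
    echo "${mnt}/.gfid/${gfid}"
}

gfid_path /mnt/testvol 14341ccb-df7b-4f92-90d5-7814431c5a1c
# -> /mnt/testvol/.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c
```

The resulting path is what a clear-locks getfattr call (as discussed above) would be pointed at.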
Re: [Gluster-users] Stale locks on shards
Pranith Kumar Karampuri wrote on 23.01.2018 09:34:
> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen wrote:
>> Hi again, here is more information regarding the issue described earlier. It looks like self-healing is stuck. According to "heal statistics" the crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it's around Sun Jan 21 20:30 as I write this). However, glustershd.log says that the last heal was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours without printing any information.
>>
>> In the statedump I can see that the storage nodes hold locks on files and some of those are blocked. Here again it says that ovirt8z2 is holding an active lock even though ovirt8z2 crashed after the lock was granted:
>>
>> [statedump excerpt for /.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27 trimmed]
>>
>> I'd also like to add that the volume had an arbiter brick before the crash happened. We decided to remove it because we thought it was causing issues. However, now I think that was unnecessary. After the crash the arbiter logs had lots of messages like this:
>>
>> [2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]
>>
>> Is there any way to force self-heal to stop? Any help would be very much appreciated :)
>
> Exposing .shard to a normal mount is opening a can of worms. You should probably look at mounting the volume with the gfid aux-mount, where you can access a file via /.gfid/<gfid-string> to clear locks on it. Mount command:
>
> mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>
> A gfid string will have some hyphens, like: 8443-1894-4273-9340-4b212fa1c0e4
>
> That said: the next disconnect on the brick where you successfully did the clear-locks will crash the brick. There was a bug in the 3.8.x series with clear-locks which was fixed in 3.9.0 with a feature. The self-heal deadlocks that you witnessed are also fixed in the 3.10 release.

Thank you for the answer. Could you please tell me more about the crash? What will actually happen, and is there a bug report about it? We just want to make sure that we can do everything to secure the data on the bricks. We will look into an upgrade, but we have to make sure that the new version works for us, and of course get self-healing working, before doing anything :)

Br,
Samuli

> 3.8.x is EOLed, so I recommend you to upgrade to a supported version soon.

> Samuli Heinonen wrote on 20 January 2018 at 21.57:
>> Hi all! One hypervisor in our virtualization environment crashed and now some of the VM images cannot be accessed. After investigation we found out that there were lots of images that still had an active lock held by the crashed hypervisor. We were able to remove locks from "regular files", but it doesn't seem possible to remove locks from shards. We are running GlusterFS 3.8.15 on all nodes.
>>
>> [statedump excerpt, clear-locks error message and volume info trimmed]
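The statedump excerpts quoted in this thread can be scanned with a small awk filter to list which inodelk entries are active or blocked, and on which path. This is a sketch written against the field layout shown in the dumps; the function name is mine, not a Gluster tool:

```shell
# List inodelk entries from a Gluster statedump file, printing the lock
# state and the path the lock belongs to. Field layout follows the
# statedump excerpts quoted in this thread.
list_inodelks() {
    awk '
        /^path=/ { path = substr($0, 6) }           # remember current path
        /inodelk\.inodelk\[/ {                      # one line per lock
            state = ($0 ~ /BLOCKED/) ? "BLOCKED" : "ACTIVE"
            print state, path
        }
    ' "$1"
}

# Example with a statedump fragment taken from this thread:
cat > /tmp/statedump.sample <<'EOF'
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f
EOF
list_inodelks /tmp/statedump.sample
# -> ACTIVE /.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
#    BLOCKED /.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
```

This makes it quick to spot shards with stale or blocked locks across a large dump.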
[Gluster-users] Stale locks on shards
Hi all!

One hypervisor in our virtualization environment crashed, and now some of the VM images cannot be accessed. After investigation we found out that there were lots of images that still had an active lock held by the crashed hypervisor. We were able to remove locks from "regular files", but it doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.

Here is the part of a statedump that shows a shard with an active lock held by the crashed node:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f, client=0x7f3198388770, connection-id ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24

If we try to run clear-locks we get the following error message:

# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted

Gluster vol info, if needed:

Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO

Any recommendations on how to advance from here?

Best regards,
Samuli Heinonen

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] [Gluster-devel] 3.7.13 & proxmox/qemu
Here is a quick way to test this: a GlusterFS 3.7.13 volume with default settings, with the brick on a ZFS dataset. gluster-test1 is the server and gluster-test2 is the client, mounting with FUSE.

Writing a file with oflag=direct is not OK:

[root@gluster-test2 gluster]# dd if=/dev/zero of=file oflag=direct count=1 bs=1024000
dd: failed to open ‘file’: Invalid argument

Enable network.remote-dio on the Gluster volume:

[root@gluster-test1 gluster]# gluster volume set gluster network.remote-dio enable
volume set: success

Writing a small file with oflag=direct is OK:

[root@gluster-test2 gluster]# dd if=/dev/zero of=file oflag=direct count=1 bs=1024000
1+0 records in
1+0 records out
1024000 bytes (1.0 MB) copied, 0.0103793 s, 98.7 MB/s

Writing a bigger file with oflag=direct is OK:

[root@gluster-test2 gluster]# dd if=/dev/zero of=file3 oflag=direct count=100 bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.10583 s, 94.8 MB/s

Enable sharding on the Gluster volume:

[root@gluster-test1 gluster]# gluster volume set gluster features.shard enable
volume set: success

Writing a small file with oflag=direct is OK:

[root@gluster-test2 gluster]# dd if=/dev/zero of=file3 oflag=direct count=1 bs=1M
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0115247 s, 91.0 MB/s

Writing a bigger file with oflag=direct is not OK:

[root@gluster-test2 gluster]# dd if=/dev/zero of=file3 oflag=direct count=100 bs=1M
dd: error writing ‘file3’: Operation not permitted
dd: closing output file ‘file3’: Operation not permitted

-samuli

> On 22 Jul 2016, at 16:12, Vijay Bellur wrote:
>
> 2016-07-22 1:54 GMT-04:00 Frank Rothenstein:
>> The point is that even if all other backend storage filesystems behave
>> correctly, until 3.7.11 there was no error on ZFS. Something happened in the
>> 3.7.12 release that nobody has been able to explain, and it makes the FUSE
>> mount unusable in oVirt (oVirt partly uses dd with iflag=direct; using
>> iflag=direct yourself also gives errors on the FUSE mounts).
>>
>> So 3.7.11 is the last usable version when using ZFS on bricks, afaik.
>
> Can you please share the exact dd command that causes this problem?
>
> Thanks,
> Vijay
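The dd probes above can be wrapped into one helper so the same direct-I/O write test can be repeated after each volume-option change. A sketch; the function name is mine, and the target path must live on the mount under test (the result also depends on the underlying filesystem's O_DIRECT support):

```shell
# Attempt a direct-I/O write of $2 blocks of size $3 to $1 and report the
# result, mirroring the dd probes used in this thread.
direct_write_test() {
    target=$1; count=$2; bs=$3
    if dd if=/dev/zero of="$target" oflag=direct count="$count" bs="$bs" 2>/dev/null
    then
        echo ok
    else
        echo fail
    fi
    rm -f "$target"
}

# e.g. after toggling network.remote-dio or features.shard
# (hypothetical FUSE mount point):
direct_write_test /mnt/gluster/file3 100 1M
```

Running it before and after `gluster volume set ... network.remote-dio enable` and `features.shard enable` reproduces the ok/fail matrix shown above.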
Re: [Gluster-users] [Gluster-devel] 3.7.13 & proxmox/qemu
> On 21 Jul 2016, at 20:48, David Gossage wrote: > > Wonder if this may be related at all > > * #1347553: O_DIRECT support for sharding > https://bugzilla.redhat.com/show_bug.cgi?id=1347553 > > Is it possible to downgrade from 3.8 back to 3.7.x > > Building test box right now anyway but wondering. > Have you been able to do any testing yet? "O_DIRECT support for sharding" has also been included in 3.7.12. Is this problem occurring only when sharding is enabled? Is it possible that it requires direct I/O all the way to the bricks with sharding even when network.remote-dio is enabled? Best regards, Samuli Heinonen ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
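If that theory holds, the failure mode is easy to model. As I understand it, network.remote-dio makes the client strip O_DIRECT from open flags so a brick filesystem that rejects direct I/O (like ZFS) never sees it. A rough sketch of that flag filtering — the function name and fallback constant are my own illustration, not actual GlusterFS code:

```python
import os

# Fall back to the common Linux value if the platform doesn't expose O_DIRECT.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)

def filter_open_flags(flags: int, remote_dio: bool) -> int:
    """Sketch: with remote-dio enabled, drop O_DIRECT from the client's
    open flags so the brick filesystem never has to honor direct I/O."""
    if remote_dio:
        return flags & ~O_DIRECT
    return flags

flags = os.O_WRONLY | os.O_CREAT | O_DIRECT   # what dd oflag=direct asks for
assert filter_open_flags(flags, remote_dio=True) & O_DIRECT == 0   # flag dropped
assert filter_open_flags(flags, remote_dio=False) & O_DIRECT != 0  # flag kept
```

If the shard translator opened shard files without going through the same filtering, direct I/O would hit the ZFS brick only once a file outgrows the first shard — which would line up with small writes succeeding while bigger writes fail above. That is speculation on my part, though.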
Re: [Gluster-users] [Gluster-devel] 3.7.13 & proxmox/qemu
Hi all, I’m running oVirt 3.6 and Gluster 3.7 with a ZFS backend. All hypervisor and storage nodes have CentOS 7. I was planning to upgrade to 3.7.13 during the weekend but I’ll probably wait for more information on this issue. AFAIK ZFS on Linux doesn’t support AIO. Have there been some changes to GlusterFS regarding AIO? Best regards, Samuli Heinonen > On 21 Jul 2016, at 21:00, David Gossage wrote: > > On Thu, Jul 21, 2016 at 12:48 PM, David Gossage > wrote: > On Thu, Jul 21, 2016 at 9:58 AM, David Gossage > wrote: > On Thu, Jul 21, 2016 at 9:52 AM, Niels de Vos wrote: > On Sun, Jul 10, 2016 at 10:49:52AM +1000, Lindsay Mathieson wrote: > > Did a quick test this morning - 3.7.13 is now working with libgfapi - yay! > > > > > > However I do have to enable write-back or write-through caching in qemu > > before the vm's will start, I believe this is to do with aio support. Not a > > problem for me. > > > > I see there are settings for storage.linux-aio and storage.bd-aio - not sure > > as to whether they are relevant or which ones to play with. > > Both storage.*-aio options are used by the brick processes. Depending on > what type of brick you have (linux = filesystem, bd = LVM Volume Group) > you could enable the one or the other. > > We do have a strong suggestion to set these "gluster volume group .." > options: > https://github.com/gluster/glusterfs/blob/master/extras/group-virt.example > > From those options, network.remote-dio seems most related to your aio > theory. It was introduced with http://review.gluster.org/4460 that > contains some more details. > > > Wonder if this may be related at all > > * #1347553: O_DIRECT support for sharding > https://bugzilla.redhat.com/show_bug.cgi?id=1347553 > > Is it possible to downgrade from 3.8 back to 3.7.x > > Building test box right now anyway but wondering. > > May be anecdotal with small sample size but the few people who have had issue > all seemed to have zfs backed gluster volumes.
> > Now that I recall back to the day I updated, the gluster volume on xfs I use > for my hosted engine never had issues. > > > > > Thanks, with the exception of stat-prefetch I have those enabled. > I could try turning that back off, though at the time of the update to 3.7.13 it > was off. I didn't turn it back on till later in the next week, after downgrading > back to 3.7.11. > > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 > Options Reconfigured: > diagnostics.brick-log-level: WARNING > features.shard-block-size: 64MB > features.shard: on > performance.readdir-ahead: on > storage.owner-uid: 36 > storage.owner-gid: 36 > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > performance.stat-prefetch: on > cluster.eager-lock: enable > network.remote-dio: enable > cluster.quorum-type: auto > cluster.server-quorum-type: server > server.allow-insecure: on > cluster.self-heal-window-size: 1024 > cluster.background-self-heal-count: 16 > performance.strict-write-ordering: off > nfs.disable: on > nfs.addr-namelookup: off > nfs.enable-ino32: off > > > HTH, > Niels > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users > > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster performance on the small files
Hi! What image type are you using to store the virtual machines? For example, using sparse QCOW2 images is much slower than using preallocated RAW images. Performance with QCOW2 should get better after the image file has grown bigger and it's no longer necessary to expand the sparse image. Best regards, Samuli Heinonen On 13.2.2015, at 8.58, Punit Dambiwal wrote: > Hi, > > I have seen the gluster performance is dead slow on the small files... even I > am using SSDs... it's too bad performance... even I am getting better > performance in my SAN with normal SATA disks... > > I am using distributed replicated glusterfs with replica count=2... I have all > SSD disks on the bricks... > > root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test oflag=dsync > > 4096+0 records in > > 4096+0 records out > > 268435456 bytes (268 MB) copied, 57.3145 s, 4.7 MB/s > > > > root@vm3:~# dd bs=64k count=4k if=/dev/zero of=test conv=fdatasync > > 4096+0 records in > > 4096+0 records out > > 268435456 bytes (268 MB) copied, 1.80093 s, 149 MB/s > > > > Thanks, > > Punit > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
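The allocation cost is visible even without Gluster or qemu: a sparse file has a large apparent size but few allocated blocks, and every write into a hole has to allocate first (with QCOW2 there is cluster metadata on top of that). A quick sketch, assuming a Linux filesystem that supports sparse files; the paths and size are arbitrary:

```python
import os
import tempfile

size = 1 << 20  # 1 MiB apparent size (arbitrary)

with tempfile.TemporaryDirectory() as d:
    sparse_path = os.path.join(d, "sparse.img")
    full_path = os.path.join(d, "full.img")

    with open(sparse_path, "wb") as f:
        f.truncate(size)          # hole only: no data blocks written yet
    with open(full_path, "wb") as f:
        f.write(b"\0" * size)     # every block actually written

    same_apparent = os.path.getsize(sparse_path) == os.path.getsize(full_path)
    sparse_blocks = os.stat(sparse_path).st_blocks  # 512-byte units
    full_blocks = os.stat(full_path).st_blocks

print(same_apparent, sparse_blocks < full_blocks)  # both True on common Linux filesystems
```

Each write that lands in one of those holes pays the allocation price, which is one reason a fresh sparse QCOW2 image is slower than the same image after it has been fully written once.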
Re: [Gluster-users] libgfapi failover problem on replica bricks
Hi, Could you send the output of gluster volume info and also the exact command you are using to start the VMs, and what cache settings you are using with KVM? -samuli Paul Penev wrote on 21.4.2014 at 10.47: > Ok, here is one more hint that points in the direction of libgfapi not > re-establishing the connections to the bricks after they come back > online: if I migrate the KVM machine (live) from one node to another > after the bricks are back online, and I kill the second brick, the KVM > will not suffer from disk problems. It is obvious that during > migration, the new process on the new node is forced to reconnect to > the gluster volume, hence reestablishing both links. After this it is > ready to lose one of the links without problems. > > Steps to replicate: > > 1. Start KVM VM and boot from a replicated volume > 2. killall -KILL glusterfsd on one brick (brick1). Verify that the KVM > is still working. > 3. Bring back the glusterfsd on brick1. > 4. Heal the volume (gluster vol heal ) and wait until gluster vol > heal info shows no self-heal backlog. > 5. Now migrate the KVM from one node to another node. > 6. killall -KILL glusterfsd on the second brick (brick2). > 7. Verify that KVM is still working (!) It would die from disk errors > before, if step 5 was not executed. > 8. Bring back glusterfsd on brick2, heal and enjoy. > 9. Repeat at will: the KVM will never die again, provided you migrated > it once before brick failure. > > What this means to me: there's a problem in libgfapi, gluster 3.4.2 > and 3.4.3 (at least) and/or kvm 1.7.1 (I'm running the latest 1.7 > source tree in production). > > Joe: we're in your hands. I hope you find the problem somewhere. > > Paul. > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] gluster and kvm livemigration
On 23.1.2014 17:27, Bernhard Glomm wrote: After migration the disks become read-only because on migration the disk files change ownership from libvirt-qemu to root. What am I missing? I'm not sure of this, but is it possible that this is because of different ownership and permissions on the bricks? Can you try to set storage.owner-uid and storage.owner-gid to libvirt-qemu? To do that you have to stop the volume. -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] gluster and kvm livemigration
Hello Bernhard, Can you test if setting option network.remote-dio to enable allows you to use cache=none? -samuli Bernhard Glomm wrote on 17.1.2014 at 16.41: > Pranith, > I stopped the volume, > started it again, > mounted it on both hosts, > started the VM, > did the live migration, > and collected the logs: > - etc-glusterfs-glusterd.vol.log > - glustershd.log > - srv-vm_infrastructure-vm-atom01.log > - cli.log > from the beginning of the gluster volume start. > You can find them here (parts 1 to 3): > http://pastebin.com/mnATm2BE > http://pastebin.com/RYZFP3E9 > http://pastebin.com/HAXEGd54 > > Furthermore: > gluster --version: glusterfs 3.4.2 built on Jan 11 2014 03:21:47 > ubuntu: raring > filesystem on the gluster bricks: zfs-0.6.2 > > gluster volume info fs_vm_atom01 > Volume Name: fs_vm_atom01 > Type: Replicate > Volume ID: fea9bdcf-783e-442a-831d-f564f8dbe551 > Status: Started > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 172.24.1.11:/zp_ping_1/fs_vm_atom01 > Brick2: 172.24.1.13:/zp_pong_1/fs_vm_atom01 > Options Reconfigured: > diagnostics.client-log-level: DEBUG > server.allow-insecure: on > > disk part of VM configuration > /usr/bin/kvm-spice > function='0x0'/> > > can't use as Josh suggested because I couldn't get > my qemu recompiled with gluster enabled yet. > > Are there other special tuning parameters for kvm/qemu to set on gluster? > As mentioned: everything works except the live migration (the disk image file becomes > read-only) > and I have to use something different than cache=none... > > TIA > > Bernhard > > > On 17.01.2014 05:04:52, Pranith Kumar Karampuri wrote: > Bernhard, > The configuration seems ok. Could you please give the log files of the bricks and > the mount. If you think it is not a big procedure to do this live > migration, could you set client-log-level to DEBUG and provide the log files > of that run.
> > Pranith > > - Original Message - > From: "Bernhard Glomm" > To: pkara...@redhat.com > Cc: gluster-users@gluster.org > Sent: Thursday, January 16, 2014 5:58:17 PM > Subject: Re: [Gluster-users] gluster and kvm livemigration > > > hi Pranith > > # gluster volume info fs_vm_atom01 > > Volume Name: fs_vm_atom01 > Type: Replicate > Volume ID: fea9bdcf-783e-442a-831d-f564f8dbe551 > Status: Started > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 172.24.1.11:/zp_ping_1/fs_vm_atom01 > Brick2: 172.24.1.13:/zp_pong_1/fs_vm_atom01 > Options Reconfigured: > diagnostics.client-log-level: ERROR > > > TIA > Bernhard > > > On 16.01.2014 13:05:12, Pranith Kumar Karampuri wrote: > hi Bernhard, > Could you give the gluster volume info output? > > Pranith > > - Original Message - > From: "Bernhard Glomm" <> > bernhard.gl...@ecologic.eu> > > > To: > > gluster-users@gluster.org > Sent: Thursday, January 16, 2014 4:22:36 PM > Subject: [Gluster-users] gluster and kvm livemigration > > I experienced a strange behavior of glusterfs during live migration > of a qemu-kvm guest > using a 10GB file on a mirrored gluster 3.4.2 volume > (both on ubuntu 13.04). > I run > virsh migrate --verbose --live --unsafe --p2p --domain atom01 --desturi > qemu+ssh:///system > and the migration works; > the running machine is pingable and keeps sending pings. > Nevertheless, when I let the machine touch a file during migration > it stops, complaining that its filesystem is read-only (from the moment that > migration finished). > A reboot from inside the machine fails; > the machine goes down and comes up with an error > unable to write to sector xx on hd0 > (then falling into the initrd). > A > virsh destroy VM && virsh start VM > leads to a perfectly running VM again, > no matter on which of the two hosts I start the machine. > Does anybody have better experience with live migration? > Any hint on a procedure to debug this?
> TIA > Bernhard > > -- > Bernhard Glomm > IT Administration > > Phone: +49 (30) 86880 134 > Fax: +49 (30) 86880 100 > Skype: bernhard.glomm.ecologic > > Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany > GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464 > Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] glusterfs-3.5.0qa3 released
Sorry for the spamming. It actually looks like none of the packages are signed, but http://bits.gluster.org/pub/gluster/glusterfs/stage.repo has gpgcheck enabled. -samuli Samuli Heinonen wrote on 6.12.2013 at 12.20: > Hello, > > Are these good packages to be used for Glusterfest testing? When installing > glusterfs-server it complains that "Package > glusterfs-libs-3.5.0qa3-1.el6.x86_64.rpm is not signed". > > -samuli > > > Gluster Build System wrote on 6.12.2013 at > 10.19: > >> >> RPM: http://bits.gluster.org/pub/gluster/glusterfs/3.5.0qa3/ >> >> SRC: >> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.0qa3.tar.gz >> >> This release is made off jenkins-release-51 >> >> -- Gluster Build System >> ___ >> Gluster-users mailing list >> Gluster-users@gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-users > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] glusterfs-3.5.0qa3 released
Hello, Are these good packages to be used for Glusterfest testing? When installing glusterfs-server it complains that "Package glusterfs-libs-3.5.0qa3-1.el6.x86_64.rpm is not signed". -samuli Gluster Build System wrote on 6.12.2013 at 10.19: > > RPM: http://bits.gluster.org/pub/gluster/glusterfs/3.5.0qa3/ > > SRC: > http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.0qa3.tar.gz > > This release is made off jenkins-release-51 > > -- Gluster Build System > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] mkfs.xfs warning advice needed
Hello, I'm getting a similar warning and I haven't found out if this is a "real deal". There are posts available at https://wiki.xkyle.com/XFS_Block_Sizes and at http://blog.tsunanet.net/2011/08/mkfsxfs-raid10-optimal-performance.html that explain how to calculate the correct values. If I calculate the values so that XFS doesn't complain about anything, they don't match the stripe size specified in the RAID array at all. So my best guess is that in my case XFS isn't getting correct information from the LSI RAID controller. I haven't done any extensive testing comparing different settings, so I may be completely wrong too :) -samuli "Ellison, Bob" wrote on 4.12.2013 at 18.34: > A little off topic, but thought someone here would know off the top of their > head... > > I'm setting up a new glusterfs system. I have a hardware raid6 configured as > 10+2 with a 256K chunk size (so the partition being seen in Redhat is a > single LUN). I get a warning when I initialize it with XFS: > > [root@lab-ads1 ~]# mkfs.xfs -f -isize=512 -d su=256k,sw=10 /dev/mapper/c0 > mkfs.xfs: Specified data stripe width 5120 is not the same as the volume > stripe width 2048 > > The mkfs seems to work fine (and hence why I characterize the message as a > warning). > > I've searched around and see this type of message all over the place. But > there's never a hint if this is an expected message for such a configuration > which can be safely ignored or if I'm severely misinterpreting the man page > and performance will suffer as these partitions fill up with data. > > Thanks in advance, > Bob > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
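For what it's worth, the numbers in the warning decode cleanly: mkfs.xfs reports stripe geometry in 512-byte sectors, so su=256k with sw=10 gives a data stripe width of 5120 sectors, while the controller is advertising 2048 sectors (1 MiB). A quick calculation — the helper function is my own, just to make the units explicit:

```python
SECTOR = 512  # mkfs.xfs reports stripe geometry in 512-byte sectors

def xfs_stripe_sectors(chunk_kib: int, data_disks: int):
    """Return (su, sw, full data stripe width) in 512-byte sectors."""
    su = chunk_kib * 1024 // SECTOR
    return su, data_disks, su * data_disks

su, sw, width = xfs_stripe_sectors(256, 10)  # 10+2 RAID-6, 256K chunk
print(su, sw, width)  # 512 10 5120 -> matches "data stripe width 5120"
# The controller advertised 2048 sectors = 1 MiB, hence the mismatch warning.
```

So the values Bob passed on the command line are internally consistent with a 10-data-disk, 256K-chunk array; the disagreement is with what the LUN itself reports, which fits the theory that the controller is exposing the wrong geometry.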
Re: [Gluster-users] Error when trying to connect to a gluster volume (with libvirt/libgfapi)
Jacob Yundt wrote on 2.12.2013 at 19.42: > > I think "option rpc-auth-allow-insecure on" in > /etc/glusterfs/glusterd.vol was the trick. > > However, I seem to be getting a new error now: Can you try to set the owner on the gluster volume, i.e.: gluster volume set kvmimages storage.owner-uid 36 gluster volume set kvmimages storage.owner-gid 36 -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Error when trying to connect to a gluster volume (with libvirt/libgfapi)
Jacob Yundt wrote on 2.12.2013 at 19.27: > > I checked and I _do_ have allow-insecure set: > Did you remember to put "option rpc-auth-allow-insecure on" into /etc/glusterfs/glusterd.vol as well? -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] KVM guest I/O errors with xfs backed gluster volumes
On 6.11.2013 14:33, Jacob Yundt wrote: On Tue, Nov 5, 2013 at 10:56 PM, Bharata B Rao wrote: My below mail didn't make it to the list, hence resending... On Tue, Nov 5, 2013 at 8:04 PM, Bharata B Rao wrote: On Wed, Oct 30, 2013 at 11:26:48PM +0530, Bharata B Rao wrote: On Tue, Oct 29, 2013 at 1:21 PM, Anand Avati wrote: Looks like what is happening is that qemu performs ioctls() on the backend to query logical_block_size (for direct IO alignment). That works on XFS, but fails on FUSE (hence qemu ends up performing IO with the default 512 alignment rather than 4k). Looks like this might be something we can enhance the gluster driver in qemu for. Note that glusterfs does not have an ioctl() FOP, but we could probably wire up a virtual xattr call for this purpose. Copying Bharata to check if he has other solutions in mind. I see alignment issues and subsequent QEMU failure (pread() failing with EINVAL) when I use a file from an XFS mount point (with sectsz=4k) as a virtio disk with the cache=none QEMU option. However this failure isn't seen when I have sectsz=512. And all this is w/o gluster. So there seem to be some alignment issues even w/o gluster; I will debug more and get back. I gather that the QEMU block layer and SeaBIOS don't yet support 4k sectors. So this is not a QEMU-GlusterFS specific issue. You could either not use the cache=none option which results in O_DIRECT, or use something like the below which explicitly sets the sector size and min io size for the guest. -drive file=/mnt/xfs.img,if=none,cache=none,format=raw,id=mydisk -device virtio-blk,drive=mydisk,logical_block_size=4096,physical_block_size=4096,min_io_size=4096 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=997839 Regards, Bharata. Bharata- Thanks for the update on this. I'm going to give these qemu args a try and see what happens. On a side note, I can't believe more users aren't running into this issue. I assumed (perhaps incorrectly) that most modern drives were using 4K sectors.
-Jacob Jacob, are you using xfs on top of a HDD or are you using some kind of RAID? We have disks with 4K sectors and we are using those in a RAID-6 setup with an LSI MegaRAID controller. We haven't run into these issues and I wasn't able to reproduce them. I did only very quick tests though, so it may be that I have missed something. -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
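Bharata's pread()-EINVAL observation comes down to the O_DIRECT alignment rule, which is easy to state as a predicate. This is a sketch of the rule itself (my own function, not kernel code): offset, transfer length, and buffer address all have to be multiples of the device's logical block size, or the kernel rejects the I/O.

```python
def direct_io_ok(offset: int, length: int, buf_addr: int, logical_block: int) -> bool:
    """Sketch of the O_DIRECT alignment rule: file offset, transfer length
    and user buffer address must all be multiples of the device's logical
    block size, otherwise the kernel fails the read/write with EINVAL."""
    return all(v % logical_block == 0 for v in (offset, length, buf_addr))

# qemu's default 512-byte alignment is fine on a 512-byte-sector device...
print(direct_io_ok(0, 512, 4096, 512))    # True
# ...but fails on a 4k logical-sector device: pread() returns EINVAL.
print(direct_io_ok(0, 512, 4096, 4096))   # False
```

That is why forcing logical_block_size=4096 on the virtio device (as in Bharata's -device example) makes the guest issue 4k-aligned I/O and avoids the error, and why the FUSE path — where qemu cannot query the real block size — falls back to 512 and trips over it.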
Re: [Gluster-users] Gluster 3.4 QEMU and Permission Denied Errors
Hello Andrew, How are you booting/managing the VMs? Which user do you use to launch them? Have you enabled Gluster mounting from insecure ports? It needs two changes. You have to edit glusterd.vol (in the /etc/glusterfs directory) and add the line "option rpc-auth-allow-insecure on". Also you have to set the volume option server.allow-insecure on (i.e. gluster volume set volname server.allow-insecure on). A restart of glusterd and a stop and start of the volume are required for these changes to take effect. On 16.9.2013 21:38, Andrew Niemantsverdriet wrote: Hey List, I'm trying to test out using Gluster 3.4 for virtual machine disks. My environment consists of two Fedora 19 hosts with gluster and qemu/kvm installed. I have a single volume on gluster called vmdata that contains my qcow2 formatted image created like this: qemu-img create -f qcow2 gluster://localhost/vmdata/test1.qcow 8G I'm able to boot my created virtual machine but in the logs I see this: [2013-09-16 15:16:04.471205] E [addr.c:152:gf_auth] 0-auth/addr: client is bound to port 46021 which is not privileged [2013-09-16 15:16:04.471277] I [server-handshake.c:567:server_setvolume] 0-vmdata-server: accepted client from gluster1.local-1061-2013/09/16-15:16:04:441166-vmdata-client-1-0 (version: 3.4.0) [2013-09-16 15:16:04.488000] I [server-rpc-fops.c:1572:server_open_cbk] 0-vmdata-server: 18: OPEN /test1.qcow (6b63a78b-7d5c-4195-a172-5bb6ed1e7dac) ==> (Permission denied) I have turned off SELinux to be sure that isn't in the way. When I look at the permissions on the file using ls -l I see the file is set to 600, which doesn't seem right. I tried manually changing the permissions to 755 as a test and as soon as the machine booted it was changed back to 600. Any hints as to what is going on and how to get the disk functioning? The machine will boot but as soon as anything is written to disk it will hang forever.
Thanks, ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
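The "not privileged" line in the log is the brick checking the client's source port: an unprivileged qemu process cannot bind a port below 1024, so without the insecure-port options its connection is rejected at authentication. Roughly, in pseudocode form (the names are mine, not the actual gf_auth implementation):

```python
PRIVILEGED_PORT_MAX = 1024  # only root can bind ports below this on Unix

def brick_allows_client(client_port: int, allow_insecure: bool) -> bool:
    """Sketch of the brick-side check behind server.allow-insecure:
    without it, a client connecting from a non-privileged source port
    (such as an unprivileged qemu process) is rejected."""
    return allow_insecure or 0 < client_port < PRIVILEGED_PORT_MAX

print(brick_allows_client(46021, False))  # False: "port 46021 ... is not privileged"
print(brick_allows_client(46021, True))   # True once server.allow-insecure is on
```

rpc-auth-allow-insecure in glusterd.vol covers the management connection, and server.allow-insecure covers the bricks; both are needed for a libgfapi client running as a normal user.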
[Gluster-users] Geo-replication broken in 3.4 alpha2?
Dear all, I'm running GlusterFS 3.4 alpha2 together with oVirt 3.2. This is solely a test system and it doesn't have much data or anything important in it. Currently it has only 2 VMs running and disk usage is around 15 GB. I have been trying to set up geo-replication for disaster recovery testing. For geo-replication I did the following: All machines are running CentOS 6.4 and using GlusterFS packages from http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.0alpha2/EPEL.repo/. Gluster bricks are using XFS. On the slave I have tried ext4 and btrfs. 1. Installed the slave machine (a VM hosted in a separate environment) with glusterfs-geo-replication, rsync and some other packages as needed by dependencies. 2. Installed the glusterfs-geo-replication and rsync packages on the GlusterFS server. 3. Created an ssh key on the server, saved it to /var/lib/glusterd/geo-replication/secret.pem and copied it to the slave's /root/.ssh/authorized_keys. 4. On the server ran: - gluster volume geo-replication vmstorage slave:/backup/vmstorage config remote_gsyncd /usr/libexec/glusterfs/gsyncd - gluster volume geo-replication vmstorage slave:/backup/vmstorage start After that the geo-replication status was "starting…" for a while and after that it switched to "N/A". I set log-level to DEBUG and saw lines like these appearing every 10 seconds: [2013-03-20 18:48:19.417107] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive(None,) ... [2013-03-20 18:48:19.418431] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798099.42 keep_alive -> 34 [2013-03-20 18:48:29.427959] D [repce:175:push] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive(None,) ... [2013-03-20 18:48:29.429172] D [repce:190:__call__] RepceClient: call 27756:140178941277952:1363798109.43 keep_alive -> 35 I thought that maybe it was creating an index or something like that, so I let it run for about 30 hours. Still, after that there were no new log messages and no data being transferred to the slave.
I tried using strace -p 27756 to see what was going on but there was no output at all. My next thought was that maybe the running virtual machines were causing some trouble, so I shut down all VMs and restarted geo-replication, but it didn't have any effect. My last effort was to create a new clean volume without any data in it and try geo-replication with it - no luck there either. I also did a quick test with the master running GlusterFS 3.3.1 and it had no problems copying data to exactly the same slave server. There isn't much documentation available about geo-replication, and before filing a bug report I'd like to hear if anyone else has used geo-replication successfully with 3.4 alpha or if I'm missing something obvious. Output of gluster volume info: Volume Name: vmstorage Type: Distributed-Replicate Volume ID: a800e5b7-089e-4b55-9515-c9cc72502aea Status: Started Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: mc1.ovirt.local:/gluster/brick0/vmstorage Brick2: mc5.ovirt.local:/gluster/brick0/vmstorage Brick3: mc1.ovirt.local:/gluster/brick1/vmstorage Brick4: mc5.ovirt.local:/gluster/brick1/vmstorage Options Reconfigured: performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off network.remote-dio: enable geo-replication.indexing: on storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 10 nfs.disable: on Best regards, Samuli Heinonen ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] IRC channel stuck on invite only
Hey, Invite flag has been removed. Welcome! -samppah ps. thanks to JoeJulian once again ;) On 15.12.2012 14:43, Andrew Holway wrote: Hi, Any ops here? irc channel seems broken. Ta, Andrew ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] "Granular locking" - does this need to be enabled in 3.3.0 ?
3) sanlock configured (this is evil!) Just out of curiosity, can you please tell more about why it is evil? I just found it out after your first post and want to know if there are any gotchas :) Though it doesn't corrupt data, the I/O performance is < 1% of my hardware's capability. Hopefully work on buffering and other tuning will fix this? Or maybe the work mentioned on getting qemu talking directly to gluster will fix this? Have you tried setting performance.client-io-threads on to see if it makes any difference? As a side note I have to say that I have seen similar problems with RAID-5 systems, even when using them as a non-replicated iSCSI target. In my experience it's definitely not good for hosting VM images. -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 12.07.2012 11:40, Mark Nipper wrote: Something concerns me about those performance figures. If I'm reading them correctly, the normal fuse mount performance is about what I was seeing, 2-3MB. And now bypassing everything, libglusterfs is still capping out a little under 20MB/s. It's running the tests on four files at the same time. minb shows the speed of the slowest test, maxb the fastest, and aggrb shows all four tests aggregated. So am I kidding myself that approaching 45-50MB/s with a FUSE based Gluster mount and using cache=writethrough is actually a safe thing to do really? I know the performance is abysmal without setting the cache mode, but is using writethrough really safe, or is it a recipe for disaster waiting to happen? When using writethrough, data is written to the cache but the write is not marked finished before it has hit the disk. The data in the cache can then be used to speed up read operations. So I would consider it pretty safe, but perhaps someone else can explain it better? ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
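To make the safety argument concrete, here is a toy model of write-through semantics — a sketch of the general idea, not qemu's actual cache code: a write is only acknowledged after the backing store has it, so losing the cache (e.g. host crash) can never lose an acknowledged write; the cache only accelerates reads.

```python
class WriteThroughCache:
    """Toy model of cache=writethrough: a write is only acknowledged after
    the backing store has it, so the cache can only ever speed up reads."""

    def __init__(self, backing: dict):
        self.backing = backing   # stands in for the disk
        self.cache = {}

    def write(self, key, value):
        self.backing[key] = value   # durable first...
        self.cache[key] = value     # ...then cached; only now is the write "done"

    def read(self, key):
        if key in self.cache:       # fast path: served from cache
            return self.cache[key]
        value = self.backing[key]   # miss: fetch from disk and populate cache
        self.cache[key] = value
        return value

disk = {}
cache = WriteThroughCache(disk)
cache.write("sector0", b"data")
print(disk["sector0"])  # b'data' -- already on "disk" even if the cache is lost
```

Contrast with write-back, where write() would fill only the cache and flush to the backing store later; that is the mode where a crash between ack and flush loses data.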
Re: [Gluster-users] Gluster v 3.3 with KVM and High Availability
On 11.07.2012 21:35, Brian Candler wrote: On Wed, Jul 11, 2012 at 12:55:50PM -0500, Mark Nipper wrote: Would that be using something like O_DIRECT which FUSE doesn't support at the moment? Yes. FUSE does support it in recent kernels (3.4), and I tried it. Nothing happened until I also mounted with -o direct-io-mode=enable; with that and cache=none, the VM was unable to start up at all. This FUSE patch has been backported into RHEL 6.3 and should also work with the latest 6.2 kernels. IIRC, it also worked with cache=none without any special mount options, but unfortunately I don't have the possibility to test if that's true :( -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Write performance in a replicated/distributed setup with KVM?
On 2.3.2012 15:33, Harald Hannelius wrote: The pattern for me starts to look like this; max-write-speed ~= /nodes. Have you tried tuning the performance.io-thread-count setting? More information about that can be found at http://docs.redhat.com/docs/en-US/Red_Hat_Storage_Software_Appliance/3.2/html/User_Guide/chap-User_Guide-Managing_Volumes.html -samuli ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Quota Translator Help
Hey! > I've tried implementing the quota translator and I am not having any luck. > Please check out below: I made a few corrections to your vol file. Please try with it :) -samuli volume paul0 type storage/posix option directory /gluster/paul end-volume volume locks1 type features/locks subvolumes paul0 end-volume volume iothreads1 type performance/io-threads option thread-count 8 subvolumes locks1 end-volume volume brick1 type features/quota option disk-usage-limit 10GB # usage limit subvolumes iothreads1 end-volume volume server-tcp type protocol/server option transport-type tcp option auth.addr.brick1.allow * option transport.socket.listen-port 6996 option transport.socket.nodelay on subvolumes brick1 end-volume ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Slow speed with Gluster Backed Xen DomU's
Hello Sheng, Sorry for the delay with my answer. I got your message earlier but I was hoping that someone else would answer with more precise information. I did some testing almost a year ago using GlusterFS to store Xen images. I was testing with a pre-2.10 version of GlusterFS but my findings were very similar to yours. Writing speed was fine when done from Dom0 to the GlusterFS mount; I was even able to max out the GigE link with my cheap hardware. However when I did the same tests on DomU (an image inside the GlusterFS mount) the write speeds dropped drastically. Read speeds were ok all the time. Unfortunately I didn't have time to investigate it much further back then. Although I did some performance tests with Bonnie++ on Dom0 and noticed that block write speeds were almost the same as write speeds in general on DomU. So, my guess is that block write speeds need some kind of improvement on the GlusterFS side. > Hope your day has been going well. I tried emailing this in a week or > so ago, but I do not think that it got emailed to the group. I am just > trying again, but apologies in advance if you have already read this. > I am currently using GlusterFS as a distributed SAN backend for the > Xen based cloud platform we are developing. > > We deploy Xen virtuals on pairs of servers using GlusterFS v3 in > replicate mode on Debian Stable (Lenny) with Xen 3.2.1 as the > hypervisor. I am currently experiencing a weird issue where within the > virtual machines (DomU) running on a GlusterFS mount I only receive > around 10-18MB/s write speeds, but full speed reads. > > Our hardware for each node is Dual Core Xeon Processors, 8GB of RAM > and 4 * High Speed SATA drives (RAID 10, around 160MB/s writes and > reads). > > If I write a file to the Gluster mount in the Dom0 (host) we receive > around 90-100MB/s writes (maxing out the GigE link). If I run the > virtual machine on the disks without Gluster I get much higher speeds > within the DomU of around 80-90MB/s.
> > This slowdown only appears to occur on writes. Does anyone with a > better understanding of GlusterFS, FUSE and filesystems have an idea > why this is slowing down? The underlying file system is Ext3, using > TAP:AIO within Xen to connect to a file-image-based disk. This is > without using the gluster fuse client (what benefits does this give?) and > Gluster version 3.0.4. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Odd issue with apache 2.2.15 and glusterfs git 3.1.0git
Hello, Do you have MMAP enabled in httpd.conf? If so, try turning it off (EnableMMAP off). The default setting is on. I'm using CentOS 5.5 with GlusterFS 3.0.4 and haven't seen such problems with my setup. -samuli - Original message - > If I use --disable-direct-io-mode when mounting the glusterfs document > root, I don't seem to have this issue with apache. > > Is this normal? > > > > This message (including any attachments) contains confidential > information intended for a specific individual and purpose, and is > protected by law. If you are not the intended recipient, you should > delete this message. Any disclosure, copying, or distribution of this > message, or the taking of any action based on it, is strictly > prohibited. ___ > Gluster-users mailing list Gluster-users@gluster.org > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users