Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending
Is it reasonable for me to just remove all of the XSYNC-CHANGELOG files to make it start over with a full sync? I just want to figure out how to get it to pick up again. Is it better to remove and re-create the geo-rep session?

Thanks,
Dave

On Tue, May 5, 2015 at 9:27 AM, David Gibbons david.c.gibb...@gmail.com wrote:

I caught one of the nodes transitioning into faulty mode, log output is below.

> In master nodes, look for log messages. Let us know if you feel any issue in log messages. (/var/log/glusterfs/geo-replication/)

When one of the nodes drops into faulty, which happens periodically, this is the type of output that appears in the log:

    [root@gfs-a-1 ~]# tail /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
    [2015-05-05 09:22:58.140913] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
    [2015-05-05 09:22:58.152951] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
    [2015-05-05 09:22:58.327603] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
    [2015-05-05 09:22:58.336714] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
    [2015-05-05 09:22:58.360308] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
    [2015-05-05 09:22:58.367522] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
    [2015-05-05 09:22:58.368226] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
    [2015-05-05 09:22:58.368959] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
    [2015-05-05 09:22:58.369635] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
    [2015-05-05 09:22:58.369790] W [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430830891

When the node is in active mode, I get a lot of log output that resembles this:

    [2015-05-05 09:23:54.735502] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
    [2015-05-05 09:23:55.449265] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
    [2015-05-05 09:23:55.449491] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
    [2015-05-05 09:23:56.277033] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
    [2015-05-05 09:23:56.277259] W [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
    [2015-05-05 09:23:56.294038] W [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
    [2015-05-05 09:23:56.381592] I [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid crawl syncing
    [2015-05-05 09:24:24.404884] I [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1 turns
    [2015-05-05 09:24:24.437452] I [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid crawl...
    [2015-05-05 09:24:24.588865] I [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing xsync changelog /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135

This raises a couple of questions for me:

1. Are these errcode: 23 issues files that have been deleted/renamed since the changelog was created?
2. Is it correct/expected for the node to drop into faulty and then recover itself to active periodically?

Thank you again for your assistance!
Dave

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
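Removing the XSYNC-CHANGELOG files by hand risks leaving gsyncd's working directory inconsistent, so the usual way to "start over" is to rebuild the session through the CLI. A sketch, using the volume names from this thread; whether a delete/re-create also clears the xsync state on disk is worth verifying on 3.5.3 before relying on it:

```shell
# Stop and delete the existing geo-rep session (run on the master side).
gluster volume geo-replication shares gfs-a-bkp::bkpshares stop
gluster volume geo-replication shares gfs-a-bkp::bkpshares delete

# Re-create it; push-pem redistributes the ssh keys to the slave nodes.
gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem

# Start it again; the first pass will be a full (hybrid/xsync) crawl.
gluster volume geo-replication shares gfs-a-bkp::bkpshares start
gluster volume geo-replication shares gfs-a-bkp::bkpshares status
```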
Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending
Thank you, responses and further questions inline below.

> In master nodes, look for log messages. Let us know if you feel any issue in log messages. (/var/log/glusterfs/geo-replication/)

The workers have been transitioning between active and faulty. They will throw an error in the log (I believe it's related to rsync error 23 or something, but I will have to isolate it again), then switch to faulty. A minute or so later they are back to Active.

> Ideally, after initial crawl geo-rep should switch to Changelog crawl.

Thanks for clarifying, I will wait and look for that. It appears that xsync is the default, but I did change it to changelog yesterday. Which is a more reliable option?

> Geo-rep doesn't have a persistent store of all path names and sync status. When geo-rep gets the list of files to be synced, it adds the number to the counter. But if the same files are modified again, the counter will be incremented again. Numbers in Status output will not match the number of files on disk.

When does it get reset back to 0? Or where are the 8191 files that it thinks are out of sync stored? I would like to be able to sanity-check the progress.

Thanks,
Dave
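The xsync-to-changelog switch Dave mentions is a per-session config option. A sketch, reusing the session names from this thread:

```shell
# Show the current change detection mechanism for the session.
gluster volume geo-replication shares gfs-a-bkp::bkpshares config change_detector

# Switch from the full-filesystem crawl (xsync) to the journal-based crawl.
gluster volume geo-replication shares gfs-a-bkp::bkpshares config change_detector changelog
```

Note that even with changelog configured, geo-rep falls back to a hybrid (xsync) crawl for the initial sync or after a gap, which matches the "Hybrid Crawl" status seen elsewhere in this thread.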
Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending
I caught one of the nodes transitioning into faulty mode, log output is below.

> In master nodes, look for log messages. Let us know if you feel any issue in log messages. (/var/log/glusterfs/geo-replication/)

When one of the nodes drops into faulty, which happens periodically, this is the type of output that appears in the log:

    [root@gfs-a-1 ~]# tail /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
    [2015-05-05 09:22:58.140913] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
    [2015-05-05 09:22:58.152951] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
    [2015-05-05 09:22:58.327603] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
    [2015-05-05 09:22:58.336714] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
    [2015-05-05 09:22:58.360308] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
    [2015-05-05 09:22:58.367522] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
    [2015-05-05 09:22:58.368226] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
    [2015-05-05 09:22:58.368959] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
    [2015-05-05 09:22:58.369635] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync: .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
    [2015-05-05 09:22:58.369790] W [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430830891
When the node is in active mode, I get a lot of log output that resembles this:

    [2015-05-05 09:23:54.735502] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
    [2015-05-05 09:23:55.449265] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
    [2015-05-05 09:23:55.449491] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
    [2015-05-05 09:23:56.277033] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
    [2015-05-05 09:23:56.277259] W [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
    [2015-05-05 09:23:56.294038] W [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
    [2015-05-05 09:23:56.381592] I [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid crawl syncing
    [2015-05-05 09:24:24.404884] I [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1 turns
    [2015-05-05 09:24:24.437452] I [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid crawl...
    [2015-05-05 09:24:24.588865] I [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing xsync changelog /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135

This raises a couple of questions for me:

1. Are these errcode: 23 issues files that have been deleted/renamed since the changelog was created?
2. Is it correct/expected for the node to drop into faulty and then recover itself to active periodically?

Thank you again for your assistance!
Dave
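On question 1: rsync exit code 23 means "partial transfer due to error", and a source file that vanished (deleted or renamed) between the changelog entry being written and rsync running is one common way to trigger it, so that reading of the warnings is plausible. A quick local illustration, nothing gluster-specific and the paths are made up:

```shell
#!/bin/sh
# Reproduce rsync exit code 23 by asking rsync to copy a source that no
# longer exists, the way gsyncd can when a file was deleted/renamed after
# its changelog entry was recorded. Expected exit code: 23 (partial
# transfer due to error).
mkdir -p /tmp/rsync23-demo/dst
rsync /tmp/rsync23-demo/no-such-file /tmp/rsync23-demo/dst/ 2>/dev/null
echo "rsync exit code: $?"
```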
Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending
So I should do a compare out-of-band from Gluster and see what is actually in sync vs out of sync? Is there any easy way just to start it over? I am assuming removing and re-adding geo-rep is the easiest way. Is that correct?

Thanks,
Dave

On Mon, May 4, 2015 at 10:09 PM, Aravinda avish...@redhat.com wrote:

> Status output has an issue showing the exact number of files in sync. Please check the numbers on disk and let us know if a difference exists between the Master and Secondary Volume.
>
> --
> regards
> Aravinda

On 05/05/2015 06:58 AM, David Gibbons wrote:

I am having an issue with geo-replication. There were a number of complications when I upgraded to 3.5.3, but geo-replication was (I think) working at some point. The volume is accessed via samba using vfs_glusterfs. The main issue is that geo-replication has not been sending updated copies of old files to the replicated server. So in the scenario where file is created - time passes - file is modified - file is saved, the new version is not replicated. Is it possible that one brick is having a geo-rep issue and the others are not?
Consider this output:

    MASTER NODE  MASTER VOL  MASTER BRICK                   SLAVE                 STATUS   CHECKPOINT STATUS  CRAWL STATUS  FILES SYNCD  FILES PENDING  BYTES PENDING  DELETES PENDING  FILES SKIPPED
    gfs-a-1      shares      /mnt/a-1-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2309456      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2315557      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2362884      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2407600      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2409430      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2308969      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2079576      8191           0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2340597      0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
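One way to do the out-of-band compare asked about above, assuming both the master and slave volumes are FUSE-mounted somewhere (the mount points below are made-up examples), is to diff the file lists with plain coreutils:

```shell
#!/bin/sh
# List files present on the master volume but missing from the slave.
# MASTER and SLAVE are hypothetical mount points of the two volumes.
MASTER=${MASTER:-/mnt/shares}
SLAVE=${SLAVE:-/mnt/bkpshares}

# Build sorted relative-path lists so comm can compare them line by line.
( cd "$MASTER" && find . -type f | sort ) > /tmp/master.list
( cd "$SLAVE"  && find . -type f | sort ) > /tmp/slave.list

# Lines only in the first file: files the slave is missing.
comm -23 /tmp/master.list /tmp/slave.list
```

This only compares presence, not content; adding checksums (e.g. per-file md5sum) would catch the stale-contents case described in this thread, at the cost of reading every file.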
[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending
I am having an issue with geo-replication. There were a number of complications when I upgraded to 3.5.3, but geo-replication was (I think) working at some point. The volume is accessed via samba using vfs_glusterfs. The main issue is that geo-replication has not been sending updated copies of old files to the replicated server. So in the scenario where file is created - time passes - file is modified - file is saved, the new version is not replicated. Is it possible that one brick is having a geo-rep issue and the others are not? Consider this output:

    MASTER NODE  MASTER VOL  MASTER BRICK                   SLAVE                 STATUS   CHECKPOINT STATUS  CRAWL STATUS  FILES SYNCD  FILES PENDING  BYTES PENDING  DELETES PENDING  FILES SKIPPED
    gfs-a-1      shares      /mnt/a-1-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2309456      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2315557      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2362884      0              0              0                0
    gfs-a-1      shares      /mnt/a-1-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2407600      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2409430      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2308969      0              0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2079576      8191           0              0                0
    gfs-a-2      shares      /mnt/a-2-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl  2340597      0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-3      shares      /mnt/a-3-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0
    gfs-a-4      shares      /mnt/a-4-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A           0            0              0              0                0

This seems to show that there are 8191 files_pending on just one brick, and the others are up to date. I am suspicious of the 8191 number because it looks like we're at a bucket-size boundary on the backend. I've tried stopping and re-starting the rep session. I've also tried changing the change_detector from xsync to changelog. Neither seems to have had an effect.

It seems like geo-replication is quite wonky in 3.5.x. Is there light at the end of the tunnel, or should I find another solution to replicate?

Cheers,
Dave
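To keep an eye on just the FILES PENDING counter per brick (with the caveat raised elsewhere in this thread that the counter over-counts re-modified files), the status output can be filtered with awk. Using the last five columns by position (`$(NF-3)` is FILES PENDING) avoids breakage from the multi-word "Hybrid Crawl" column; the `status detail` subcommand is an assumption about where these columns come from on 3.5:

```shell
# Print node, brick, and pending count for every Active row whose
# FILES PENDING column is nonzero.
gluster volume geo-replication shares gfs-a-bkp::bkpshares status detail \
  | awk '/Active/ && $(NF-3) > 0 {print $1, $3, "pending:", $(NF-3)}'
```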
Re: [Gluster-users] Trying to use gluster using Virtual IP.
I use VIPs and keepalived on my production configuration as well. You don't want to peer probe with the VIP. You want to peer probe with the actual IP. The VIP is merely a forward-facing mechanism for clients to connect to, and that's why it fails between your gluster peers. The peers themselves already know how to handle failover in a more graceful way than a VIP :). Remove the peers, then re-probe with the actual IP instead of the VIP. The VIP is just for clients.

Cheers,
Dave

On Mon, Jan 12, 2015 at 7:57 AM, Sergio Traldi sergio.tra...@pd.infn.it wrote:

Hi,

We have a SAN with 14 TB of disk space and we have 2 controllers attached to this SAN. We want to use this storage using gluster. Our goal is to use this storage in high availability, i.e. we want to keep using all the storage even if there are some problems with one of the controllers. Our idea is the following:

- Create 2 LUNs.
- Attach via iSCSI the 2 LUNs to each controller host.
- Create a brick on each controller node (brick1 for Controller1 and brick2 for Controller2).
- Make the login so each controller is able to mount disk1 to brick1 and disk2 to brick2.
- Install keepalived (a routing software whose main goal is to provide simple and robust facilities for load balancing and high availability on Linux).
- Create 2 VIPs (Virtual IPs), one for controller 1 and the other for controller 2. So the situation would be:
  o Controller1 with its IP (IP1) would also have a VIP (VIP1), with 2 iSCSI disks mounted but just one used in R/W mode (brick1).
  o Controller2 with its IP (IP2) and a VIP (VIP2), with 2 iSCSI disks mounted but just one used in R/W mode (brick2).
- The glusterfs volume would be mounted on the client in fail-over, i.e. in the fstab there would be something like:
  VIP1:/volume /var/lib/nova/instances glusterfs defaults,log-level=ERROR,_netdev,backup-volfile-servers=VIP2 0 0
- Keepalived would be configured to change VIP1 to IP2 if controller1 e.g. has to be shut down. The same for VIP2.
This VIP change should hopefully not impact the operations on the client. We are trying this setting, but when we try to create a volume:

    gluster volume create testvolume transport tcp VIP1:/data/brick1/sda VIP2:/data/brick2/sdb

we obtain this error:

    volume create: testvolume: failed: Host VIP2 is not in 'Peer in Cluster' state

But if we try:

    [controller1]# gluster peer status
    Number of Peers: 1
    Hostname: VIP2
    Uuid: 6692a700-4c41-4e8d-8810-48f9d1ee9315
    State: Accepted peer request (Connected)

    [controller2]# gluster peer status
    Number of Peers: 1
    Hostname: IP1
    Uuid: 074e9eea-6bf5-4ac8-8ac9-d1159bb4d452
    State: Accepted peer request (Disconnected)

If we try to:

    [controller2]# gluster peer probe VIP1

we obtain this error:

    peer probe: failed: Probe returned with unknown errno 107

Any idea why I cannot create a volume with two virtual IPs? Thinking it could be a DNS problem, I also tried putting these lines in /etc/hosts on each controller:

    VIP1 controller1.mydomain controller1
    VIP2 controller2.mydomain controller2

In the log file of controller2 I just found:

    [2015-01-12 11:42:47.549545] E [glusterd-handshake.c:1644:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the 'versions' from peer (IP1:24007)

In the log file of controller1 I just found:

    [2015-01-12 11:44:44.229600] E [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer IP2:1018
    [2015-01-12 11:44:47.234863] E [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer IP2:1017
    [2015-01-12 11:44:50.240324] E [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer IP2:1001

If I try a telnet:

    [controller2]# telnet VIP1 24007
    [controller1]# telnet VIP2 24007

they work fine. Any idea if it is possible to create a volume using VIPs and not IPs?
Cheers,
Sergio
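Dave's advice above, as a concrete sketch. IP1/IP2/VIP1/VIP2 are the placeholders from Sergio's description, to be replaced with real addresses; whether the half-formed peering needs `detach force` may vary:

```shell
# Peer and volume operations use the controllers' real IPs, never the VIPs.
gluster peer detach VIP2          # remove the peering that was probed via the VIP
gluster peer probe IP2            # re-probe with the actual address
gluster peer status               # both peers should reach 'Peer in Cluster'

gluster volume create testvolume transport tcp \
  IP1:/data/brick1/sda IP2:/data/brick2/sdb
gluster volume start testvolume

# The VIPs appear only on the client side, e.g. in /etc/fstab:
# VIP1:/testvolume /var/lib/nova/instances glusterfs defaults,log-level=ERROR,_netdev,backup-volfile-servers=VIP2 0 0
```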
Re: [Gluster-users] Geo-Replication Issue
Thank you for the advice. After re-compiling gluster with the xml option, I was able to get geo-replication started! Is this output normal? This is a 2x2 distributed/replicated volume:

    # gluster volume geo-rep shares gfs-a-bkp::bkpshares status
    MASTER NODE  MASTER VOL  MASTER BRICK                   SLAVE                 STATUS   CHECKPOINT STATUS  CRAWL STATUS
    gfs-a-2      shares      /mnt/a-2-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-2      shares      /mnt/a-2-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-2      shares      /mnt/a-2-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-2      shares      /mnt/a-2-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-3      shares      /mnt/a-3-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-1/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-2/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-3/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-4/brick  gfs-a-bkp::bkpshares  Passive  N/A                N/A
    gfs-a-1      shares      /mnt/a-1-shares-brick-1/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-1      shares      /mnt/a-1-shares-brick-2/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-1      shares      /mnt/a-1-shares-brick-3/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl
    gfs-a-1      shares      /mnt/a-1-shares-brick-4/brick  gfs-a-bkp::bkpshares  Active   N/A                Hybrid Crawl

What I mean to say is: is it normal for two of the nodes to be in active mode and two of the nodes to be in passive mode? I'm thinking the answer is yes due to the distributed/replicated nature, but would like some confirmation of that.
Cheers,
Dave

On Thu, Dec 11, 2014 at 12:19 PM, Aravinda avish...@redhat.com wrote:

Geo-replication depends on the XML output of Gluster CLI commands. For example, before connecting to slave nodes it gets the node lists from both master and slave using the gluster volume info and status commands with --xml. The Python tracebacks you are seeing in the logs are due to the inability to parse the output of gluster commands when XML is not supported.

--
regards
Aravinda
http://aravindavk.in

On 12/11/2014 07:56 PM, David Gibbons wrote:

Thanks for the feedback, answers inline below:

> Have you followed all the upgrade steps w.r.t geo-rep mentioned in the following link?

I didn't upgrade geo-rep, I disconnected the old replicated server and started from scratch. So everything with regard to geo-rep is fresh/brand-new.

> 2. Is the output of the command 'gluster vol info vol-name --xml' proper? Please paste the output.

I do not have gluster compiled with xml. Perhaps that is the problem. Here is the output of the command you referenced:

    XML output not supported. Ignoring '--xml' option

This is my config summary:

    GlusterFS configure summary
    ===========================
    FUSE client         : yes
    Infiniband verbs    : no
    epoll IO multiplex  : yes
    argp-standalone     : no
    fusermount          : yes
    readline            : no
    georeplication      : yes
    Linux-AIO           : no
    Enable Debug        : no
    systemtap           : no
    Block Device xlator : no
    glupy               : no
    Use syslog          : yes
    XML output          : no
    QEMU Block formats  : no
    Encryption xlator   : no

Am I missing something that is required for geo-replication? I've found the documentation for those of us who are building the binaries to be a bit lacking with regard to dependencies within the project.

Cheers,
Dave
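The "XML output : no" line in that summary typically means configure could not find the libxml2 development headers when glusterfs was built. A sketch of the rebuild, with the caveat that the package names are distro-specific assumptions:

```shell
# Install libxml2 headers (Debian/Ubuntu: apt-get install libxml2-dev).
yum install -y libxml2-devel

# Re-run configure from the glusterfs source tree and check the summary.
./configure 2>&1 | tee configure.out
grep 'XML output' configure.out   # should now read "yes"

make && make install
```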
Re: [Gluster-users] Geo-Replication Issue
Thanks for the feedback, answers inline below:

> Have you followed all the upgrade steps w.r.t geo-rep mentioned in the following link?

I didn't upgrade geo-rep, I disconnected the old replicated server and started from scratch. So everything with regard to geo-rep is fresh/brand-new.

> 2. Is the output of the command 'gluster vol info vol-name --xml' proper? Please paste the output.

I do not have gluster compiled with xml. Perhaps that is the problem. Here is the output of the command you referenced:

    XML output not supported. Ignoring '--xml' option

This is my config summary:

    GlusterFS configure summary
    ===========================
    FUSE client         : yes
    Infiniband verbs    : no
    epoll IO multiplex  : yes
    argp-standalone     : no
    fusermount          : yes
    readline            : no
    georeplication      : yes
    Linux-AIO           : no
    Enable Debug        : no
    systemtap           : no
    Block Device xlator : no
    glupy               : no
    Use syslog          : yes
    XML output          : no
    QEMU Block formats  : no
    Encryption xlator   : no

Am I missing something that is required for geo-replication? I've found the documentation for those of us who are building the binaries to be a bit lacking with regard to dependencies within the project.

Cheers,
Dave
Re: [Gluster-users] Geo-Replication Issue
Hi Kotresh,

Thanks for the tip. Unfortunately that does not seem to have any effect. The path to the gluster binaries was already in $PATH. I did try adding the path to the gsyncd binary, but same result. Contents of $PATH are:

    /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/

It seems like perhaps one of the remote gsyncd processes cannot find the gluster binary, because I see the following in the geo-replication/shares/ssh...log. Can you point me toward how I can find out what is throwing this log entry?

    [2014-12-10 07:20:53.886676] E [syncdutils(monitor):218:log_raise_exception] top: execution of gluster failed with ENOENT (No such file or directory)
    [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] top: exiting.

I think that whatever process is trying to use the gluster command has the incorrect path to access it. Do you know how I could modify *that* path? I've manually tested the ssh_command and ssh_command_tar variables in the relevant gsyncd.conf; both connect to the slave server successfully and appear to execute the command they're supposed to. gluster_command_dir in gsyncd.conf is also the correct directory (/usr/local/sbin).

In summary: I think we're on to something with setting the path, but I think I need to set it somewhere other than my shell.

Thanks,
Dave

On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:

If that is the case, as a workaround, try adding the 'gluster' path to the PATH environment variable, or creating symlinks to the gluster and glusterd binaries.

    1. export PATH=$PATH:path where gluster binaries are installed

The above should work; let me know if it doesn't.
Thanks and Regards,
Kotresh H R

- Original Message -
From: David Gibbons david.c.gibb...@gmail.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users Gluster-users@gluster.org, vno...@stonefly.com
Sent: Tuesday, December 9, 2014 6:16:03 PM
Subject: Re: [Gluster-users] Geo-Replication Issue

Hi Kotresh,

Yes, I believe that I am. Can you tell me which symlinks are missing/cause geo-replication to fail to start? I can create them manually.

Thank you,
Dave

On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:

Hi Dave,

Are you hitting the below bug and so not able to sync symlinks?
https://bugzilla.redhat.com/show_bug.cgi?id=1105283

Does geo-rep status say Not Started?

Thanks and Regards,
Kotresh H R

- Original Message -
From: David Gibbons david.c.gibb...@gmail.com
To: gluster-users Gluster-users@gluster.org
Cc: vno...@stonefly.com
Sent: Monday, December 8, 2014 7:03:31 PM
Subject: Re: [Gluster-users] Geo-Replication Issue

Apologies for sending so many messages about this! I think I may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1105283

Would someone be so kind as to let me know which symlinks are missing when this bug manifests, so that I can create them?

Thank you,
Dave

On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons david.c.gibb...@gmail.com wrote:

Ok, I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of accessing bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually.
I now have what appears to be a configured geo-rep setup:

    # gluster volume geo-replication shares gfs-a-bkp::bkpshares status
    MASTER NODE  MASTER VOL  MASTER BRICK                   SLAVE                 STATUS       CHECKPOINT STATUS  CRAWL STATUS
    gfs-a-3      shares      /mnt/a-3-shares-brick-1/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-2/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-3/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-3      shares      /mnt/a-3-shares-brick-4/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-2      shares      /mnt/a-2-shares-brick-1/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-2      shares      /mnt/a-2-shares-brick-2/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-2      shares      /mnt/a-2-shares-brick-3/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-2      shares      /mnt/a-2-shares-brick-4/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-1/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-2/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-3/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-4      shares      /mnt/a-4-shares-brick-4/brick  gfs-a-bkp::bkpshares  Not Started  N/A                N/A
    gfs-a-1      shares      /mnt
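Kotresh's symlink workaround from this thread, sketched concretely. The /usr/local/sbin prefix matches Dave's source build; the choice of /usr/sbin as the link target directory is an assumption about what is on the remote gsyncd's default PATH:

```shell
#!/bin/sh
# Make the gluster binaries visible to processes that search only the
# default PATH (e.g. gsyncd spawned over ssh on the remote side), by
# symlinking them from the source-install prefix into /usr/sbin.
for bin in gluster glusterd glusterfsd; do
    [ -x "/usr/local/sbin/$bin" ] && ln -sf "/usr/local/sbin/$bin" "/usr/sbin/$bin"
done

# Alternative (current shell only, does not help remote gsyncd):
export PATH="$PATH:/usr/local/sbin"
```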
Re: [Gluster-users] Geo-Replication Issue
Symlinking gluster to /usr/bin/ seems to have resolved the path issue. Thanks for the tip there. Now there's a different error throw in the geo-rep/ssh...log: [2014-12-10 07:32:42.609031] E [syncdutils(monitor):240:log_raise_exception] top: FAIL: Traceback (most recent call last): File /usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py, line 150, in main main_i() File /usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py, line 530, in main_i return monitor(*rscs) File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line 243, in monitor return Monitor().multiplex(*distribute(*resources)) File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line 205, in distribute mvol = Volinfo(master.volume, master.host) File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line 22, in __init__ vi = XET.fromstring(vix) File /usr/lib64/python2.6/xml/etree/ElementTree.py, line 963, in XML parser.feed(text) File /usr/lib64/python2.6/xml/etree/ElementTree.py, line 1245, in feed self._parser.Parse(data, 0) ExpatError: syntax error: line 2, column 0 [2014-12-10 07:32:42.610858] I [syncdutils(monitor):192:finalize] top: exiting. I also get a bunch of these errors but have been assuming that they are being thrown because geo-replication hasn't started successfully yet. There is one for each brick: [2014-12-10 12:33:33.539737] E [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file [2014-12-10 12:33:33.539742] E [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the statusfile for /mnt/a-3-shares-brick-4/brick brick for shares(master), gfs-a-bkp::bkpshares(slave) session Do I have a config file error somewhere that I need to track down? This volume *was* upgraded from 3.4.2 a few weeks ago. Cheers, Dave On Wed, Dec 10, 2014 at 7:29 AM, David Gibbons david.c.gibb...@gmail.com wrote: Hi Kotresh, Thanks for the tip. Unfortunately that does not seem to have any effect. 
The path to the gluster binaries was already in $PATH. I did try adding the path to the gsyncd binary, but same result. Contents of $PATH are: /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/ It seems like perhaps one of the remote gsyncd processes cannot find the gluster binary, because I see the following in the geo-replication/shares/ssh...log. Can you point me toward how I can find out what is throwing this log entry? [2014-12-10 07:20:53.886676] E [syncdutils(monitor):218:log_raise_exception] top: execution of gluster failed with ENOENT (No such file or directory) [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] top: exiting. I think that whatever process is trying to use the gluster command has the incorrect path to access it. Do you know how I could modify *that* path? I've manually tested the ssh_command and ssh_command_tar variables in the relevant gsyncd.conf; both connect to the slave server successfully and appear to execute the command they're supposed to. gluster_command_dir in gsyncd.conf is also the correct directory (/usr/local/sbin). In summary: I think we're on to something with setting the path, but I think I need to set it somewhere other than my shell. Thanks, Dave On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote: If that is the case, as a workaround, try adding 'gluster' path to PATH environment variable or creating symlinks to gluster, glusterd binaries. 1. export PATH=$PATH:path where gluster binaries are installed Above should work, let me know if doesn't. Thanks and Regards, Kotresh H R - Original Message - From: David Gibbons david.c.gibb...@gmail.com To: Kotresh Hiremath Ravishankar khire...@redhat.com Cc: gluster-users Gluster-users@gluster.org, vno...@stonefly.com Sent: Tuesday, December 9, 2014 6:16:03 PM Subject: Re: [Gluster-users] Geo-Replication Issue Hi Kotresh, Yes, I believe that I am. 
Can you tell me which symlinks are missing/cause geo-replication to fail to start? I can create them manually. Thank you, Dave On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote: Hi Dave, Are you hitting the below bug and so not able to sync symlinks ? https://bugzilla.redhat.com/show_bug.cgi?id=1105283 Does geo-rep status say Not Started ? Thanks and Regards, Kotresh H R - Original Message - From: David Gibbons david.c.gibb...@gmail.com To: gluster-users Gluster-users@gluster.org Cc: vno...@stonefly.com Sent: Monday, December 8, 2014 7:03:31 PM Subject: Re: [Gluster-users] Geo-Replication Issue Apologies for sending so many messages about this! I think I may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1105283 Would someone be so kind as to let me
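Back on the ExpatError from earlier in this thread: "syntax error: line 2, column 0" from ElementTree usually means something that isn't XML (a shell banner or warning line) preceded the `<?xml` declaration in the `gluster volume info --xml` output that gsyncd's Volinfo parses. A sketch of checking for that, assuming you have captured the command output — `parse_volinfo` is a hypothetical helper, not gsyncd's actual code:

```python
import xml.etree.ElementTree as ET

def parse_volinfo(output):
    """Parse `gluster volume info --xml` output, tolerating stray
    non-XML lines (warnings, login banners) printed before the XML."""
    start = output.find("<?xml")
    if start == -1:
        start = output.find("<")
    if start == -1:
        raise ValueError("no XML found in command output")
    return ET.fromstring(output[start:])

# A stray line before the declaration is exactly what produces
# "syntax error: line 2, column 0" in gsyncd's monitor.
sample = "some warning from a login script\n<?xml version='1.0'?><cliOutput><opRet>0</opRet></cliOutput>"
root = parse_volinfo(sample)
print(root.find("opRet").text)  # → 0
```

If this parses your captured output cleanly but gsyncd still fails, inspect what the monitor actually receives: the junk line may only appear in non-interactive SSH sessions.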
Re: [Gluster-users] Geo-Replication Issue
Hi Kotresh, Yes, I believe that I am. Can you tell me which symlinks are missing/cause geo-replication to fail to start? I can create them manually. Thank you, Dave On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote: Hi Dave, Are you hitting the below bug and so not able to sync symlinks ? https://bugzilla.redhat.com/show_bug.cgi?id=1105283 Does geo-rep status say Not Started ? Thanks and Regards, Kotresh H R - Original Message - From: David Gibbons david.c.gibb...@gmail.com To: gluster-users Gluster-users@gluster.org Cc: vno...@stonefly.com Sent: Monday, December 8, 2014 7:03:31 PM Subject: Re: [Gluster-users] Geo-Replication Issue Apologies for sending so many messages about this! I think I may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1105283 Would someone be so kind as to let me know which symlinks are missing when this bug manifests, so that I can create them? Thank you, Dave On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons david.c.gibb...@gmail.com wrote: Ok, I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of accessing bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually. 
I now have what appears to be a configured geo-rep setup:

# gluster volume geo-replication shares gfs-a-bkp::bkpshares status
MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS         CHECKPOINT STATUS    CRAWL STATUS
gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A

So that's a step in the right direction (and I can upload a patch for gverify to a bugzilla). However, gverify *should* have worked with bash -c, and I was not able to figure out why it didn't work, other than it didn't seem able to find some programs. I'm thinking that maybe the PATH variable is wrong for Gluster, and that's why gverify didn't work out of the box.
When I attempt to start geo-rep now, I get the following in the geo-rep log: [2014-12-07 10:52:40.893594] E [syncdutils(monitor):218:log_raise_exception] top: execution of gluster failed with ENOENT (No such file or directory) [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top: exiting. Which seems to agree that maybe gluster isn't running with the same path variable that my console session is running with. Is this possible? I know I'm grasping :). Any nudge in the right direction would be very much appreciated! Cheers, Dave On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons david.c.gibb...@gmail.com wrote: Good Morning, I am having some trouble getting geo-replication started on a 3.5.3 volume. I have verified that password-less SSH is functional in both directions from the backup gluster server, and all nodes in the production gluster. I have verified that all nodes in production and backup cluster are running the same version of gluster, and that name resolution works in both directions. When I attempt to start geo-replication with this command: gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem I end up with the following in the logs: [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave [2014-12-06 15:02:50.284495] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: gfs-a-bkp
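One way to reason about the ENOENT above: the gsyncd monitor inherits its own environment, not the interactive shell's, so `which gluster` in a login shell proves little — non-login SSH sessions often get a minimal PATH. A sketch of resolving a binary against an explicit PATH string, the way the daemon effectively does (the stand-in `gluster` file created here is hypothetical, just for demonstration):

```python
import os
import shutil
import stat
import tempfile

def resolve(binary, path_string):
    """Resolve a binary against an explicit PATH string, the way a
    daemon whose environment differs from the login shell would."""
    return shutil.which(binary, path=path_string)

# Demonstrate with a throwaway stand-in executable: it is found only
# when its directory appears in the searched PATH string.
with tempfile.TemporaryDirectory() as d:
    fake = os.path.join(d, "gluster")  # hypothetical stand-in binary
    with open(fake, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)
    found = resolve("gluster", d)
    missing = resolve("gluster", "/nonexistent-path")

print(found is not None, missing)  # → True None
```

The practical equivalent is checking what a non-interactive session sees, e.g. `ssh host 'echo $PATH; command -v gluster'`, rather than testing from a login shell.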
Re: [Gluster-users] Geo-Replication Issue
Apologies for sending so many messages about this! I think I may be running into this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1105283 Would someone be so kind as to let me know which symlinks are missing when this bug manifests, so that I can create them? Thank you, Dave On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons david.c.gibb...@gmail.com wrote: Ok, I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of accessing bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually. I now have what appears to be a configured geo-rep setup: # gluster volume geo-replication shares gfs-a-bkp::bkpshares status MASTER NODE MASTER VOLMASTER BRICK SLAVE STATUS CHECKPOINT STATUSCRAWL STATUS gfs-a-3 shares /mnt/a-3-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares 
/mnt/a-1-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A So that's a step in the right direction (and I can upload a patch for gverify to a bugzilla). However, gverify *should* have worked with bash-c, and I was not able to figure out why it didn't work, other than it didn't seem able to find some programs. I'm thinking that maybe the PATH variable is wrong for Gluster, and that's why gverify didn't work out of the box. When I attempt to start geo-rep now, I get the following in the geo-rep log: [2014-12-07 10:52:40.893594] E [syncdutils(monitor):218:log_raise_exception] top: execution of gluster failed with ENOENT (No such file or directory) [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top: exiting. Which seems to agree that maybe gluster isn't running with the same path variable that my console session is running with. Is this possible? I know I'm grasping :). Any nudge in the right direction would be very much appreciated! Cheers, Dave On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons david.c.gibb...@gmail.com wrote: Good Morning, I am having some trouble getting geo-replication started on a 3.5.3 volume. I have verified that password-less SSH is functional in both directions from the backup gluster server, and all nodes in the production gluster. I have verified that all nodes in production and backup cluster are running the same version of gluster, and that name resolution works in both directions. When I attempt to start geo-replication with this command: gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem I end up with the following in the logs: [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave
[Gluster-users] Missing Hooks
Hi All, I am running into an issue where it appears that some hooks are missing from /var/lib/glusterd/hooks. I am running version 3.5.3 and recently did an upgrade to that version from 3.4.2. I built from source with make && make install. Is there another make target I need to use to get the hooks to install? Do I need to run make extras or something to get them installed? I see them in the source folder, so I could certainly just copy them over, but I want to do this the right way if possible. Cheers, Dave ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users
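Until the build-system gap is fixed, the interim workaround is copying the hook scripts from the source tree by hand, preserving the subdirectory layout and execute bits. A minimal sketch of that copy — the source layout (e.g. extras/hook-scripts) and the target /var/lib/glusterd/hooks/1 are assumptions from this thread, so verify both against your tree first:

```python
import os
import shutil
import stat

def install_hooks(src_root, dst_root):
    """Copy hook scripts from a gluster source tree into the live
    hooks directory, keeping the subdirectory layout and marking
    each script executable. Returns the list of installed paths."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target = os.path.join(dst_root, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            dst = os.path.join(target, name)
            shutil.copy2(os.path.join(dirpath, name), dst)
            # glusterd only runs hooks that are executable
            os.chmod(dst, os.stat(dst).st_mode | stat.S_IXUSR | stat.S_IXGRP)
            copied.append(dst)
    return copied

# Example (paths are assumptions -- check your source tree):
# install_hooks("glusterfs-3.5.3/extras/hook-scripts", "/var/lib/glusterd/hooks/1")
```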
Re: [Gluster-users] Missing Hooks
Thank you, Niels. Bugzilla ID is 1171477 Dave On Sun, Dec 7, 2014 at 10:12 AM, Niels de Vos nde...@redhat.com wrote: On Sun, Dec 07, 2014 at 09:55:11AM -0500, David Gibbons wrote: Hi All, I am running into an issue where it appears that some hooks are missing from /var/lib/glusterd/hooks. I am running version 3.5.3 and recently did an upgrade to that version from 3.4.2. I built from source with make && make install. Is there another make target I need to use to get the hooks to install? Do I need to run make extras or something to get them installed? I see them in the source folder, so I could certainly just copy them over, but I want to do this the right way if possible. These are copied over by the .spec that is used to generate the RPMs. It looks as if the hook scripts are not installed by 'make install'. If you can file a bug for this, we won't forget about it and can send a fix. Thanks, Niels
Re: [Gluster-users] Geo-Replication Issue
Ok, I was able to get geo-replication configured by changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local machine, instead of accessing bash -c directly. I then found that the hook script was missing for geo-replication, so I copied that over manually. I now have what appears to be a configured geo-rep setup: # gluster volume geo-replication shares gfs-a-bkp::bkpshares status MASTER NODE MASTER VOLMASTER BRICK SLAVE STATUS CHECKPOINT STATUSCRAWL STATUS gfs-a-3 shares /mnt/a-3-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-3 shares /mnt/a-3-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-2 shares /mnt/a-2-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-4 shares /mnt/a-4-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-1/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-2/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-3/brickgfs-a-bkp::bkpsharesNot Started N/A N/A gfs-a-1 shares /mnt/a-1-shares-brick-4/brickgfs-a-bkp::bkpsharesNot Started N/A N/A So that's a step in the right direction (and I can upload a patch for gverify to a bugzilla). 
However, gverify *should* have worked with bash-c, and I was not able to figure out why it didn't work, other than it didn't seem able to find some programs. I'm thinking that maybe the PATH variable is wrong for Gluster, and that's why gverify didn't work out of the box. When I attempt to start geo-rep now, I get the following in the geo-rep log: [2014-12-07 10:52:40.893594] E [syncdutils(monitor):218:log_raise_exception] top: execution of gluster failed with ENOENT (No such file or directory) [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top: exiting. Which seems to agree that maybe gluster isn't running with the same path variable that my console session is running with. Is this possible? I know I'm grasping :). Any nudge in the right direction would be very much appreciated! Cheers, Dave On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons david.c.gibb...@gmail.com wrote: Good Morning, I am having some trouble getting geo-replication started on a 3.5.3 volume. I have verified that password-less SSH is functional in both directions from the backup gluster server, and all nodes in the production gluster. I have verified that all nodes in production and backup cluster are running the same version of gluster, and that name resolution works in both directions. When I attempt to start geo-replication with this command: gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem I end up with the following in the logs: [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave [2014-12-06 15:02:50.284495] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch master volume details. Please check the master cluster and master volume. [2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo
[Gluster-users] Geo-Replication Issue
Good Morning, I am having some trouble getting geo-replication started on a 3.5.3 volume. I have verified that password-less SSH is functional in both directions from the backup gluster server, and all nodes in the production gluster. I have verified that all nodes in production and backup cluster are running the same version of gluster, and that name resolution works in both directions. When I attempt to start geo-replication with this command: gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem I end up with the following in the logs: [2014-12-06 15:02:50.284426] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave [2014-12-06 15:02:50.284495] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch master volume details. Please check the master cluster and master volume. [2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Unable to fetch master volume details. Please check the master cluster and master volume. Would someone be so kind as to point me in the right direction? Cheers, Dave
Re: [Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no longer working
Thank you for the assistance. Yesterday we started to have bricks on one server randomly crash. When the one server crashed, it would lock up the bricks on its replica as well. I ended up upgrading to 3.5.3, and noticed in the process that the libgfrpc and libgfxdr libraries were out of date on the server that was having crashed bricks. Upgrading to 3.5.3 and replacing the old versions of the libraries on the cranky server seems to have made everything happy again. Thanks again! Dave On Tue, Dec 2, 2014 at 2:28 AM, Krutika Dhananjay kdhan...@redhat.com wrote: Hi, Are you sure the post-upgrade script ran to completion? Here is one way to confirm whether that is the case: check if the quota configured directories have an xattr called trusted.glusterfs.quota.limit-set set on them in the respective bricks. For example, here's what mine looks like: [root@haddock 1]# pwd /brick/1 [root@haddock 1]# getfattr -d -m . -e hex 1 # file: 1 security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000 trusted.gfid=0x57d0a561ca574d1cb0428f38d1c06e85 trusted.glusterfs.dht=0x00017fff trusted.glusterfs.quota.----0001.contri=0x0a00 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.limit-set=0x0640 trusted.glusterfs.quota.size=0x0a00 where /brick/1 is the brick directory and under it 1 is the name of one of the quota-configured directories. I believe your quota configurations are backed up at /var/tmp/glusterfs/quota-config-backup/vol_volname which you can use to get the quota-configured directory names. As for operating version, I think it is sufficient for it to be at 3 for the 3.5.x quota to work. 
-Krutika

From: David Gibbons david.c.gibb...@gmail.com
To: Krutika Dhananjay kdhan...@redhat.com
Cc: gluster-users gluster-users@gluster.org
Sent: Monday, December 1, 2014 6:35:55 PM
Subject: Re: [Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no longer working

Certainly, thank you for your response: Quotad is running on all nodes:

[root@gfs-a-1 ~]# ps aux | grep quotad
root 3004 0.0 0.4 241368 68552 ? Ssl Nov30 0:07 /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /usr/local/var/log/glusterfs/quotad.log -S /var/run/9d02605105ef0e74d913a4671c1143a1.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off

And the relevant output from gluster volume status shares per your request:

[root@gfs-a-1 ~]# gluster volume status shares | grep Quota
Quota Daemon on localhost    N/A    Y    3004
Quota Daemon on gfs-a-3      N/A    Y    32307
Quota Daemon on gfs-a-4      N/A    Y    10818
Quota Daemon on gfs-a-2      N/A    Y    12292

No log entries are created in /var/log/glusterfs/quotad.log when I run a quota list; all of the log entries are from yesterday.
They do indicate a version mis-match, although I can't seem to locate where that version is specified: [2014-11-30 13:21:55.173081] I [client-handshake.c:1474:client_setvolume_cbk] 0-shares-client-14: Server and Client lk-version numbers are not same, reopening the fds [2014-11-30 13:21:55.173170] I [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-14: Server lk version = 1 [2014-11-30 13:21:55.178739] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-shares-client-9: changing port to 49154 (from 0) [2014-11-30 13:21:55.181170] I [client-handshake.c:1677:select_server_supported_programs] 0-shares-client-9: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2014-11-30 13:21:55.181386] I [client-handshake.c:1462:client_setvolume_cbk] 0-shares-client-9: Connected to 172.16.10.13:49154, attached to remote volume '/mnt/a-3-shares-brick-3/brick'. [2014-11-30 13:21:55.181401] I [client-handshake.c:1474:client_setvolume_cbk] 0-shares-client-9: Server and Client lk-version numbers are not same, reopening the fds [2014-11-30 13:21:55.181535] I [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-9: Server lk version = 1 I see the operational mode for the volume as 3. I saw a non-related thread that indicated this number should be more digits on a cluster running 3.5.2. The other thread also indicated that quota may not work if the volume version number was not compatible with the quota version running on the cluster. I can't seem to find the link right now. It's almost as if the volume version did not get upgraded when the server version was upgraded. Is that possible? Cheers, Dave On Sun, Nov 30, 2014 at 11:46 PM, Krutika Dhananjay kdhan...@redhat.com wrote: Hi, Could you confirm
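The limit-set xattr check Krutika describes can be scripted across all quota-configured directories rather than run one getfattr at a time. A sketch — os.getxattr is Linux-only, the brick path in the example is hypothetical, and reading the trusted.* namespace generally requires root:

```python
import os

QUOTA_KEY = "trusted.glusterfs.quota.limit-set"

def quota_limit(brick_dir):
    """Return the raw quota limit-set xattr bytes for a directory on
    a brick, or None if the attribute is absent (or unreadable --
    the trusted.* namespace needs root)."""
    try:
        return os.getxattr(brick_dir, QUOTA_KEY)
    except OSError:
        return None

# Example (hypothetical brick path from the thread):
# print(quota_limit("/brick/1/1"))
```

Running this over the directory names backed up in /var/tmp/glusterfs/quota-config-backup/ would show quickly whether the post-upgrade script actually set the limits.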
[Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no longer working
Hi All, I performed a long-awaited upgrade from 3.4.1 to 3.5.2 today following the instructions for an offline upgrade outlined here: http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5 I ran the pre- and post- upgrade scripts as instructed, intending to move the quotas over to the new version. The upgrade seemed to go well, the volume is online and it appears to be functioning properly. When I attempt to check quotas, the list is empty: [root@gfs-a-1 glusterfs]# gluster volume quota shares list Path Hard-limit Soft-limit Used Available [root@gfs-a-1 glusterfs]# And upon execution of that command, the cli.log file fills up with entries like this. I am assuming it's one cli log entry per quota entry: [2014-11-30 14:00:02.154143] W [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not present in dict [2014-11-30 14:00:02.160507] W [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not present in dict [2014-11-30 14:00:02.167947] W [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not present in dict So, it appears that somehow the quota database has become offline or corrupt. Any thoughts on what I can do to resolve this? I have checked all of the binaries on all 4 machines in the cluster, and they all appear to be running the correct version: [root@gfs-a-1 glusterfs]# glusterfsd --version glusterfs 3.5.2 built on Nov 30 2014 08:16:37 Cheers, Dave
Re: [Gluster-users] Re-Sync Geo Replication
I am interested in hearing this answer too. James, if you have a consistency check script, maybe just changing the timestamp or some other attribute on the file (perhaps something as simple as chmod to something and then back) would trigger the integrated rsync. Dave On Sat, Oct 4, 2014 at 5:33 PM, James Payne jimqwer...@hotmail.com wrote: Hi, Just wondering if there was a method to manually force a re-sync of a geo replication slave so it is an identical mirror of the master? History of this request is that I have a test setup and the Gluster Geo-Replication seems to have missed 7 files out completely (not sure if this was a bug or an issue with my setup specifically as this is a test setup it has been setup and torn down a few times). Now though the Geo replica will not converge to be the same, ie. It’s stable, new files add fine and files will delete, but the missing files just don’t seem to be interested in synchronising! I’m guessing that as the rsync is triggered by the change log and as these files aren’t changing it won’t ever notice them again? I can manually copy the files (there are only 7 after all…) but I have only found them through a consistency checking script I wrote. I can run this through a cron to pick up any missing files, however I wondered if Gluster had something built in which did a check and sync? Also, If I did manually copy these files across how would that affect the consistency of the geo replica session? Running: GlusterFS 3.4.5 on CentOS 6.5 Regards James
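A consistency check of the kind James describes needs nothing gluster-specific: comparing the file listing of the master mount against the slave mount surfaces anything the changelog missed. A minimal sketch (run it against the FUSE mounts, not the bricks, so the DHT layout doesn't matter):

```python
import os

def tree_diff(master_root, slave_root):
    """Relative file paths present under master_root but absent
    under slave_root -- candidates for a manual copy or re-sync."""
    def listing(root):
        found = set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found
    return sorted(listing(master_root) - listing(slave_root))

# Example (hypothetical mount points):
# for path in tree_diff("/mnt/master-vol", "/mnt/slave-vol"):
#     print("missing on slave:", path)
```

This only checks presence; extending it to compare size or mtime per path would catch stale copies as well as missing ones.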
[Gluster-users] Fwd: Samba vfs_glusterfs no such file or directory
Boy, it's not a good day for my list etiquette. Apologies, folks. -- Forwarded message -- From: David Gibbons david.c.gibb...@gmail.com Date: Fri, Jun 27, 2014 at 3:10 PM Subject: Re: [Gluster-users] Samba vfs_glusterfs no such file or directory To: Niels de Vos nde...@redhat.com Samba with vfs_glusterfs has a limit of approx. 93 groups. If 'id $USER' returns more than 93 groups, those users can run into various issues. 'Access is denied' is one of the most common errors they'll see. The upcoming 3.5.1 release has a 'server.manage-gids' volume option. With this option enabled, the number of groups will be limited to 65535. Ahh, great. I am very glad, at least, that this is a known issue and that it's being addressed. What am I missing here? Very little, I would also suspect that the number of groups that those problematic users belong to is too big. Well that's a first :). I will test this against the 3.5.1 release when that is ready. Is the 3.5.1 version of vfs_glusterfs backwards compatible with glusterfs 3.4, or do I need to upgrade the whole cluster to leverage the new vfs_glusterfs? Thanks so much, Dave
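Since the symptom hinges on how many groups a user resolves to, it is worth measuring that number per user before blaming anything else — the ~93 figure comes from Niels's note above. A rough check of what the server-side group resolution would see (this counts whatever NSS/winbind returns; it is a diagnostic sketch, not part of Samba or Gluster):

```python
import os
import pwd

def group_count(username):
    """Number of groups resolved for a user. Users over roughly 93
    groups hit the vfs_glusterfs 'access denied' behaviour described
    above (until server.manage-gids raises the ceiling)."""
    pw = pwd.getpwnam(username)
    return len(os.getgrouplist(pw.pw_name, pw.pw_gid))

# Check the current user as a demonstration:
me = pwd.getpwuid(os.getuid()).pw_name
print(me, group_count(me))
```

Running this for a working user and a failing user should show whether the failing accounts cluster above the limit.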
[Gluster-users] Samba vfs_glusterfs no such file or directory
Hi All, I am running into a strange error with samba and vfs_glusterfs. Here is some version information: [root@gfs-a-3 samba]# smbd -V Version 3.6.20 [root@gfs-a-3 tmp]# glusterfsd --version glusterfs 3.4.1 built on Oct 21 2013 09:23:23 Samba is configured in an AD environment, using winbind. Group resolution, user resolution, and cross-mapping of SIDs to IDs to usernames all work as expected. The vfs_glusterfs module is working perfectly for the vast majority of the users I have configured. A small percentage of the users, though, get an "access is denied" error when they attempt to access the share. They are all configured in the same way as the users that are working. We initially thought that perhaps the number of groups the user was a member of was causing the issue. This still might be the case, but we're not sure how to verify that guess. When we connect with a working user, with glusterfs:loglevel = 10, here are the last bits of the log file. I'm not really sure where the interesting lines are, any guidance would be much appreciated: [2014-06-17 12:11:53.753289] D [client-handshake.c:1430:client_setvolume_cbk] 0-shares-client-5: clnt-lk-version = 1, server-lk-version = 0 [2014-06-17 12:11:53.753296] I [client-handshake.c:1456:client_setvolume_cbk] 0-shares-client-5: Connected to 172.16.10.13:49153, attached to remote volume '/mnt/a-3-shares-brick-2/brick'.
[2014-06-17 12:11:53.753301] I [client-handshake.c:1468:client_setvolume_cbk] 0-shares-client-5: Server and Client lk-version numbers are not same, reopening the fds [2014-06-17 12:11:53.753306] D [client-handshake.c:1318:client_post_handshake] 0-shares-client-5: No fds to open - notifying all parents child up [2014-06-17 12:11:53.753313] D [client-handshake.c:486:client_set_lk_version] 0-shares-client-5: Sending SET_LK_VERSION [2014-06-17 12:11:53.753320] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner: [2014-06-17 12:11:53.753327] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 132, payload: 68, rpc hdr: 64 [2014-06-17 12:11:53.753344] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) to rpc-transport (shares-client-5) [2014-06-17 12:11:53.753353] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner: [2014-06-17 12:11:53.753360] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 64, payload: 0, rpc hdr: 64 [2014-06-17 12:11:53.753373] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) to rpc-transport (shares-client-5) [2014-06-17 12:11:53.753381] I [afr-common.c:3698:afr_notify] 0-shares-replicate-2: Subvolume 'shares-client-5' came back up; going online. 
[2014-06-17 12:11:53.753393] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner: [2014-06-17 12:11:53.753399] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 84, payload: 20, rpc hdr: 64 [2014-06-17 12:11:53.753413] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (shares-client-5) [2014-06-17 12:11:53.753430] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) from rpc-transport (shares-client-5) [2014-06-17 12:11:53.753441] I [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-5: Server lk version = 1 [2014-06-17 12:11:53.753451] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) from rpc-transport (shares-client-5) [2014-06-17 12:11:53.753474] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) from rpc-transport (shares-client-5) [2014-06-17 12:11:53.753483] D [dht-diskusage.c:80:dht_du_info_cbk] 0-shares-dht: on subvolume 'shares-replicate-2': avail_percent is: 95.00 and avail_space is: 1050826719232 and avail_inodes is: 99.00 And here is a log snip from the non-working user: [2014-06-17 12:07:17.866693] W [socket.c:514:__socket_rwv] 0-shares-client-13: readv failed (No data available) [2014-06-17 12:07:17.866699] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-13: reading from socket failed. 
Error (No data available), peer (172.16.10.13:49155) [2014-06-17 12:07:17.866707] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now [2014-06-17 12:07:17.866716] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-13: cleaning up state in transport object 0x7f22300aaa60 [2014-06-17 12:07:17.866722] I
Re: [Gluster-users] Gluster quota issue
Are these sparse files? Check to see what the file allocation is vs what the actual size is: ~# ls -lsah -Dave On Mon, Apr 7, 2014 at 8:53 AM, Barry Stetler ba...@hivelocity.net wrote: I am having an issue with Gluster quotas. User is set to to 200GB and he is using about 57 GB on mounted file system, Gluster says he is using 180 GB, gluster volume quota home list /user shows he is using 180 GB Is this a bug is this looking at the replicas? Here is the volume info Volume Name: home Type: Distributed-Replicate Volume ID: 9e0ffc91-9d46-477a-b8eb-dfd3b7d65765 Status: Started Number of Bricks: 2 x 2 = 4 Transport-type: tcp Bricks: Brick1: gluster1:/export/cluster1 Brick2: gluster2:/export/cluster1 Brick3: gluster3:/export/cluster1 Brick4: gluster4:/export/cluster1 Options Reconfigured: -- Barry Stetler HIVELOCITY | Devops and Operations Leader 888-869-4678 ext. 224 | Hivelocity.net http://hivelocity.net
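The ls -lsah check works because its first column is the allocated blocks while the usual size column is the apparent size; a sparse file shows far less allocated than apparent, and the two accounting methods then disagree the way the quota numbers do. The same comparison can be scripted (st_blocks is in 512-byte units on Linux, independent of the filesystem's block size):

```python
import os

def sizes(path):
    """Return (apparent_size, allocated_bytes) for a file. A sparse
    file has allocated well below apparent size; accounting that
    uses apparent size will then look inflated relative to `du`."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Example: walk a user's directory and flag heavily sparse files.
# for f in files: apparent, alloc = sizes(f); sparse = alloc < apparent
```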
Re: [Gluster-users] Gluster, Samba, and VFS
We use samba VFS quite heavily in our infrastructure. Integrated with AD via winbind, load balanced SMB front-ends based on HA with auto failover. This system has been in production for about 4 months. So far it's worked very well. Dave On Tue, Feb 11, 2014 at 11:35 AM, Matt Miller m...@mattandtiff.net wrote: Yesterday was my first day on the list, so I had not yet seen that thread. Appears to be working though. Will have to setup some load tests. On Tue, Feb 11, 2014 at 12:42 AM, Daniel Müller muel...@tropenklinik.dewrote: No, not really: Look at my thread: samba vfs objects glusterfs is it now working? I am just waiting for an answer to fix this. The only way I succeeded to make it work is how you descriped (exporting fuse mount thru samba) EDV Daniel Müller Leitung EDV Tropenklinik Paul-Lechler-Krankenhaus Paul-Lechler-Str. 24 72076 Tübingen Tel.: 07071/206-463, Fax: 07071/206-499 eMail: muel...@tropenklinik.de Internet: www.tropenklinik.de Der Mensch ist die Medizin des Menschen Von: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] Im Auftrag von Matt Miller Gesendet: Montag, 10. Februar 2014 16:43 An: gluster-users@gluster.org Betreff: [Gluster-users] Gluster, Samba, and VFS Stumbled upon https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/commits/master when trying to find info on how to make Gluster and Samba play nice as a general purpose file server. I have had severe performance problems in the past with mounting the Gluster volume as a Fuse mount, then exporting the Fuse mount thru Samba. As I found out after setting up the cluster this is somewhat expected when serving out lots of small files. Was hoping VFS would provide better performance when serving out lots and lots of small files. Is anyone using VFS extensions in production? Is it ready for prime time? 
I could not find a single reference to it on Gluster's main website (maybe I am looking in the wrong place), so not sure of the stability or supported-ness of this.
[Gluster-users] Multiple Volumes (bricks), One Disk
Hi All, I am interested in some feedback on putting multiple bricks on one physical disk, with each brick assigned to a different volume. Here is the scenario: 4 disks per server, 4 servers, 2x2 distribute/replicate. I would prefer to have just one volume, but I need to geo-replicate some of the data (not all of it). My thought was to use two volumes, which would allow me to selectively geo-replicate just the data that I need to by replicating only one volume. A couple of questions come to mind: 1) Any implications of doing two bricks for different volumes on one physical disk? 2) Will the free space across each volume still calculate correctly? I.e., if one volume takes up 2/3 of the total physical disk space, will the second volume still reflect the correct amount of used space? 3) Am I being stupid/missing something obvious? Cheers, Dave
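On question 2: both bricks draw on the same underlying filesystem, so free-space reporting for each volume reflects whatever the other volume has consumed. A small illustration (the paths are hypothetical stand-ins for brick directories):

```shell
# Two would-be brick directories on the same physical disk/filesystem:
# df reports the same device, total size, and free space for both, so
# each volume's headroom shrinks as the other volume grows.
mkdir -p /tmp/brick-vol1 /tmp/brick-vol2
df -P /tmp/brick-vol1
df -P /tmp/brick-vol2
```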
Re: [Gluster-users] Samba vfs_glusterfs Quota Support?
Ira, Thank you for the response. I suspect that your patch will resolve this issue as well -- however, after an upgrade to Samba 3.6.20 I still see the total volume size instead of the expected GlusterFS folder-quota size. I note that your patch was accepted into 3.6.next, but I don't see whether it actually made it into the 3.6.20 release. I'm probably looking in the wrong place. Any pointers? Cheers, Dave On Wed, Oct 30, 2013 at 11:53 AM, Ira Cooper i...@redhat.com wrote: I suspect you are missing the patch needed to make this work. http://git.samba.org/?p=samba.git;a=commit;h=872a7d61ca769c47890244a1005c1bd445a3bab6 It was put in in the 3.6.13 timeframe, if I'm reading the git history correctly. The bug manifests when the base of the share has a different quota allowance than elsewhere in the tree: \\foo\ - 5GB quota \\foo\bar - 2.5GB quota When you run dir in \\foo you get the results from the 5GB quota, and the same in \\foo\bar, which is incorrect and highly confusing to users. https://bugzilla.samba.org/show_bug.cgi?id=9646 Despite my discussion of multi-volume setups there, it should be the same bug. Thanks, -Ira / i...@samba.org - Original Message - From: David Gibbons david.c.gibb...@gmail.com To: gluster-users@gluster.org Sent: Wednesday, October 30, 2013 11:04:49 AM Subject: Re: [Gluster-users] Samba vfs_glusterfs Quota Support? Thanks all for the pointers. What version of Samba are you running? Samba version is 3.6.9: [root@gfs-a-1 /]# smbd -V Version 3.6.9 Gluster version is 3.4.1 git: [root@gfs-a-1 /]# glusterfs --version glusterfs 3.4.1 built on Oct 21 2013 09:22:36 It should be # gluster volume set gfsv0 features.quota-deem-statfs on [root@gfs-a-1 /]# gluster volume set gfsv0 features.quota-deem-statfs on volume set: failed: option : features.quota-deem-statfs does not exist Did you mean features.quota-timeout? I wonder if quota-deem-statfs is part of a more recent version?
Cheers, Dave
Re: [Gluster-users] Samba vfs_glusterfs Quota Support?
Hi Lala, Thank you. I should have been clearer, and you are correct: I can't write data above the quota. I was referring only to the disk size listed on the Windows/Samba side. Thanks for the tip on quota-deem-statfs. Here are my results with that command: # gluster volume set gfsv0 quota-deem-statfs on volume set: failed: option : quota-deem-statfs does not exist Did you mean dump-fd-stats or quota-timeout? Which Gluster version does that feature setting apply to? Cheers, Dave On Wed, Oct 30, 2013 at 3:09 AM, Lalatendu Mohanty lmoha...@redhat.com wrote: On 10/23/2013 05:26 PM, David Gibbons wrote: Hi All, I'm setting up a gluster cluster that will be accessed via smb. I was hoping the quotas would be visible to the SMB clients. I've configured a quota on the path itself: # gluster volume quota gfsv0 list path limit_set size -- /shares/testsharedave 10GB 8.0KB And I've configured the share in samba (and can access it fine): # cat /etc/samba/smb.conf [testsharedave] vfs objects = glusterfs glusterfs:volfile_server = localhost glusterfs:volume = gfsv0 path = /shares/testsharedave valid users = dave guest ok = no writeable = yes But Windows does not reflect the quota and instead shows the full size of the gluster volume. I've reviewed the code in https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/blobs/master/src/vfs_glusterfs.c -- which does not appear to support passing gluster quotas to samba. So I don't think my installation is broken; it seems like maybe this just isn't supported. Can anyone speak to whether or not quotas are going to be implemented in vfs_glusterfs for samba? Or am I just crazy and doing this wrong ;)? I'm definitely willing to help with the code but don't have much experience with either samba modules or the gluster API. Hi David, Quotas are supported by vfs_glusterfs for samba. I have also set the quota on the volume correctly. If you try to write more data than the quota on the directory (/shares/testsharedave), it will not allow it. But for the clients (i.e.
Windows/smb, nfs, fuse) to reflect it in the metadata (i.e. properties in Windows), you have to run the volume set command below on the respective volume: gluster volume set <VOLNAME> quota-deem-statfs on -Lala Cheers, Dave
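Pulling the thread's resolution together: on a Gluster build that actually ships the option (the 3.4.1 build quoted above clearly does not), the command pattern under discussion looks like the sketch below. `<VOLNAME>` is a placeholder, and the exact release the option first shipped in is left unverified here.

```shell
# Make quota limits show up in statfs results, so SMB/NFS/FUSE clients see
# the directory quota rather than the whole volume size. On builds that
# predate the feature this fails with "option does not exist", as shown
# in the replies above.
gluster volume set <VOLNAME> features.quota-deem-statfs on

# Confirm the option took effect (it appears under Options Reconfigured):
gluster volume info <VOLNAME> | grep quota-deem-statfs
```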
Re: [Gluster-users] Samba vfs_glusterfs Quota Support?
Thanks all for the pointers. What version of Samba are you running? Samba version is 3.6.9: [root@gfs-a-1 /]# smbd -V Version 3.6.9 Gluster version is 3.4.1 git: [root@gfs-a-1 /]# glusterfs --version glusterfs 3.4.1 built on Oct 21 2013 09:22:36 It should be # gluster volume set gfsv0 features.quota-deem-statfs on [root@gfs-a-1 /]# gluster volume set gfsv0 features.quota-deem-statfs on volume set: failed: option : features.quota-deem-statfs does not exist Did you mean features.quota-timeout? I wonder if quota-deem-statfs is part of a more recent version? Cheers, Dave
Re: [Gluster-users] Geo-replication: queue delete commands and process after a specified time
Steve, I think the best bet would be geo-replication with LVM snapshots: 1) Geo-replicate to another gluster install on separate hardware 2) Snap the LVM volume that your gluster bricks sit on If you snap once a day and retain for 7 days, that should achieve your backup need. Cheers, Dave On Thu, Oct 24, 2013 at 11:40 AM, Steve Dainard sdain...@miovision.com wrote: Hello list, I'm toying with the idea of using Gluster as a user-facing network share and geo-replicating the data for backup purposes. At a bare minimum I'd like geo-replication to not sync file deletions immediately to the slave, but instead queue those deletions for a configurable period of time (say 7 days). As an added bonus, moving a file would actually leave a copy behind with a date stamp suffix on the slave. I could then have a cron job clean up old file copies. Lastly, I would expose the geo-replicated volume to users as read-only so they could retrieve old files if necessary, perhaps in a web UI. At the end of the day I suppose I'm looking for a VSS-style solution. From some research it doesn't look like either of these solutions exists in Gluster right now; are there any plans for this type of use-case? Obviously this would cause some serious havoc if the volume was used as a VM store, so it would need to be properly cautioned. Otherwise, does anyone know of an open-source solution that could do this? Steve Dainard IT Infrastructure Manager Miovision http://miovision.com/ | Rethink Traffic 519-513-2407 ex.250 877-646-8476 (toll-free) -- Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON, Canada | N2C 1L3 This e-mail may contain information that is privileged or confidential.
If you are not the intended recipient, please delete the e-mail and any attachments and notify us immediately.
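The snap-and-retain side of Dave's suggestion could be driven by a daily cron job along these lines. This is only a sketch under stated assumptions: the VG/LV names (gluster_vg/brick_lv), the 10G copy-on-write reserve, and the 7-day retention are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical daily cron job: snapshot the LV backing a gluster brick,
# keeping only the newest seven snapshots.
lvcreate --snapshot --size 10G \
    --name "brick_lv-snap-$(date +%Y%m%d)" /dev/gluster_vg/brick_lv

# Date-suffixed names sort chronologically; drop everything but the last 7.
lvs --noheadings -o lv_name gluster_vg \
    | tr -d ' ' | grep '^brick_lv-snap-' | sort | head -n -7 \
    | while read -r snap; do lvremove -f "gluster_vg/$snap"; done
```

Note that an LVM snapshot with a fixed-size reserve fills up as the origin changes, so the reserve has to be sized for a day's worth of writes.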
[Gluster-users] Samba vfs_glusterfs Quota Support?
Hi All, I'm setting up a gluster cluster that will be accessed via smb. I was hoping the quotas would be visible to the SMB clients. I've configured a quota on the path itself: # gluster volume quota gfsv0 list path limit_set size -- /shares/testsharedave 10GB 8.0KB And I've configured the share in samba (and can access it fine): # cat /etc/samba/smb.conf [testsharedave] vfs objects = glusterfs glusterfs:volfile_server = localhost glusterfs:volume = gfsv0 path = /shares/testsharedave valid users = dave guest ok = no writeable = yes But Windows does not reflect the quota and instead shows the full size of the gluster volume. I've reviewed the code in https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/blobs/master/src/vfs_glusterfs.c -- which does not appear to support passing gluster quotas to samba. So I don't think my installation is broken; it seems like maybe this just isn't supported. Can anyone speak to whether or not quotas are going to be implemented in vfs_glusterfs for samba? Or am I just crazy and doing this wrong ;)? I'm definitely willing to help with the code but don't have much experience with either samba modules or the gluster API. Cheers, Dave
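For readability, here is the share definition quoted above laid out as it would sit in smb.conf (values unchanged from the mail):

```
[testsharedave]
    vfs objects = glusterfs
    glusterfs:volfile_server = localhost
    glusterfs:volume = gfsv0
    path = /shares/testsharedave
    valid users = dave
    guest ok = no
    writeable = yes
```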
Re: [Gluster-users] Cluster Slowness with GlusterFS
Hi There, How many files are in each directory? We've seen hangs of up to a few seconds when doing an ls against a folder with 2k files in it. Lowering the number of files per folder resolved the issue for us. From what I understand, this is a known behavior with Gluster. Dave On Mon, Oct 7, 2013 at 3:23 AM, pramod@wipro.com wrote: Hi Team, We have recently implemented glusterfs with 225 TB of usable space. Glusterfs is not stable; we are having a lot of issues. Once you traverse deeper into directories, the gluster partition hangs. Kindly help. - Regards, PRamod The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com
Re: [Gluster-users] Advice for building samba-glusterfs-vfs
Dan, I had the same trouble yesterday. As it happens, I created a doc to help script installs of future nodes. I did not snip out the portions beyond what you're looking for, but the script below works for me. The biggest issue was that some modules were apparently installed in lib while the build process was looking for them in lib64. In any event, the vfs module builds, installs and runs cleanly after this. The big win for me was finding this command, which let me figure out where it was looking for modules in the wrong lib directory: ldd /usr/local/samba/lib/vfs/glusterfs.so I'm sure there is an easier way to do this :)... Cheers, Dave --
#!/bin/bash
yum groupinstall "Development Tools"
yum install git openssl-devel wget
yum install libtalloc libtdb
# set up gluster
cd /usr/src
git clone https://github.com/gluster/glusterfs.git
cd /usr/src/glusterfs
./autogen.sh
./configure
make
make install
# set up samba 3.6.9
cd /usr/src
wget http://ftp.samba.org/pub/samba/stable/samba-3.6.9.tar.gz
tar -zxvf samba-3.6.9.tar.gz
cd /usr/src/samba-3.6.9/source3
./configure
make
make install
ln -s /usr/local/samba/lib/libwbclient.so.0 /usr/lib64/libwbclient.so.0
# then install the RPM samba version
yum install samba
# set up vfs_glusterfs
cd /usr/src
git clone git://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs.git
ln -s /usr/local/include/glusterfs /usr/include/glusterfs
cd /usr/src/samba-glusterfs-vfs
./configure --with-samba-source=/usr/src/samba-3.6.9/source3
make
make install
ln -s /usr/local/samba/lib/vfs/glusterfs.so /usr/lib64/samba/vfs/glusterfs.so
# link the other modules
ln -s /usr/local/lib/libgfapi.so /usr/lib64/
ln -s /usr/local/lib/libgfapi.so.0 /usr/lib64/
ln -s /usr/local/lib/libgfapi.la /usr/lib64/
ln -s /usr/local/lib/libglusterfs.la /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so.0 /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so.0.0.0 /usr/lib64/
On Tue, Oct 1, 2013 at 10:38 PM, Dan Mons
dm...@cuttingedge.com.au wrote: Hi folks, I've got CentOS6.4 with Samba 3.6.9 installed from the standard CentOS repos via yum. I also have GlusterFS 3.4.0 GA installed from RPMs direct from gluster.org. I'm trying to build the glusterfs VFS module for Samba to take advantage of libgfapi for our Windows users, and migrate them off the current Samba-on-FUSE setup we have currently. I've downloaded the appropriate source trees for all projects (GlusterFS from gluster.org, Samba from the matching CentOS6 SRPM, and samba-glusterfs-vfs from the git repo), but am facing troubles early on just finding appropriate headers. [root@bne-gback000 samba-glusterfs-vfs]# find /usr/local/src/glusterfs-3.4.0 -type f -name glfs.h /usr/local/src/glusterfs-3.4.0/api/src/glfs.h [root@bne-gback000 samba-glusterfs-vfs]# ./configure --with-glusterfs=/usr/local/src/glusterfs-3.4.0 *snip* checking api/glfs.h usability... no checking api/glfs.h presence... no checking for api/glfs.h... no Cannot find api/glfs.h. Please specify --with-glusterfs=dir if necessary If I install glusterfs-api-devel-3.4.0-8.el6.x86_64.rpm, I need to copy /usr/include/glusterfs/api/glfs.h to /usr/include for it to be found (even using --with-glusterfs= doesn't work), and then I get further errors about not being able to link to glfs_init: [root@bne-gback000 samba-glusterfs-vfs]# rpm -ivh /tmp/glusterfs-api-devel-3.4.0-8.el6.x86_64.rpm [root@bne-gback000 samba-glusterfs-vfs]# cp /usr/include/glusterfs/api/glfs.h /usr/include/ [root@bne-gback000 samba-glusterfs-vfs]# ./configure *snip* checking api/glfs.h usability... yes checking api/glfs.h presence... yes checking for api/glfs.h... yes checking for glfs_init... no Cannot link to gfapi (glfs_init). Please specify --with-glusterfs=dir if necessary If anyone can point me in the right direction, that would be greatly appreciated. 
Cheers, -Dan
Re: [Gluster-users] Replacing a failed brick
Joe, Now I understand what is going on here. It makes a lot more sense that it's a bug in the sanity-checking code. Thanks so much! Dave On Fri, Aug 16, 2013 at 11:19 AM, Joe Julian j...@julianfamily.org wrote: This tells you that this brick isn't running. That's probably because it was formatted and lost its volume-id extended attribute. See http://www.joejulian.name/blog/replacing-a-brick-on-glusterfs-340/ Once that's fixed, on 10.250.4.65: gluster volume start test-a force On 08/16/2013 08:03 AM, David Gibbons wrote: Brick 10.250.4.65:/localmnt/g2lv5 N/A N N/A
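The fix Joe references boils down to writing the volume's UUID back onto the brick directory as the `trusted.glusterfs.volume-id` xattr. A hedged sketch of that approach, reconstructed rather than quoted from the linked post, using the volume and brick names from this thread (run as root on the affected node):

```shell
# Grab the volume UUID, strip the dashes, and set it as a hex-encoded xattr
# on the replaced brick so glusterd will agree to start it again.
VOLID=$(gluster volume info test-a | awk '/^Volume ID:/ {print $3}' | tr -d '-')
setfattr -n trusted.glusterfs.volume-id -v "0x${VOLID}" /localmnt/g2lv5
gluster volume start test-a force
```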
Re: [Gluster-users] Replacing a failed brick
Ok, it appears that the following worked. Thanks for the nudge in the right direction: volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/g2lv6 commit force then volume heal test-a full and monitor the progress with volume heal test-a info However, that does not solve my problem of what to do when a brick is corrupted somehow, if I don't have enough space to first heal it and then replace it. That did get me thinking, though: what if I replace the brick, forgo the heal, replace it again, and then do a heal? That seems to work. So if I lose one brick, here is the process that I used to recover it: 1) create a directory that exists just to temporarily trick gluster and allow us to maintain the correct replica count: mkdir /localmnt/garbage 2) replace the dead brick with our garbage directory: volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/garbage commit force 3) fix our dead brick using whatever process is required. In this case, for testing, we had to remove some gluster bits or it throws the "already part of a volume" error: setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5 setfattr -x trusted.gfid /localmnt/g2lv5 4) now that our dead brick is fixed, swap it in for the garbage/temporary brick: volume replace-brick test-a 10.250.4.65:/localmnt/garbage 10.250.4.65:/localmnt/g2lv5 commit force 5) now all we have to do is let gluster heal the volume: volume heal test-a full Is there anything wrong with this procedure? Cheers, Dave On Fri, Aug 16, 2013 at 11:03 AM, David Gibbons david.c.gibb...@gmail.com wrote: Ravi, Thanks for the tips.
When I run a volume status:
gluster volume status test-a
Status of volume: test-a
Gluster process                         Port    Online  Pid
------------------------------------------------------------
Brick 10.250.4.63:/localmnt/g1lv2       49152   Y       8072
Brick 10.250.4.65:/localmnt/g2lv2       49152   Y       3403
Brick 10.250.4.63:/localmnt/g1lv3       49153   Y       8081
Brick 10.250.4.65:/localmnt/g2lv3       49153   Y       3410
Brick 10.250.4.63:/localmnt/g1lv4       49154   Y       8090
Brick 10.250.4.65:/localmnt/g2lv4       49154   Y       3417
Brick 10.250.4.63:/localmnt/g1lv5       49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5       N/A     N       N/A
Brick 10.250.4.63:/localmnt/g1lv1       49156   Y       8576
Brick 10.250.4.65:/localmnt/g2lv1       49156   Y       3431
NFS Server on localhost                 2049    Y       3440
Self-heal Daemon on localhost           N/A     Y       3445
NFS Server on 10.250.4.63               2049    Y       8586
Self-heal Daemon on 10.250.4.63         N/A     Y       8593
There are no active volume tasks
--
Attempting to start the volume results in:
gluster volume start test-a force
volume start: test-a: failed: Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason: No data available
--
It doesn't like when I try to fire off a heal either:
gluster volume heal test-a
Launching Heal operation on volume test-a has been unsuccessful
--
Although that did lead me to this:
gluster volume heal test-a info
Gathering Heal info on volume test-a has been successful
Brick 10.250.4.63:/localmnt/g1lv2  Number of entries: 0
Brick 10.250.4.65:/localmnt/g2lv2  Number of entries: 0
Brick 10.250.4.63:/localmnt/g1lv3  Number of entries: 0
Brick 10.250.4.65:/localmnt/g2lv3  Number of entries: 0
Brick 10.250.4.63:/localmnt/g1lv4  Number of entries: 0
Brick 10.250.4.65:/localmnt/g2lv4  Number of entries: 0
Brick 10.250.4.63:/localmnt/g1lv5  Number of entries: 0
Brick 10.250.4.65:/localmnt/g2lv5  Status: Brick is Not connected  Number of entries: 0
Brick 10.250.4.63:/localmnt/g1lv1  Number of entries: 0
Brick 10.250.4.65:/localmnt/g2lv1  Number of entries: 0
--
So perhaps I need to re-connect the brick?
Cheers, Dave On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N ravishan...@redhat.com wrote: On 08/15/2013 10:05 PM, David Gibbons wrote: Hi There, I'm currently testing Gluster for possible production use. I haven't been able to find the answer to this question in the forum archives or in the public docs. It's possible that I don't know which keywords to search for. Here's the question (more details below): let's say that one of my bricks fails -- *not* a whole node failure but a single brick failure within the node. How do I replace a single brick on a node and force a sync from one of the replicas? I have two nodes with 5 bricks each: gluster volume info test-a Volume Name: test-a Type: Distributed-Replicate Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4 Status: Started Number of Bricks: 5 x 2 = 10 Transport-type: tcp Bricks: Brick1
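Dave's five steps above, collected into one sequence for reference (hostnames, volume, and brick paths are the ones from this thread; the `gluster` prefix is added since the originals were typed inside the gluster shell):

```shell
# 1) scratch directory to stand in for the dead brick, keeping replica count
mkdir /localmnt/garbage
# 2) swap the dead brick out for the scratch directory
gluster volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 \
    10.250.4.65:/localmnt/garbage commit force
# 3) repair the dead brick, stripping the xattrs that trigger the
#    "already part of a volume" error
setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5
setfattr -x trusted.gfid /localmnt/g2lv5
# 4) swap the repaired brick back in
gluster volume replace-brick test-a 10.250.4.65:/localmnt/garbage \
    10.250.4.65:/localmnt/g2lv5 commit force
# 5) trigger a full self-heal and watch it run
gluster volume heal test-a full
gluster volume heal test-a info
```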
[Gluster-users] Replacing a failed brick
Hi There, I'm currently testing Gluster for possible production use. I haven't been able to find the answer to this question in the forum arch or in the public docs. It's possible that I don't know which keywords to search for. Here's the question (more details below): let's say that one of my bricks fails -- *not* a whole node failure but a single brick failure within the node. How do I replace a single brick on a node and force a sync from one of the replicas? I have two nodes with 5 bricks each: gluster volume info test-a Volume Name: test-a Type: Distributed-Replicate Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4 Status: Started Number of Bricks: 5 x 2 = 10 Transport-type: tcp Bricks: Brick1: 10.250.4.63:/localmnt/g1lv2 Brick2: 10.250.4.65:/localmnt/g2lv2 Brick3: 10.250.4.63:/localmnt/g1lv3 Brick4: 10.250.4.65:/localmnt/g2lv3 Brick5: 10.250.4.63:/localmnt/g1lv4 Brick6: 10.250.4.65:/localmnt/g2lv4 Brick7: 10.250.4.63:/localmnt/g1lv5 Brick8: 10.250.4.65:/localmnt/g2lv5 Brick9: 10.250.4.63:/localmnt/g1lv1 Brick10: 10.250.4.65:/localmnt/g2lv1 I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a failure). What is the next step? I have tried various combinations of removing and re-adding the brick, replacing the brick, etc. I read in a previous message to this list that replace-brick was for planned changes, which makes sense, so that's probably not my next step. Cheers, Dave