Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending

2015-05-06 Thread David Gibbons
Is it reasonable for me to just remove all of the XSYNC-CHANGELOG
files to make it start over with a full sync?

I just want to figure out how to get it to pick up again. Is it better
to remove and re-create the geo-rep session?
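
(For what it's worth, this is roughly the sequence I would expect to use to
tear the session down and re-create it. This is just a sketch, using the
volume and slave names from this thread; the delete step discards the session
state, so double-check the syntax on your 3.5.x build first:)

    # stop and remove the existing session
    gluster volume geo-replication shares gfs-a-bkp::bkpshares stop
    gluster volume geo-replication shares gfs-a-bkp::bkpshares delete

    # re-create it (push-pem redistributes the SSH keys) and start again
    gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem
    gluster volume geo-replication shares gfs-a-bkp::bkpshares start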

Thanks,
Dave

On Tue, May 5, 2015 at 9:27 AM, David Gibbons david.c.gibb...@gmail.com wrote:
 I caught one of the nodes transitioning into faulty mode, log output is
 below.


  On the master nodes, look at the log messages and let us know if you notice
 any issues in them. (/var/log/glusterfs/geo-replication/)

 When one of the nodes drops into faulty, which happens periodically, this
 is the type of output that appears in the log:

 [root@gfs-a-1 ~]# tail
 /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
 [2015-05-05 09:22:58.140913] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
 [2015-05-05 09:22:58.152951] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
 [2015-05-05 09:22:58.327603] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
 [2015-05-05 09:22:58.336714] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
 [2015-05-05 09:22:58.360308] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
 [2015-05-05 09:22:58.367522] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
 [2015-05-05 09:22:58.368226] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
 [2015-05-05 09:22:58.368959] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
 [2015-05-05 09:22:58.369635] W
 [master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
 .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
 [2015-05-05 09:22:58.369790] W
 [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete
 sync, retrying changelogs: XSYNC-CHANGELOG.1430830891

 When the node is in active mode, I get a lot of log output that resembles
 this:
 [2015-05-05 09:23:54.735502] W
 [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
 sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
 [2015-05-05 09:23:55.449265] W
 [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync:
 .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
 [2015-05-05 09:23:55.449491] W
 [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
 sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
 [2015-05-05 09:23:56.277033] W
 [master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync:
 .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
 [2015-05-05 09:23:56.277259] W
 [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs
 XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
 [2015-05-05 09:23:56.294038] W
 [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
 [2015-05-05 09:23:56.381592] I
 [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid
 crawl syncing
 [2015-05-05 09:24:24.404884] I
 [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1
 turns
 [2015-05-05 09:24:24.437452] I
 [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid
 crawl...
 [2015-05-05 09:24:24.588865] I
 [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing
 xsync changelog
 /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135

 This raises a couple of questions for me:

 Are these errcode 23 entries for files that have been deleted or renamed since
 the changelog was created?
 Is it correct/expected for the node to drop into faulty and then recover
 itself to active periodically?

 Thank you again for your assistance!
 Dave


Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending

2015-05-05 Thread David Gibbons
Thank you, responses and further questions inline below.


 On the master nodes, look at the log messages and let us know if you notice
 any issues in them. (/var/log/glusterfs/geo-replication/)


The workers have been transitioning between active and faulty. They will
throw an error in the log (I believe it's related to rsync error 23 or
something, but I will have to isolate it again), then switch to faulty. A
minute or so later they are back to Active.

Ideally, after the initial crawl, geo-rep should switch to Changelog crawl.


Thanks for clarifying, I will wait and look for that. It appears that xsync
is the default, but I did change it to changelog yesterday. Which is the more
reliable option?
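
(For reference, this is roughly how I made that change. It is a sketch only;
double-check the option name against the session's config listing on your
version:)

    # list the current per-session configuration
    gluster volume geo-replication shares gfs-a-bkp::bkpshares config

    # switch the crawl mechanism from xsync to changelog
    gluster volume geo-replication shares gfs-a-bkp::bkpshares config change_detector changelog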


 --
  Geo-rep doesn't have a persistent store of all path names and sync status.
  When geo-rep gets the list of files to be synced, it adds that number to the
  counter. But if the same files are modified again, the counter is incremented
  again, so the numbers in the Status output will not match the number of
  files on disk.


When does it get reset back to 0? Or where are the 8191 files that it
thinks are out of sync stored? I would like to be able to sanity check the
progress.
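
(For the sanity check I have been doing something like the following
out-of-band comparison from client mounts of each side. It is a sketch only;
the mount points are made up for the example:)

    # mount the master and slave volumes somewhere convenient
    mount -t glusterfs gfs-a-1:/shares      /tmp/master
    mount -t glusterfs gfs-a-bkp:/bkpshares /tmp/slave

    # compare file counts, and list anything missing or stale without copying
    find /tmp/master -type f | wc -l
    find /tmp/slave  -type f | wc -l
    rsync -ri --dry-run /tmp/master/ /tmp/slave/ | head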

Thanks,
Dave

Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending

2015-05-05 Thread David Gibbons
I caught one of the nodes transitioning into faulty mode, log output is
below.


  On the master nodes, look at the log messages and let us know if you notice
 any issues in them. (/var/log/glusterfs/geo-replication/)

When one of the nodes drops into faulty, which happens periodically, this
is the type of output that appears in the log:

[root@gfs-a-1 ~]# tail
 
/usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
[2015-05-05 09:22:58.140913] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
[2015-05-05 09:22:58.152951] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
[2015-05-05 09:22:58.327603] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
[2015-05-05 09:22:58.336714] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
[2015-05-05 09:22:58.360308] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
[2015-05-05 09:22:58.367522] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
[2015-05-05 09:22:58.368226] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
[2015-05-05 09:22:58.368959] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
[2015-05-05 09:22:58.369635] W
[master(/mnt/a-1-shares-brick-2/brick):250:regjob] top: Rsync:
.gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
[2015-05-05 09:22:58.369790] W
[master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete
sync, retrying changelogs: XSYNC-CHANGELOG.1430830891

When the node is in active mode, I get a lot of log output that resembles
this:
[2015-05-05 09:23:54.735502] W
[master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:55.449265] W
[master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync:
.gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:55.449491] W
[master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete
sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:56.277033] W
[master(/mnt/a-1-shares-brick-3/brick):250:regjob] top: Rsync:
.gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:56.277259] W
[master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs
XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
[2015-05-05 09:23:56.294038] W
[master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
[2015-05-05 09:23:56.381592] I
[master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished
hybrid crawl syncing
[2015-05-05 09:24:24.404884] I
[master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1
turns
[2015-05-05 09:24:24.437452] I
[master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting
hybrid crawl...
[2015-05-05 09:24:24.588865] I
[master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing
xsync changelog
/usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135

This raises a couple of questions for me:

   1. Are these errcode 23 entries for files that have been deleted or renamed
   since the changelog was created?
   2. Is it correct/expected for the node to drop into faulty and then
   recover itself to active periodically?

Thank you again for your assistance!
Dave

Re: [Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending

2015-05-04 Thread David Gibbons
So I should do a comparison out-of-band from Gluster and see what is
actually in sync vs. out of sync? Is there an easy way to just start it
over? I am assuming removing and re-adding geo-rep is the easiest
way. Is that correct?

Thanks,
Dave

On Mon, May 4, 2015 at 10:09 PM, Aravinda avish...@redhat.com wrote:
 The status output has an issue showing the exact number of files in sync.
 Please check the numbers on disk and let us know if a difference exists
 between the Master and Secondary volumes.

 --
 regards
 Aravinda

 On 05/05/2015 06:58 AM, David Gibbons wrote:

 I am having an issue with geo-replication. There were a number of
 complications when I upgraded to 3.5.3, but geo-replication was (I
 think) working at some point. The volume is accessed via samba using
 vfs_glusterfs.

 The main issue is that geo-replication has not been sending updated
 copies of old files to the replicated server. So in the scenario where
 a file is created, time passes, the file is modified, and the file is saved,
 the new version is not replicated.

 Is it possible that one brick is having a geo-rep issue and the others
 are not? Consider this output:

 MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2309456        0                0                0                  0
 gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2315557        0                0                0                  0
 gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2362884        0                0                0                  0
 gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2407600        0                0                0                  0
 gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2409430        0                0                0                  0
 gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2308969        0                0                0                  0
 gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2079576        8191             0                0                  0
 gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2340597        0                0                0                  0
 gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
 gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0

[Gluster-users] Geo-Rep. 3.5.3, Missing Files, Incorrect Files Pending

2015-05-04 Thread David Gibbons
I am having an issue with geo-replication. There were a number of
complications when I upgraded to 3.5.3, but geo-replication was (I
think) working at some point. The volume is accessed via samba using
vfs_glusterfs.

The main issue is that geo-replication has not been sending updated
copies of old files to the replicated server. So in the scenario where
a file is created, time passes, the file is modified, and the file is saved,
the new version is not replicated.

Is it possible that one brick is having a geo-rep issue and the others
are not? Consider this output:

MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS     CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2309456        0                0                0                  0
gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2315557        0                0                0                  0
gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2362884        0                0                0                  0
gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2407600        0                0                0                  0
gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2409430        0                0                0                  0
gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2308969        0                0                0                  0
gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2079576        8191             0                0                  0
gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl    2340597        0                0                0                  0
gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0
gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A             0              0                0                0                  0

This seems to show that there are 8191 files_pending on just one
brick, and the others are up to date. I am suspicious of the 8191
number because it looks like we're at a bucket-size boundary on the
backend. I've tried stopping and re-starting the rep session. I've
also tried changing the change_detector from xsync to changelog.
Neither seems to have had an effect.

It seems like geo-replication is quite wonky in 3.5.x. Is there light
at the end of the tunnel, or should I find another solution to
replicate?

Cheers,
Dave

Re: [Gluster-users] Trying to use gluster using Virtual IP.

2015-01-12 Thread David Gibbons
I use VIPs and keepalived in my production configuration as well. You don't
want to peer probe with the VIP; you want to peer probe with the actual IP.
The VIP is merely a forward-facing mechanism for clients to connect to,
and that's why it fails between your gluster peers. The peers themselves
already know how to handle failover in a more graceful way than a VIP :).

Remove the peers, then re-probe with the actual IP instead of the VIP. The
VIP is just for clients.
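
A rough sketch of what I mean, using the names from your mail (VIP1/VIP2 stay
in keepalived and in the client fstab only, while the peer and volume commands
use the real addresses; you may need to clean up the stale peer entries first):

    # on controller1: drop the VIP-based peer and probe the real address
    gluster peer detach VIP2
    gluster peer probe IP2

    # create the volume against the real addresses
    gluster volume create testvolume transport tcp IP1:/data/brick1/sda IP2:/data/brick2/sdb

    # clients keep mounting through the VIPs, e.g. in /etc/fstab:
    # VIP1:/testvolume /var/lib/nova/instances glusterfs defaults,_netdev,backup-volfile-servers=VIP2 0 0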

Cheers,
Dave

On Mon, Jan 12, 2015 at 7:57 AM, Sergio Traldi sergio.tra...@pd.infn.it
wrote:

 Hi,
 We have a SAN with 14 TB of disks space and we have 2 controllers attached
 to this SAN.

 We want to use this storage using gluster.

 Our goal is to use this storage in high availability, i.e. we want to keep
 using all the storage even if there are some problems with one of the
 controllers.

 Our idea is the following:
 - Create 2 LUNs.
 - Attach the 2 LUNs via iSCSI to each controller host.
 - Create a brick on each controller node (brick1 for Controller1 and
 brick2 for Controller2).
 - Set up the logins so that each controller is able to mount disk1 to brick1
 and disk2 to brick2.
 - Install keepalived (routing software whose main goal is to provide
 simple and robust facilities for load balancing and high availability on
 Linux).
 - Create 2 VIPs (Virtual IPs), one for controller 1 and the other for
 controller 2. The situation would then be:
   o Controller1 with its IP (IP1) would also have a VIP (VIP1), with 2
 iSCSI disks mounted but just one used in R/W mode (brick1).
   o Controller2 with its IP (IP2) and a VIP (VIP2), with 2 iSCSI
 disks mounted but just one used in R/W mode (brick2).

 - The GlusterFS volume would be mounted on the clients in fail-over mode, i.e.
 in fstab there would be something like:

 VIP1:/volume /var/lib/nova/instances glusterfs defaults,log-level=ERROR,_netdev,backup-volfile-servers=VIP2 0 0


 - Keepalived would be configured to move VIP1 to IP2 if, for example,
 controller1 has to be shut down, and the same for VIP2.
 This VIP change should hopefully not impact operations on the client.


 We are trying this setting but when we try to create a volume:
 gluster volume create testvolume transport tcp VIP1:/data/brick1/sda
 VIP2:/data/brick2/sdb

 we obtain this error:
 volume create: testvolume : failed: Host VIP2 is not in 'Peer in Cluster'
 state

 But if we try :
 [controller1]# gluster peer status
 Number of Peers: 1

 Hostname: VIP2
 Uuid: 6692a700-4c41-4e8d-8810-48f9d1ee9315
 State: Accepted peer request (Connected)

 [controller2]# gluster peer status
 Number of Peers: 1

 Hostname: IP1
 Uuid: 074e9eea-6bf5-4ac8-8ac9-d1159bb4d452
 State: Accepted peer request (Disconnected)


 If we try to:
 [controller2]# gluster peer probe VIP1

 we obtain this error:
 peer probe: failed: Probe returned with unknown errno 107


 Any idea why I cannot create a volume with the two virtual IPs?

 Thinking it could be a DNS problem, I also tried putting these lines in
 /etc/hosts on each controller:
 VIP1 controller1.mydomain controller1
 VIP2 controller2.mydomain controller2

 In the log file of controller2 I just found:

 [2015-01-12 11:42:47.549545] E [glusterd-handshake.c:1644:__
 glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the
 'versions' from peer (IP1:24007)

 In the log file of controller1 I just found:

 [2015-01-12 11:44:44.229600] E 
 [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req]
 0-management: Rejecting management handshake request from unknown peer
 IP2:1018
 [2015-01-12 11:44:47.234863] E 
 [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req]
 0-management: Rejecting management handshake request from unknown peer
 IP2:1017
 [2015-01-12 11:44:50.240324] E 
 [glusterd-handshake.c:914:gd_validate_mgmt_hndsk_req]
 0-management: Rejecting management handshake request from unknown peer
 IP2:1001

 If I try a telnet:
 [controller2]# telnet VIP1 24007

 and
 [controller1]# telnet VIP2 24007

 they work fine.

 Any idea if it is possible to create a volume using VIPs instead of IPs?
 Cheers
 Sergio

Re: [Gluster-users] Geo-Replication Issue

2014-12-14 Thread David Gibbons
Thank you for the advice. After re-compiling gluster with the xml option, I
was able to get geo-replication started!

Is this output normal? This is a 2x2 distributed/replicated volume:

 ]# gluster volume geo-rep shares gfs-a-bkp::bkpshares status


 MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS     CHECKPOINT STATUS    CRAWL STATUS
 -----------------------------------------------------------------------------------------------------------------------------------
 gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Passive    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl
 gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Active     N/A                  Hybrid Crawl


What I mean to say is, is it normal for two of the nodes to be in active
mode and two of the nodes to be in passive mode? I'm thinking the answer is
yes due to the distributed/replicated nature, but would like some
confirmation of that.

Cheers,
Dave

On Thu, Dec 11, 2014 at 12:19 PM, Aravinda avish...@redhat.com wrote:

   Geo-replication depends on the XML output of the Gluster CLI commands. For
  example, before connecting to the slave nodes it gets the node list from both
  master and slave using the gluster volume info and status commands with --xml.

  The Python tracebacks you are seeing in the logs are due to the inability to
  parse the output of gluster commands when XML is not supported.

 --
 regards
 Aravinda
 http://aravindavk.in



 On 12/11/2014 07:56 PM, David Gibbons wrote:

   Thanks for the feedback, answers inline below:


Have you followed all the upgrade steps w.r.t geo-rep
mentioned in the following link?


  I didn't upgrade geo-rep, I disconnected the old replicated server and
 started from scratch. So everything with regard to geo-rep is
 fresh/brand-new.


  2. Is the output of the command 'gluster vol info vol-name --xml' proper?
Please paste the output.


  I do not have gluster compiled with xml. Perhaps that is the problem.
 Here is the output of the command you referenced:

 XML output not supported. Ignoring '--xml' option


  This is my config summary:

  GlusterFS configure summary
 ===
 FUSE client  : yes
 Infiniband verbs : no
 epoll IO multiplex   : yes
 argp-standalone  : no
 fusermount   : yes
 readline : no
 georeplication   : yes
 Linux-AIO: no
 Enable Debug : no
 systemtap: no
 Block Device xlator  : no
 glupy: no
 Use syslog   : yes
 XML output   : no
 QEMU Block formats   : no
 Encryption xlator: no


  Am I missing something that is required for geo-replication? I've found
 the documentation for those of us who are building the binaries to be a bit
 lacking with regard to dependencies within the project.

  Cheers,
 Dave




Re: [Gluster-users] Geo-Replication Issue

2014-12-11 Thread David Gibbons
Thanks for the feedback, answers inline below:


Have you followed all the upgrade steps w.r.t geo-rep
mentioned in the following link?


I didn't upgrade geo-rep, I disconnected the old replicated server and
started from scratch. So everything with regard to geo-rep is
fresh/brand-new.


 2. Is the output of the command 'gluster vol info vol-name --xml' proper?
Please paste the output.


I do not have gluster compiled with xml. Perhaps that is the problem. Here
is the output of the command you referenced:

 XML output not supported. Ignoring '--xml' option


This is my config summary:

GlusterFS configure summary
 ===
 FUSE client  : yes
 Infiniband verbs : no
 epoll IO multiplex   : yes
 argp-standalone  : no
 fusermount   : yes
 readline : no
 georeplication   : yes
 Linux-AIO: no
 Enable Debug : no
 systemtap: no
 Block Device xlator  : no
 glupy: no
 Use syslog   : yes
 XML output   : no
 QEMU Block formats   : no
 Encryption xlator: no


Am I missing something that is required for geo-replication? I've found the
documentation for those of us who are building the binaries to be a bit
lacking with regard to dependencies within the project.
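
(For anyone else building from source and hitting this: the CLI's XML output
depends on libxml2 being found at configure time. A rough sketch of what I
would check; the package name is the RHEL/CentOS one and may differ on your
distro:)

    # install the libxml2 headers, then re-run configure and rebuild
    yum install -y libxml2-devel
    ./configure && make && make install

    # the configure summary should now say "XML output : yes"; verify with:
    gluster volume info shares --xml | head -5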

Cheers,
Dave

Re: [Gluster-users] Geo-Replication Issue

2014-12-10 Thread David Gibbons
Hi Kotresh,

Thanks for the tip. Unfortunately that does not seem to have any effect.
The path to the gluster binaries was already in $PATH. I did try adding the
path to the gsyncd binary, but same result. Contents of $PATH are:


 /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/


It seems like perhaps one of the remote gsyncd processes cannot find the
gluster binary, because I see the following in the
geo-replication/shares/ssh...log. Can you point me toward how I can find
out what is throwing this log entry?

 [2014-12-10 07:20:53.886676] E
 [syncdutils(monitor):218:log_raise_exception] top: execution of gluster
 failed with ENOENT (No such file or directory)

 [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] top:
 exiting.


I think that whatever process is trying to use the gluster command has the
incorrect path to access it. Do you know how I could modify *that* path?

I've manually tested the ssh_command and ssh_command_tar variables in the
relevant gsyncd.conf; both connect to the slave server successfully and
appear to execute the command they're supposed to.

gluster_command_dir in gsyncd.conf is also the correct directory
(/usr/local/sbin).

In summary: I think we're on to something with setting the path, but I
think I need to set it somewhere other than my shell.
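
(A sketch of the two things I know to check here, assuming the geo-rep config
interface on 3.5.x behaves the way I think it does:)

    # show what the session currently thinks; it should point at the
    # directory that actually holds the gluster binary
    gluster volume geo-replication shares gfs-a-bkp::bkpshares config gluster_command_dir

    # if it is wrong, set it explicitly
    gluster volume geo-replication shares gfs-a-bkp::bkpshares config gluster_command_dir /usr/local/sbin/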

Thanks,
Dave


On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar 
khire...@redhat.com wrote:

  If that is the case, as a workaround, try adding the 'gluster' path
  to the PATH environment variable or creating symlinks to the gluster and
  glusterd binaries.

  1. export PATH=$PATH:<path where gluster binaries are installed>

  The above should work; let me know if it doesn't.

 Thanks and Regards,
 Kotresh H R

 - Original Message -
 From: David Gibbons david.c.gibb...@gmail.com
 To: Kotresh Hiremath Ravishankar khire...@redhat.com
 Cc: gluster-users Gluster-users@gluster.org, vno...@stonefly.com
 Sent: Tuesday, December 9, 2014 6:16:03 PM
 Subject: Re: [Gluster-users] Geo-Replication Issue

 Hi Kotresh,

 Yes, I believe that I am. Can you tell me which symlinks are missing/cause
 geo-replication to fail to start? I can create them manually.

 Thank you,
 Dave

 On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar 
 khire...@redhat.com wrote:

  Hi Dave,
 
  Are you hitting the below bug and so not able to sync symlinks ?
  https://bugzilla.redhat.com/show_bug.cgi?id=1105283
 
  Does geo-rep status say Not Started ?
 
  Thanks and Regards,
  Kotresh H R
 
  - Original Message -
  From: David Gibbons david.c.gibb...@gmail.com
  To: gluster-users Gluster-users@gluster.org
  Cc: vno...@stonefly.com
  Sent: Monday, December 8, 2014 7:03:31 PM
  Subject: Re: [Gluster-users] Geo-Replication Issue
 
  Apologies for sending so many messages about this! I think I may be
  running into this bug:
  https://bugzilla.redhat.com/show_bug.cgi?id=1105283
 
  Would someone be so kind as to let me know which symlinks are missing
 when
  this bug manifests, so that I can create them?
 
  Thank you,
  Dave
 
 
  On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons 
 david.c.gibb...@gmail.com
   wrote:
 
 
 
  Ok,
 
  I was able to get geo-replication configured by changing
  /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local
  machine, instead of accessing bash -c directly. I then found that the
 hook
  script was missing for geo-replication, so I copied that over manually. I
  now have what appears to be a configured geo-rep setup:
 
 
 
 
  # gluster volume geo-replication shares gfs-a-bkp::bkpshares status
 
 
 
 
  MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL
  STATUS
 
 
 
 
 
  gfs-a-3 shares /mnt/a-3-shares-brick-1/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-3 shares /mnt/a-3-shares-brick-2/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-3 shares /mnt/a-3-shares-brick-3/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-3 shares /mnt/a-3-shares-brick-4/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-2 shares /mnt/a-2-shares-brick-1/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-2 shares /mnt/a-2-shares-brick-2/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-2 shares /mnt/a-2-shares-brick-3/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-2 shares /mnt/a-2-shares-brick-4/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-4 shares /mnt/a-4-shares-brick-1/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-4 shares /mnt/a-4-shares-brick-2/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-4 shares /mnt/a-4-shares-brick-3/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-4 shares /mnt/a-4-shares-brick-4/brick gfs-a-bkp::bkpshares Not
  Started N/A N/A
 
  gfs-a-1 shares /mnt

Re: [Gluster-users] Geo-Replication Issue

2014-12-10 Thread David Gibbons
Symlinking gluster to /usr/bin/ seems to have resolved the path issue.
Thanks for the tip there.
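
(For the archive, the workaround amounted to something like this; the paths
reflect my /usr/local install prefix and may differ for packaged builds:)

    # source builds install under /usr/local; the gsyncd monitor apparently
    # looks for the binaries on a more restricted PATH, so expose them there
    ln -s /usr/local/sbin/gluster  /usr/bin/gluster
    ln -s /usr/local/sbin/glusterd /usr/bin/glusterd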

Now there's a different error thrown in the geo-rep/ssh...log:

 [2014-12-10 07:32:42.609031] E
 [syncdutils(monitor):240:log_raise_exception] top: FAIL:

 Traceback (most recent call last):

   File /usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py, line
 150, in main

 main_i()

   File /usr/local/libexec/glusterfs/python/syncdaemon/gsyncd.py, line
 530, in main_i

 return monitor(*rscs)

   File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line
 243, in monitor

 return Monitor().multiplex(*distribute(*resources))

   File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line
 205, in distribute

 mvol = Volinfo(master.volume, master.host)

   File /usr/local/libexec/glusterfs/python/syncdaemon/monitor.py, line
 22, in __init__

 vi = XET.fromstring(vix)

   File /usr/lib64/python2.6/xml/etree/ElementTree.py, line 963, in XML

 parser.feed(text)

   File /usr/lib64/python2.6/xml/etree/ElementTree.py, line 1245, in feed

 self._parser.Parse(data, 0)

 ExpatError: syntax error: line 2, column 0

 [2014-12-10 07:32:42.610858] I [syncdutils(monitor):192:finalize] top:
 exiting.
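
(The monitor builds its volume info by parsing the output of 'gluster volume
info <vol> --xml'; if the CLI prints anything other than well-formed XML, for
example the "XML output not supported" warning, ElementTree fails exactly like
the traceback above. A quick way to see what the monitor is being fed, as a
sketch:)

    # whatever comes back here must be XML, starting with an <?xml ...?> header
    gluster volume info shares --xml | head -5

    # if the first lines are the "XML output not supported" warning or plain
    # text volume info, the ExpatError above is expected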


I also get a bunch of these errors but have been assuming that they are
being thrown because geo-replication hasn't started successfully yet. There
is one for each brick:

 [2014-12-10 12:33:33.539737] E
 [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read
 gsyncd status file

 [2014-12-10 12:33:33.539742] E
 [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the
 statusfile for /mnt/a-3-shares-brick-4/brick brick for  shares(master),
 gfs-a-bkp::bkpshares(slave) session


Do I have a config file error somewhere that I need to track down? This
volume *was* upgraded from 3.4.2 a few weeks ago.

Cheers,
Dave

On Wed, Dec 10, 2014 at 7:29 AM, David Gibbons david.c.gibb...@gmail.com
wrote:

 Hi Kotresh,

 Thanks for the tip. Unfortunately that does not seem to have any effect.
 The path to the gluster binaries was already in $PATH. I did try adding the
 path to the gsyncd binary, but same result. Contents of $PATH are:


 /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/libexec/glusterfs/


 It seems like perhaps one of the remote gsyncd processes cannot find the
 gluster binary, because I see the following in the
 geo-replication/shares/ssh...log. Can you point me toward how I can find
 out what is throwing this log entry?

 [2014-12-10 07:20:53.886676] E
 [syncdutils(monitor):218:log_raise_exception] top: execution of gluster
 failed with ENOENT (No such file or directory)

 [2014-12-10 07:20:53.886883] I [syncdutils(monitor):192:finalize] top:
 exiting.


 I think that whatever process is trying to use the gluster command has the
 incorrect path to access it. Do you know how I could modify *that* path?

 I've manually tested the ssh_command and ssh_command_tar variables in the
 relevant gsyncd.conf; both connect to the slave server successfully and
 appear to execute the command they're supposed to.

 gluster_command_dir in gsyncd.conf is also the correct directory
 (/usr/local/sbin).

 In summary: I think we're on to something with setting the path, but I
 think I need to set it somewhere other than my shell.

 Thanks,
 Dave


 On Tue, Dec 9, 2014 at 11:52 PM, Kotresh Hiremath Ravishankar 
 khire...@redhat.com wrote:

 If that is the case, as a workaround, try adding the 'gluster' path
 to the PATH environment variable or creating symlinks to the gluster and
 glusterd binaries.

 1. export PATH=$PATH:<path where gluster binaries are installed>

 The above should work; let me know if it doesn't.

 Thanks and Regards,
 Kotresh H R

 - Original Message -
 From: David Gibbons david.c.gibb...@gmail.com
 To: Kotresh Hiremath Ravishankar khire...@redhat.com
 Cc: gluster-users Gluster-users@gluster.org, vno...@stonefly.com
 Sent: Tuesday, December 9, 2014 6:16:03 PM
 Subject: Re: [Gluster-users] Geo-Replication Issue

 Hi Kotresh,

 Yes, I believe that I am. Can you tell me which symlinks are missing/cause
 geo-replication to fail to start? I can create them manually.

 Thank you,
 Dave

 On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar 
 khire...@redhat.com wrote:

  Hi Dave,
 
  Are you hitting the below bug and so not able to sync symlinks ?
  https://bugzilla.redhat.com/show_bug.cgi?id=1105283
 
  Does geo-rep status say Not Started ?
 
  Thanks and Regards,
  Kotresh H R
 
  - Original Message -
  From: David Gibbons david.c.gibb...@gmail.com
  To: gluster-users Gluster-users@gluster.org
  Cc: vno...@stonefly.com
  Sent: Monday, December 8, 2014 7:03:31 PM
  Subject: Re: [Gluster-users] Geo-Replication Issue
 
  Apologies for sending so many messages about this! I think I may be
  running into this bug:
  https://bugzilla.redhat.com/show_bug.cgi?id=1105283
 
  Would someone be so kind as to let me

Re: [Gluster-users] Geo-Replication Issue

2014-12-09 Thread David Gibbons
Hi Kotresh,

Yes, I believe that I am. Can you tell me which symlinks are missing/cause
geo-replication to fail to start? I can create them manually.

Thank you,
Dave

On Tue, Dec 9, 2014 at 3:54 AM, Kotresh Hiremath Ravishankar 
khire...@redhat.com wrote:

 Hi Dave,

 Are you hitting the below bug and so not able to sync symlinks ?
 https://bugzilla.redhat.com/show_bug.cgi?id=1105283

 Does geo-rep status say Not Started ?

 Thanks and Regards,
 Kotresh H R

 - Original Message -
 From: David Gibbons david.c.gibb...@gmail.com
 To: gluster-users Gluster-users@gluster.org
 Cc: vno...@stonefly.com
 Sent: Monday, December 8, 2014 7:03:31 PM
 Subject: Re: [Gluster-users] Geo-Replication Issue

 Apologies for sending so many messages about this! I think I may be
 running into this bug:
 https://bugzilla.redhat.com/show_bug.cgi?id=1105283

 Would someone be so kind as to let me know which symlinks are missing when
 this bug manifests, so that I can create them?

 Thank you,
 Dave


 On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons  david.c.gibb...@gmail.com
  wrote:



 Ok,

 I was able to get geo-replication configured by changing
 /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the local
 machine, instead of accessing bash -c directly. I then found that the hook
 script was missing for geo-replication, so I copied that over manually. I
 now have what appears to be a configured geo-rep setup:




 # gluster volume geo-replication shares gfs-a-bkp::bkpshares status




 MASTER NODE MASTER VOL MASTER BRICK SLAVE STATUS CHECKPOINT STATUS CRAWL
 STATUS


 

 gfs-a-3 shares /mnt/a-3-shares-brick-1/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-3 shares /mnt/a-3-shares-brick-2/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-3 shares /mnt/a-3-shares-brick-3/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-3 shares /mnt/a-3-shares-brick-4/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-2 shares /mnt/a-2-shares-brick-1/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-2 shares /mnt/a-2-shares-brick-2/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-2 shares /mnt/a-2-shares-brick-3/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-2 shares /mnt/a-2-shares-brick-4/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-4 shares /mnt/a-4-shares-brick-1/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-4 shares /mnt/a-4-shares-brick-2/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-4 shares /mnt/a-4-shares-brick-3/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-4 shares /mnt/a-4-shares-brick-4/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-1 shares /mnt/a-1-shares-brick-1/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-1 shares /mnt/a-1-shares-brick-2/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-1 shares /mnt/a-1-shares-brick-3/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 gfs-a-1 shares /mnt/a-1-shares-brick-4/brick gfs-a-bkp::bkpshares Not
 Started N/A N/A

 So that's a step in the right direction (and I can upload a patch for
  gverify to a bugzilla). However, gverify *should* have worked with bash -c,
 and I was not able to figure out why it didn't work, other than it didn't
 seem able to find some programs. I'm thinking that maybe the PATH variable
 is wrong for Gluster, and that's why gverify didn't work out of the box.

 When I attempt to start geo-rep now, I get the following in the geo-rep
 log:


 [2014-12-07 10:52:40.893594] E
 [syncdutils(monitor):218:log_raise_exception] top: execution of gluster
 failed with ENOENT (No such file or directory)

 [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top:
 exiting.

 Which seems to agree that maybe gluster isn't running with the same path
 variable that my console session is running with. Is this possible? I know
 I'm grasping :).

 Any nudge in the right direction would be very much appreciated!

 Cheers,
 Dave


 On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons  david.c.gibb...@gmail.com
  wrote:



 Good Morning,

 I am having some trouble getting geo-replication started on a 3.5.3 volume.

 I have verified that password-less SSH is functional in both directions
 from the backup gluster server, and all nodes in the production gluster. I
 have verified that all nodes in production and backup cluster are running
 the same version of gluster, and that name resolution works in both
 directions.

 When I attempt to start geo-replication with this command:


 gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem

 I end up with the following in the logs:


 [2014-12-06 15:02:50.284426] E
 [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave

 [2014-12-06 15:02:50.284495] E
 [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-:
 gfs-a-bkp

Re: [Gluster-users] Geo-Replication Issue

2014-12-08 Thread David Gibbons
Apologies for sending so many messages about this! I think I may be running
into this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1105283

Would someone be so kind as to let me know which symlinks are missing when
this bug manifests, so that I can create them?

Thank you,
Dave


On Sun, Dec 7, 2014 at 11:01 AM, David Gibbons david.c.gibb...@gmail.com
wrote:

 Ok,

 I was able to get geo-replication configured by
 changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the
 local machine, instead of accessing bash -c directly. I then found that the
 hook script was missing for geo-replication, so I copied that over
 manually. I now have what appears to be a configured geo-rep setup:

 # gluster volume geo-replication shares gfs-a-bkp::bkpshares status


 MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS         CHECKPOINT STATUS    CRAWL STATUS
 ----------------------------------------------------------------------------------------------------------------------------------------
 gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A


 So that's a step in the right direction (and I can upload a patch for
 gverify to a bugzilla). However, gverify *should* have worked with bash -c,
 and I was not able to figure out why it didn't work, other than it didn't
 seem able to find some programs. I'm thinking that maybe the PATH variable
 is wrong for Gluster, and that's why gverify didn't work out of the box.

 When I attempt to start geo-rep now, I get the following in the geo-rep
 log:

 [2014-12-07 10:52:40.893594] E
 [syncdutils(monitor):218:log_raise_exception] top: execution of gluster
 failed with ENOENT (No such file or directory)

 [2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top:
 exiting.


 Which seems to agree that maybe gluster isn't running with the same path
 variable that my console session is running with. Is this possible? I know
 I'm grasping :).

 Any nudge in the right direction would be very much appreciated!

 Cheers,
 Dave


 On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons david.c.gibb...@gmail.com
 wrote:

 Good Morning,

 I am having some trouble getting geo-replication started on a 3.5.3
 volume.

 I have verified that password-less SSH is functional in both directions
 from the backup gluster server, and all nodes in the production gluster. I
 have verified that all nodes in production and backup cluster are running
 the same version of gluster, and that name resolution works in both
 directions.

 When I attempt to start geo-replication with this command:

 gluster volume geo-replication shares gfs-a-bkp::bkpshares create
 push-pem


 I end up with the following in the logs:

  [2014-12-06 15:02:50.284426] E
 [glusterd-geo-rep.c:1889:glusterd_verify_slave

[Gluster-users] Missing Hooks

2014-12-07 Thread David Gibbons
Hi All,

I am running into an issue where it appears that some hooks are missing
from /var/lib/glusterd/hooks

I am running version 3.5.3 and recently did an upgrade to that version from
3.4.2.

I built from source with 'make && make install'. Is there another make target
I need to use to get the hooks to install? Do I need to run make extras or
something to get them installed?

I see them in the source folder, so I could certainly just copy them over
but I want to do this the right way if possible
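
(In the meantime, the stopgap I would use is below. It is a sketch; check the
exact script names and hook directory layout against your source tree and
/var/lib/glusterd/hooks before copying anything:)

    # from the top of the glusterfs source tree: put the geo-rep create hook
    # where glusterd expects it and make it executable
    mkdir -p /var/lib/glusterd/hooks/1/gsync-create/post
    cp extras/hook-scripts/S56glusterd-geo-rep-create-post.sh \
       /var/lib/glusterd/hooks/1/gsync-create/post/
    chmod +x /var/lib/glusterd/hooks/1/gsync-create/post/S56glusterd-geo-rep-create-post.sh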

Cheers,
Dave

Re: [Gluster-users] Missing Hooks

2014-12-07 Thread David Gibbons
Thank you, Niels.

Bugzilla ID is 1171477

Dave

On Sun, Dec 7, 2014 at 10:12 AM, Niels de Vos nde...@redhat.com wrote:

 On Sun, Dec 07, 2014 at 09:55:11AM -0500, David Gibbons wrote:
  Hi All,
 
  I am running into an issue where it appears that some hooks are missing
  from /var/lib/glusterd/hooks
 
  I am running version 3.5.3 and recently did an upgrade to that version
 from
  3.4.2.
 
   I built from source with 'make && make install'. Is there another make
 target
  I need to use to get the hooks to install? Do I need to run make extras
 or
  something to get them installed?
 
  I see them in the source folder, so I could certainly just copy them
 over
  but I want to do this the right way if possible

 These are copied over by the .spec that is used to generate the RPMs.
 It looks as if the hook scripts are not installed by 'make install'. If
 you can file a bug for this, we won't forget about it and can send a
 fix.

 Thanks,
 Niels


Re: [Gluster-users] Geo-Replication Issue

2014-12-07 Thread David Gibbons
Ok,

I was able to get geo-replication configured by
changing /usr/local/libexec/glusterfs/gverify.sh to use ssh to access the
local machine, instead of accessing bash -c directly. I then found that the
hook script was missing for geo-replication, so I copied that over
manually. I now have what appears to be a configured geo-rep setup:

 # gluster volume geo-replication shares gfs-a-bkp::bkpshares status


 MASTER NODE    MASTER VOL    MASTER BRICK                     SLAVE                   STATUS         CHECKPOINT STATUS    CRAWL STATUS
 ----------------------------------------------------------------------------------------------------------------------------------------
 gfs-a-3        shares        /mnt/a-3-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-3        shares        /mnt/a-3-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-2        shares        /mnt/a-2-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-4        shares        /mnt/a-4-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-1/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-2/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-3/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A
 gfs-a-1        shares        /mnt/a-1-shares-brick-4/brick    gfs-a-bkp::bkpshares    Not Started    N/A                  N/A


So that's a step in the right direction (and I can upload a patch for
gverify to a bugzilla). However, gverify *should* have worked with bash -c,
and I was not able to figure out why it didn't work, other than it didn't
seem able to find some programs. I'm thinking that maybe the PATH variable
is wrong for Gluster, and that's why gverify didn't work out of the box.

When I attempt to start geo-rep now, I get the following in the geo-rep log:

 [2014-12-07 10:52:40.893594] E
 [syncdutils(monitor):218:log_raise_exception] top: execution of gluster
 failed with ENOENT (No such file or directory)

[2014-12-07 10:52:40.893886] I [syncdutils(monitor):192:finalize] top:
 exiting.


Which seems to agree that maybe gluster isn't running with the same path
variable that my console session is running with. Is this possible? I know
I'm grasping :).

Any nudge in the right direction would be very much appreciated!

Cheers,
Dave


On Sat, Dec 6, 2014 at 10:06 AM, David Gibbons david.c.gibb...@gmail.com
wrote:

 Good Morning,

 I am having some trouble getting geo-replication started on a 3.5.3 volume.

 I have verified that password-less SSH is functional in both directions
 from the backup gluster server, and all nodes in the production gluster. I
 have verified that all nodes in production and backup cluster are running
 the same version of gluster, and that name resolution works in both
 directions.

 When I attempt to start geo-replication with this command:

 gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem


 I end up with the following in the logs:

  [2014-12-06 15:02:50.284426] E
 [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave

 [2014-12-06 15:02:50.284495] E
 [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-:
 gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch
 master volume details. Please check the master cluster and master volume.

 [2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase]
 0-management: Staging of operation 'Volume Geo

[Gluster-users] Geo-Replication Issue

2014-12-06 Thread David Gibbons
Good Morning,

I am having some trouble getting geo-replication started on a 3.5.3 volume.

I have verified that password-less SSH is functional in both directions
from the backup gluster server, and all nodes in the production gluster. I
have verified that all nodes in production and backup cluster are running
the same version of gluster, and that name resolution works in both
directions.
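
(These are roughly the checks I ran beforehand; a sketch using the hostnames
from this setup:)

    # same gluster version on both ends
    gluster --version
    ssh root@gfs-a-bkp 'gluster --version'

    # password-less SSH and name resolution in both directions
    ssh root@gfs-a-bkp 'hostname'
    ssh root@gfs-a-bkp 'ping -c1 gfs-a-1'

    # the slave volume exists and is started
    ssh root@gfs-a-bkp 'gluster volume info bkpshares'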

When I attempt to start geo-replication with this command:

 gluster volume geo-replication shares gfs-a-bkp::bkpshares create push-pem


I end up with the following in the logs:

  [2014-12-06 15:02:50.284426] E
 [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave

[2014-12-06 15:02:50.284495] E
 [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-:
 gfs-a-bkp::bkpshares is not a valid slave volume. Error: Unable to fetch
 master volume details. Please check the master cluster and master volume.

[2014-12-06 15:02:50.284509] E [glusterd-syncop.c:912:gd_stage_op_phase]
 0-management: Staging of operation 'Volume Geo-replication Create' failed
 on localhost : Unable to fetch master volume details. Please check the
 master cluster and master volume.


Would someone be so kind as to point me in the right direction?

Cheers,
Dave

Re: [Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no longer working

2014-12-02 Thread David Gibbons
Thank you for the assistance.

Yesterday we started to have bricks on one server randomly crash. When the
one server crashed, it would lock up the bricks on its replica as well. I
ended up upgrading to 3.5.3, and noticed in the process that the libgfrpc
and libgfxdr libraries were out of date on the server that was having
crashed bricks. Upgrading to 3.5.3 and replacing the old versions of the
libraries on the cranky server seems to have made everything happy again.

Thanks again!
Dave

On Tue, Dec 2, 2014 at 2:28 AM, Krutika Dhananjay kdhan...@redhat.com
wrote:

 Hi,

 Are you sure the post-upgrade script ran to completion?
 Here is one way to confirm whether that is the case: check if the quota
 configured directories have an xattr called
 trusted.glusterfs.quota.limit-set set on them in the respective bricks.

 For example, here's what mine looks like:

 [root@haddock 1]# pwd
 /brick/1
 [root@haddock 1]# getfattr -d -m . -e hex 1
 # file: 1

 security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
 trusted.gfid=0x57d0a561ca574d1cb0428f38d1c06e85
 trusted.glusterfs.dht=0x00017fff

 trusted.glusterfs.quota.----0001.contri=0x0a00
 trusted.glusterfs.quota.dirty=0x3000
 trusted.glusterfs.quota.limit-set=0x0640
 trusted.glusterfs.quota.size=0x0a00


 where /brick/1 is the brick directory and under it 1 is the name of one
 of the quota-configured directories.

 I believe your quota configurations are backed up at
 /var/tmp/glusterfs/quota-config-backup/vol_volname which you can use to
 get the quota-configured directory names.
 As for operating version, I think it is sufficient for it to be at 3 for
 the 3.5.x quota to work.
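
 A minimal sketch of how one might check and re-apply a single limit from that
 backup (the directory name below is only an example, and I am assuming the
 backup file is named after the volume as described above):

     # on any brick, check whether the limit xattr survived the upgrade
     getfattr -d -m . -e hex /mnt/a-1-shares-brick-1/brick/some/quota/dir | grep limit-set

     # if not, the limits can be re-applied from the backup, e.g.
     cat /var/tmp/glusterfs/quota-config-backup/vol_shares
     gluster volume quota shares limit-usage /some/quota/dir 10GB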

 -Krutika

 --

 *From: *David Gibbons david.c.gibb...@gmail.com
 *To: *Krutika Dhananjay kdhan...@redhat.com
 *Cc: *gluster-users gluster-users@gluster.org
 *Sent: *Monday, December 1, 2014 6:35:55 PM
 *Subject: *Re: [Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no
 longer working


 Certainly, thank you for your response:

 Quotad is running on all nodes:

 [root@gfs-a-1 ~]# ps aux | grep quotad

 root  3004  0.0  0.4 241368 68552 ?Ssl  Nov30   0:07
 /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p
 /var/lib/glusterd/quotad/run/quotad.pid -l
 /usr/local/var/log/glusterfs/quotad.log -S
 /var/run/9d02605105ef0e74d913a4671c1143a1.socket --xlator-option
 *replicate*.data-self-heal=off --xlator-option
 *replicate*.metadata-self-heal=off --xlator-option
 *replicate*.entry-self-heal=off


 And the relevant output from gluster volume status shares per your request:

 [root@gfs-a-1 ~]# gluster volume status shares | grep Quota

 Quota Daemon on localhost   N/A Y
 3004

 Quota Daemon on gfs-a-3 N/A Y
 32307

 Quota Daemon on gfs-a-4 N/A Y
 10818

 Quota Daemon on gfs-a-2 N/A Y
 12292


  No log entries are created in /var/log/glusterfs/quotad.log when I run a
 quota list; all of the log entries are from yesterday. They do indicate a
 version mis-match, although I can't seem to locate where that version is
 specified:

 [2014-11-30 13:21:55.173081] I
 [client-handshake.c:1474:client_setvolume_cbk] 0-shares-client-14: Server
 and Client lk-version numbers are not same, reopening the fds

 [2014-11-30 13:21:55.173170] I
 [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-14:
 Server lk version = 1

 [2014-11-30 13:21:55.178739] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
 0-shares-client-9: changing port to 49154 (from 0)

 [2014-11-30 13:21:55.181170] I
 [client-handshake.c:1677:select_server_supported_programs]
 0-shares-client-9: Using Program GlusterFS 3.3, Num (1298437), Version (330)

 [2014-11-30 13:21:55.181386] I
 [client-handshake.c:1462:client_setvolume_cbk] 0-shares-client-9: Connected
 to 172.16.10.13:49154, attached to remote volume
 '/mnt/a-3-shares-brick-3/brick'.

 [2014-11-30 13:21:55.181401] I
 [client-handshake.c:1474:client_setvolume_cbk] 0-shares-client-9: Server
 and Client lk-version numbers are not same, reopening the fds

 [2014-11-30 13:21:55.181535] I
 [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-9:
 Server lk version = 1


  I see the operating version for the volume as 3. I saw an unrelated
 thread that indicated this number should have more digits on a cluster
 running 3.5.2. That thread also indicated that quota may not work if the
 volume's operating version is not compatible with the quota version running
 on the cluster. I can't seem to find the link right now.

 It's almost as if the volume version did not get upgraded when the server
 version was upgraded. Is that possible?
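
  (One quick way to see the operating version each glusterd is actually
  running with; a sketch assuming the default state directory:)

      # run on every node; the values should match across the cluster
      grep operating-version /var/lib/glusterd/glusterd.info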

 Cheers,
 Dave


 On Sun, Nov 30, 2014 at 11:46 PM, Krutika Dhananjay kdhan...@redhat.com
 wrote:

 Hi,

 Could you confirm

[Gluster-users] Upgraded from 3.4.1 to 3.5.2, quota no longer working

2014-11-30 Thread David Gibbons
Hi All,

I performed a long-awaited upgrade from 3.4.1 to 3.5.2 today following the
instructions for an offline upgrade outlined here:
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.5

I ran the pre- and post- upgrade scripts as instructed, intending to move
the quotas over to the new version. The upgrade seemed to go well, the
volume is online and it appears to be functioning properly.

When I attempt to check quotas, the list is empty:

 [root@gfs-a-1 glusterfs]# gluster volume quota shares list

   Path                    Hard-limit  Soft-limit  Used    Available
 ---------------------------------------------------------------------

 [root@gfs-a-1 glusterfs]#



And upon execution of that command, the cli.log file fills up with entries
like this. I am assuming it's one cli log entry per quota entry:

 [2014-11-30 14:00:02.154143] W
 [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not
 present in dict

 [2014-11-30 14:00:02.160507] W
 [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not
 present in dict

 [2014-11-30 14:00:02.167947] W
 [cli-rpc-ops.c:2469:print_quota_list_from_quotad] 0-cli: path key is not
 present in dict


So it appears that somehow the quota database has gone offline or become
corrupt. Any thoughts on what I can do to resolve this?
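One thing worth checking is whether the quota limits themselves are still
present on the bricks (a sketch; the brick path and directory name are
placeholders):

# the per-directory limit is stored as an xattr on the brick-side copy of the directory
getfattr -n trusted.glusterfs.quota.limit-set -e hex /mnt/a-3-shares-brick-3/brick/SOME_DIR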

I have checked all of the binaries on all 4 machines in the cluster, and
they all appear to be running the correct version:

 [root@gfs-a-1 glusterfs]# glusterfsd --version

 glusterfs 3.5.2 built on Nov 30 2014 08:16:37


Cheers,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Re-Sync Geo Replication

2014-10-07 Thread David Gibbons
I am interested in hearing this answer too.

James, if you have a consistency check script, maybe just changing the
timestamp or some other attribute on the file (perhaps something as simple
as chmod to something and then back) would trigger the integrated rsync.
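For what it's worth, a rough sketch of that idea, assuming the master and slave
volumes are mounted at /mnt/master and /mnt/slave (both placeholders):

# list files present on the master mount but missing from the slave mount
comm -23 <(cd /mnt/master && find . -type f | sort) \
         <(cd /mnt/slave && find . -type f | sort) > /tmp/missing.txt

# touching the missing files on the master records a fresh change, which the
# next geo-rep crawl should notice (per the idea above)
while read -r f; do touch "/mnt/master/$f"; done < /tmp/missing.txt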

Dave

On Sat, Oct 4, 2014 at 5:33 PM, James Payne jimqwer...@hotmail.com wrote:

 Hi,



 Just wondering if there was a method to manually force a re-sync of a geo
 replication slave so it is an identical mirror of the master?



 History of this request is that I have a test setup and the Gluster
 Geo-Replication seems to have missed 7 files out completely (not sure if
 this was a bug or an issue with my setup specifically as this is a test
 setup it has been setup and torn down a few times). Now though the Geo
 replica will not converge to be the same, ie. It’s stable, new files add
 fine and files will delete, but the missing files just don’t seem to be
 interested in synchronising! I’m guessing that as the rsync is triggered by
 the change log and as these files aren’t changing it won’t ever notice them
 again? I can manually copy the files (there are only 7 after all…) but I
 have only found them through a consistency checking script I wrote. I can
 run this through a cron to pick up any missing files, however I wondered if
 Gluster had something built in which did a check and sync? Also, If I did
 manually copy these files across how would that affect the consistency of
 the geo replica session?



 Running: GlusterFS 3.4.5 on CentOS 6.5



 Regards

 James




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Fwd: Samba vfs_glusterfs no such file or directory

2014-06-27 Thread David Gibbons
Boy, it's not a good day for my list etiquette. Apologies, folks.

-- Forwarded message --
From: David Gibbons david.c.gibb...@gmail.com
Date: Fri, Jun 27, 2014 at 3:10 PM
Subject: Re: [Gluster-users] Samba vfs_glusterfs no such file or directory
To: Niels de Vos nde...@redhat.com


Samba with vfs_glusterfs has a limit of approx. 93 groups. If 'id $USER'
  returns more than 93 groups, those users can run into various issues.
 'Access is denied' is one of the most common errors they'll see.

 The upcoming 3.5.1 release has a 'server.manage-gids' volume option.
 With this option enabled, the number of groups will be limited to 65535.


Ahh, great. I am very glad, at least, that this is a known issue and that
it's being addressed.



  What am I missing here?
 Very little, I would also suspect that the number of groups that those
 problematic users belong to is too big.


Well that's a first :).
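For reference, a quick way to count how many groups a user resolves to (the
username is a placeholder):

# compare this against the ~93-group limit mentioned above
id -G someuser | wc -w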

I will test this against the 3.5.1 release when that is ready. Is the 3.5.1
version of vfs_glusterfs backwards compatible with glusterfs 3.4, or do I
need to upgrade the whole cluster to leverage the new vfs_glusterfs?

Thanks so much,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Samba vfs_glusterfs no such file or directory

2014-06-17 Thread David Gibbons
Hi All,

I am running into a strange error with samba and vfs_glusterfs.

Here is some version information:
[root@gfs-a-3 samba]# smbd -V
Version 3.6.20

[root@gfs-a-3 tmp]# glusterfsd --version
glusterfs 3.4.1 built on Oct 21 2013 09:23:23

Samba is configured in an AD environment, using winbind. Group resolution,
user resolution, and cross-mapping of SIDs to IDs to usernames all work
as expected. The vfs_glusterfs module is working perfectly for the vast
majority of the users I have configured. A small percentage of the users,
though, get an "access is denied" error when they attempt to access the
share. They are all configured in the same way as the users that are
working.

We initially thought that perhaps the number of groups the user was a
member of was causing the issue. This still might be the case but we're not
sure how to verify that guess.

When we connect with a working user, with glusterfs:loglevel = 10, here
are the last bits of the log file. I'm not really sure where the interesting
lines are; any guidance would be much appreciated:

[2014-06-17 12:11:53.753289] D
 [client-handshake.c:1430:client_setvolume_cbk] 0-shares-client-5:
 clnt-lk-version = 1, server-lk-version = 0
 [2014-06-17 12:11:53.753296] I
 [client-handshake.c:1456:client_setvolume_cbk] 0-shares-client-5: Connected
 to 172.16.10.13:49153, attached to remote volume
 '/mnt/a-3-shares-brick-2/brick'.
 [2014-06-17 12:11:53.753301] I
 [client-handshake.c:1468:client_setvolume_cbk] 0-shares-client-5: Server
 and Client lk-version numbers are not same, reopening the fds
 [2014-06-17 12:11:53.753306] D
 [client-handshake.c:1318:client_post_handshake] 0-shares-client-5: No fds
 to open - notifying all parents child up
 [2014-06-17 12:11:53.753313] D
 [client-handshake.c:486:client_set_lk_version] 0-shares-client-5: Sending
 SET_LK_VERSION
 [2014-06-17 12:11:53.753320] T [rpc-clnt.c:1302:rpc_clnt_record]
 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
 [2014-06-17 12:11:53.753327] T
 [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
 132, payload: 68, rpc hdr: 64
 [2014-06-17 12:11:53.753344] T [rpc-clnt.c:1499:rpc_clnt_submit]
 0-rpc-clnt: submitted request (XID: 0x32x Program: GlusterFS Handshake,
 ProgVers: 2, Proc: 4) to rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753353] T [rpc-clnt.c:1302:rpc_clnt_record]
 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
 [2014-06-17 12:11:53.753360] T
 [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
 64, payload: 0, rpc hdr: 64
 [2014-06-17 12:11:53.753373] T [rpc-clnt.c:1499:rpc_clnt_submit]
 0-rpc-clnt: submitted request (XID: 0x33x Program: GlusterFS Handshake,
 ProgVers: 2, Proc: 3) to rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753381] I [afr-common.c:3698:afr_notify]
 0-shares-replicate-2: Subvolume 'shares-client-5' came back up; going
 online.
 [2014-06-17 12:11:53.753393] T [rpc-clnt.c:1302:rpc_clnt_record]
 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
 [2014-06-17 12:11:53.753399] T
 [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
 84, payload: 20, rpc hdr: 64
 [2014-06-17 12:11:53.753413] T [rpc-clnt.c:1499:rpc_clnt_submit]
 0-rpc-clnt: submitted request (XID: 0x34x Program: GlusterFS 3.3, ProgVers:
 330, Proc: 14) to rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753430] T [rpc-clnt.c:669:rpc_clnt_reply_init]
 0-shares-client-5: received rpc message (RPC XID: 0x32x Program: GlusterFS
 Handshake, ProgVers: 2, Proc: 4) from rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753441] I
 [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-5:
 Server lk version = 1
 [2014-06-17 12:11:53.753451] T [rpc-clnt.c:669:rpc_clnt_reply_init]
 0-shares-client-5: received rpc message (RPC XID: 0x33x Program: GlusterFS
 Handshake, ProgVers: 2, Proc: 3) from rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753474] T [rpc-clnt.c:669:rpc_clnt_reply_init]
 0-shares-client-5: received rpc message (RPC XID: 0x34x Program: GlusterFS
 3.3, ProgVers: 330, Proc: 14) from rpc-transport (shares-client-5)
 [2014-06-17 12:11:53.753483] D [dht-diskusage.c:80:dht_du_info_cbk]
 0-shares-dht: on subvolume 'shares-replicate-2': avail_percent is: 95.00
 and avail_space is: 1050826719232 and avail_inodes is: 99.00


And here is a log snip from the non-working user:

[2014-06-17 12:07:17.866693] W [socket.c:514:__socket_rwv]
 0-shares-client-13: readv failed (No data available)
 [2014-06-17 12:07:17.866699] D
 [socket.c:1962:__socket_proto_state_machine] 0-shares-client-13: reading
 from socket failed. Error (No data available), peer (172.16.10.13:49155)
 [2014-06-17 12:07:17.866707] D [socket.c:2236:socket_event_handler]
 0-transport: disconnecting now
 [2014-06-17 12:07:17.866716] T
 [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-13: cleaning
 up state in transport object 0x7f22300aaa60
 [2014-06-17 12:07:17.866722] I 

Re: [Gluster-users] Gluster quota issue

2014-04-07 Thread David Gibbons
Are these sparse files? Check to see what the file allocation is vs what
the actual size is:

~# ls -lsah
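du can show the same comparison per directory, if that is easier (a sketch;
the path is a placeholder):

# apparent size vs. actual allocated space
du -sh --apparent-size /path/to/user/dir
du -sh /path/to/user/dir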

-Dave


On Mon, Apr 7, 2014 at 8:53 AM, Barry Stetler ba...@hivelocity.net wrote:

  I am having an issue with Gluster quotas. The user is set to 200 GB and he
 is using about 57 GB on the mounted file system, but Gluster says he is
 using 180 GB.

 gluster volume quota home list /user

 shows he is using 180 GB

 Is this a bug, or is this looking at the replicas?

 Here is the volume info

 Volume Name: home
 Type: Distributed-Replicate
 Volume ID: 9e0ffc91-9d46-477a-b8eb-dfd3b7d65765
 Status: Started
 Number of Bricks: 2 x 2 = 4
 Transport-type: tcp
 Bricks:
 Brick1: gluster1:/export/cluster1
 Brick2: gluster2:/export/cluster1
 Brick3: gluster3:/export/cluster1
 Brick4: gluster4:/export/cluster1
 Options Reconfigured:


 --
Barry Stetler  HIVELOCITY | Devops and Operations Leader  888-869-4678
 ext. 224 | Hivelocity.net http://hivelocity.net





___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster, Samba, and VFS

2014-02-11 Thread David Gibbons
We use samba VFS quite heavily in our infrastructure. Integrated with AD
via winbind, load balanced SMB front-ends based on HA with auto failover.
This system has been in production for about 4 months. So far it's worked
very well.

Dave


On Tue, Feb 11, 2014 at 11:35 AM, Matt Miller m...@mattandtiff.net wrote:

 Yesterday was my first day on the list, so I had not yet seen that thread.
  It appears to be working, though.  Will have to set up some load tests.


 On Tue, Feb 11, 2014 at 12:42 AM, Daniel Müller 
 muel...@tropenklinik.dewrote:

 No, not really:
 Look at my thread: samba vfs objects glusterfs is it now working?
 I am just waiting for an answer to fix this.
 The only way I succeeded in making it work is how you described (exporting
 the fuse mount through samba).



 EDV Daniel Müller

 Leitung EDV
 Tropenklinik Paul-Lechler-Krankenhaus
 Paul-Lechler-Str. 24
 72076 Tübingen
 Tel.: 07071/206-463, Fax: 07071/206-499
 eMail: muel...@tropenklinik.de
 Internet: www.tropenklinik.de
 Der Mensch ist die Medizin des Menschen




 Von: gluster-users-boun...@gluster.org
 [mailto:gluster-users-boun...@gluster.org] Im Auftrag von Matt Miller
 Gesendet: Montag, 10. Februar 2014 16:43
 An: gluster-users@gluster.org
 Betreff: [Gluster-users] Gluster, Samba, and VFS

 Stumbled upon

 https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/commits/master
 when trying to find info on how to make Gluster and Samba play nice as a
 general purpose file server. I have had severe performance problems in the
 past with mounting the Gluster volume as a FUSE mount, then exporting the
 FUSE mount through Samba. As I found out after setting up the cluster, this
 is somewhat expected when serving out lots of small files. I was hoping VFS
 would provide better performance when serving out lots and lots of small
 files.
 Is anyone using the VFS extensions in production? Is it ready for prime
 time? I could not find a single reference to it on Gluster's main website
 (maybe I am looking in the wrong place), so I am not sure of the stability
 or supported-ness of this.




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Multiple Volumes (bricks), One Disk

2013-11-12 Thread David Gibbons
Hi All,

I am interested in some feedback on putting multiple bricks on one physical
disk. Each brick being assigned to a different volume. Here is the scenario:

4 disks per server, 4 servers, 2x2 distribute/replicate

I would prefer to have just one volume but need to do geo-replication on
some of the data (but not all of it). My thought was to use two volumes,
which would allow me to selectively geo-replicate just the data that I need
to, by replicating only one volume.

A couple of questions come to mind:
1) Any implications of doing two bricks for different volumes on one
physical disk?
2) Will the free space across each volume still calculate correctly? IE,
if one volume takes up 2/3 of the total physical disk space, will the
second volume still reflect the correct amount of used space?
3) Am I being stupid/missing something obvious?

Cheers,
Dave
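For what it's worth, a simplified sketch of the layout I have in mind, shown
for one disk on two servers (hostnames and paths are placeholders; each brick
is just a separate directory on the same disk):

# one brick directory per volume on the shared disk
mkdir -p /export/disk1/shares-brick /export/disk1/georep-brick

# each volume then gets its own brick on that disk
gluster volume create shares replica 2 \
    gfs-a-1:/export/disk1/shares-brick gfs-a-2:/export/disk1/shares-brick
gluster volume create georep-shares replica 2 \
    gfs-a-1:/export/disk1/georep-brick gfs-a-2:/export/disk1/georep-brick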
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

2013-11-12 Thread David Gibbons
Ira,

Thank you for the response. I suspect that your patch will resolve this
issue as well -- however, an upgrade to Samba 3.6.20 continues to display the
total-volume-size behavior instead of the expected GlusterFS folder-quota
behavior. I note that your patch was accepted into 3.6.next but don't
see whether or not it actually made it into the 3.6.20 release. I'm
probably looking in the wrong place. Any pointers?
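For what it's worth, one way to answer that from a samba git checkout (the
commit hash is the one referenced in the message below):

# list the release tags that contain the quota fix
git tag --contains 872a7d61ca769c47890244a1005c1bd445a3bab6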

Cheers,
Dave


On Wed, Oct 30, 2013 at 11:53 AM, Ira Cooper i...@redhat.com wrote:

 I suspect you are missing the patch needed to make this work.


 http://git.samba.org/?p=samba.git;a=commit;h=872a7d61ca769c47890244a1005c1bd445a3bab6;
  It was put in, in the 3.6.13 timeframe if I'm reading the git history
 correctly.

 The bug manifests when the base of the share has a different amount of
 Quota Allowance than elsewhere in the tree.

 \\foo\ - 5GB quota
 \\foo\bar - 2.5GB quota

 When you run dir in \\foo you get the results from the 5GB quota, and
 the same in \\foo\bar, which is incorrect and highly confusing to users.

 https://bugzilla.samba.org/show_bug.cgi?id=9646

 Despite my discussion of multi-volume it should be the same bug.

 Thanks,

 -Ira / i...@samba.org

 - Original Message -
 From: David Gibbons david.c.gibb...@gmail.com
 To: gluster-users@gluster.org
 Sent: Wednesday, October 30, 2013 11:04:49 AM
 Subject: Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

 Thanks all for the pointers.



 What version of Samba are you running?

 Samba version is 3.6.9:
 [root@gfs-a-1 /]# smbd -V
 Version 3.6.9

 Gluster version is 3.4.1 git:
 [root@gfs-a-1 /]# glusterfs --version
 glusterfs 3.4.1 built on Oct 21 2013 09:22:36


 It should be
 # gluster volume set gfsv0 features.quota-deem-statfs on
 [root@gfs-a-1 /]# gluster volume set gfsv0 features.quota-deem-statfs on
 volume set: failed: option : features.quota-deem-statfs does not exist
 Did you mean features.quota-timeout?

 I wonder if the quota-deem-statfs is part a more recent version?

 Cheers,
 Dave



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

2013-10-30 Thread David Gibbons
Hi Lala,

Thank you. I should have been more clear, and you are correct: I can't write
data above the quota. I was referring only to the disk size listed on the
Windows/Samba side.

Thanks for the tip on quota-deem-statfs. Here are my results with that
command:
# gluster volume set gfsv0 quota-deem-statfs on
volume set: failed: option : quota-deem-statfs does not exist
Did you mean dump-fd-stats or quota-timeout?

Which Gluster version does that feature setting apply to?
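If it helps, builds that support it can list the volume options they actually
know about, which makes it easy to see whether quota-deem-statfs exists in the
installed release (a sketch; the subcommand may not exist on older builds):

# search the supported volume options for the statfs setting
gluster volume set help | grep -i deem-statfs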

Cheers,
Dave


On Wed, Oct 30, 2013 at 3:09 AM, Lalatendu Mohanty lmoha...@redhat.com wrote:

  On 10/23/2013 05:26 PM, David Gibbons wrote:

 Hi All,

  I'm setting up a gluster cluster that will be accessed via smb. I was
 hoping that the quotas would be visible to the SMB clients. I've configured
 a quota on the path itself:

   # gluster volume quota gfsv0 list
  path                     limit_set     size
  ---------------------------------------------
  /shares/testsharedave    10GB          8.0KB

  And I've configured the share in samba (and can access it fine):
 # cat /etc/samba/smb.conf
  [testsharedave]
 vfs objects = glusterfs
 glusterfs:volfile_server = localhost
 glusterfs:volume = gfsv0
 path = /shares/testsharedave
 valid users = dave
 guest ok = no
 writeable = yes

  But windows does not reflect the quota and instead shows the full size
 of the gluster volume.

  I've reviewed the code in
 https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/blobs/master/src/vfs_glusterfs.c
  --
 which does not appear to support passing gluster quotas to samba. So I
 don't think my installation is broken, it seems like maybe this just isn't
 supported.

  Can anyone speak to whether or not quotas are going to be implemented in
 vfs_glusterfs for samba? Or if I'm just crazy and doing this wrong ;)? I'm
 definitely willing to help with the code but don't have much experience
 with either samba modules or the gluster API.

    Hi David,
  Quotas are supported by vfs_glusterfs for samba. I have also set quota on
  the volume correctly. If you try to write more data than the quota on the
  directory (/shares/testsharedave), it will not be allowed.

  But for the clients (i.e. Windows/SMB, NFS, FUSE) to reflect it in the
  metadata information (i.e. properties in Windows), you have to run the
  below volume set command on the respective volume:

  gluster volume set <VOLUME NAME> quota-deem-statfs on

 -Lala

  Cheers,
 Dave







___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Samba vfs_glusterfs Quota Support?

2013-10-30 Thread David Gibbons
Thanks all for the pointers.

What version of Samba are you running?


Samba version is 3.6.9:
[root@gfs-a-1 /]# smbd -V
Version 3.6.9

Gluster version is 3.4.1 git:
[root@gfs-a-1 /]# glusterfs --version
glusterfs 3.4.1 built on Oct 21 2013 09:22:36


 It should be
 # gluster volume set gfsv0 features.quota-deem-statfs on

[root@gfs-a-1 /]# gluster volume set gfsv0 features.quota-deem-statfs on
volume set: failed: option : features.quota-deem-statfs does not exist
Did you mean features.quota-timeout?

I wonder if the quota-deem-statfs is part a more recent version?

Cheers,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication: queue delete commands and process after a specified time

2013-10-24 Thread David Gibbons
Steve,

I think the best bet would be geo-replication with LVM snapshots:
1) Geo-replicate to another gluster install on separate hardware
2) Snapshot the LVM volume that your gluster bricks sit on

If you snapshot once a day and retain for 7 days, that should achieve your
backup need.
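A rough sketch of step 2, assuming the bricks sit on an LVM logical volume
(the VG/LV names and snapshot size are made up):

# daily snapshot of the LV that holds the gluster bricks
lvcreate --snapshot --name bricks-$(date +%Y%m%d) --size 50G /dev/vg_gluster/lv_bricks

# a cron job can then prune snapshots older than 7 days, e.g.
# lvremove -f /dev/vg_gluster/bricks-20131017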

Cheers,
Dave


On Thu, Oct 24, 2013 at 11:40 AM, Steve Dainard sdain...@miovision.com wrote:

 Hello list,

 I'm toying with the idea of using Gluster as a user facing network share,
 and geo-replicating the data for backup purposes.

 At a bare minimum I'd like geo-replication to not sync file deletions
 immediately to the slave, but instead queue those deletions for a
 configurable period of time (say 7 days).

 As an added bonus, moving a file would actually leave a copy behind with a
 date stamp suffix on the slave. I could then have a cron job clean up old
 file copies. Lastly I would then expose the geo-replicated volume to users
 as read-only so they could retrieve old files if necessary, perhaps in a
 web-ui. At the end of the day I suppose I'm looking for a VSS style
 solution.

 From some research it doesn't look like either of these solutions exist in
 Gluster right now, are there any plans for this type of use-case? Obviously
 this would cause some serious havoc if the volume was used as a VM store so
 it would need to be properly cautioned.

 Otherwise, anyone know of an opensource solution that could do this?


 Steve Dainard
 IT Infrastructure Manager
 Miovision (http://miovision.com/) | Rethink Traffic
 519-513-2407 ex.250
 877-646-8476 (toll-free)

 Blog: http://miovision.com/blog | LinkedIn:
 https://www.linkedin.com/company/miovision-technologies | Twitter:
 https://twitter.com/miovision | Facebook: https://www.facebook.com/miovision
 --
  Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener,
 ON, Canada | N2C 1L3
 This e-mail may contain information that is privileged or confidential. If
 you are not the intended recipient, please delete the e-mail and any
 attachments and notify us immediately.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Samba vfs_glusterfs Quota Support?

2013-10-23 Thread David Gibbons
Hi All,

I'm setting up a gluster cluster that will be accessed via smb. I was
hoping that the quotas would be visible to the SMB clients. I've configured a
quota on the path itself:

# gluster volume quota gfsv0 list
path                     limit_set     size
---------------------------------------------
/shares/testsharedave    10GB          8.0KB

And I've configured the share in samba (and can access it fine):
# cat /etc/samba/smb.conf
[testsharedave]
vfs objects = glusterfs
glusterfs:volfile_server = localhost
glusterfs:volume = gfsv0
path = /shares/testsharedave
valid users = dave
guest ok = no
writeable = yes

But windows does not reflect the quota and instead shows the full size of
the gluster volume.

I've reviewed the code in
https://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs/blobs/master/src/vfs_glusterfs.c
--
which does not appear to support passing gluster quotas to samba. So I
don't think my installation is broken, it seems like maybe this just isn't
supported.

Can anyone speak to whether or not quotas are going to be implemented in
vfs_glusterfs for samba? Or if I'm just crazy and doing this wrong ;)? I'm
definitely willing to help with the code but don't have much experience
with either samba modules or the gluster API.

Cheers,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Cluster Slowness with Glustrefs

2013-10-07 Thread David Gibbons
Hi There,

How many files are in each directory? We've seen hangs up to a few seconds
when doing a ls against a folder with 2k files in it. Lowering the number
of files per folder resolved the issue for us. From what I understand, this
is a known behavior with Gluster.
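If it is useful, a quick way to spot the directories with the most entries on
a mounted volume (the mount point is a placeholder, and the walk itself can be
slow on a large volume):

# print entry counts per directory, largest first
find /mnt/gluster -type d -print0 | while IFS= read -r -d '' d; do
    printf '%s %s\n' "$(ls -1A "$d" | wc -l)" "$d"
done | sort -rn | head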

Dave


On Mon, Oct 7, 2013 at 3:23 AM, pramod@wipro.com wrote:

  Hi Team,

 We have recently implemented glusterfs with 225 TB of usable space.
 Glusterfs is not stable; we are having a lot of issues.

 Once you traverse deeper into directories, the gluster partition hangs.

 Kindly help.

 -
 Regards,
 PRamod

 The information contained in this electronic message and any attachments
 to this message are intended for the exclusive use of the addressee(s) and
 may contain proprietary, confidential or privileged information. If you are
 not the intended recipient, you should not disseminate, distribute or copy
 this e-mail. Please notify the sender immediately and destroy all copies of
 this message and any attachments.

 WARNING: Computer viruses can be transmitted via email. The recipient
 should check this email and any attachments for the presence of viruses.
 The company accepts no liability for any damage caused by any virus
 transmitted by this email.

 www.wipro.com


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Advice for building samba-glusterfs-vfs

2013-10-02 Thread David Gibbons
Dan,

I had the same trouble yesterday. As it happens, I created a doc to help
script installs of future nodes. I did not snip it down to just the portions
that apply to what you're looking for, but the script below works for me.

The biggest issue was that some modules were apparently installed in lib
and the build process was looking for them in lib64. In any event, the vfs
module builds, installs and runs cleanly after this.

** The big win for me was finding this command, which allowed me to figure
out where it was looking for modules in the wrong lib directory:
ldd /usr/local/samba/lib/vfs/glusterfs.so

I'm sure there is an easier way to do this :)...

Cheers,
Dave


--

#!/bin/bash

yum groupinstall "Development Tools"
yum install git openssl-devel wget
yum install libtalloc libtdb

# set up gluster

cd /usr/src && git clone https://github.com/gluster/glusterfs.git
cd /usr/src/glusterfs && ./autogen.sh && ./configure && make
make install

# set up samba 3.6.9

cd /usr/src && wget http://ftp.samba.org/pub/samba/stable/samba-3.6.9.tar.gz && tar -zxvf samba-3.6.9.tar.gz
cd /usr/src/samba-3.6.9/source3 && ./configure && make
make install

ln -s /usr/local/samba/lib/libwbclient.so.0 /usr/lib64/libwbclient.so.0

# then install the RPM samba version

yum install samba

# set up vfs_glusterfs

cd /usr/src && git clone git://forge.gluster.org/samba-glusterfs/samba-glusterfs-vfs.git

ln -s /usr/local/include/glusterfs /usr/include/glusterfs

cd /usr/src/samba-glusterfs-vfs
./configure --with-samba-source=/usr/src/samba-3.6.9/source3 && make
make install

ln -s /usr/local/samba/lib/vfs/glusterfs.so /usr/lib64/samba/vfs/glusterfs.so

# link the other modules

ln -s /usr/local/lib/libgfapi.so /usr/lib64/
ln -s /usr/local/lib/libgfapi.so.0 /usr/lib64/
ln -s /usr/local/lib/libgfapi.la /usr/lib64/
ln -s /usr/local/lib/libglusterfs.la /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so.0 /usr/lib64/
ln -s /usr/local/lib/libglusterfs.so.0.0.0 /usr/lib64/

EOF
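As an alternative to the individual library symlinks above, the runtime linker
can be pointed at /usr/local/lib directly (a sketch, assuming the default
/usr/local prefix used in this script):

# let the dynamic linker find the locally built gluster libraries
echo /usr/local/lib > /etc/ld.so.conf.d/glusterfs-local.conf
ldconfig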


On Tue, Oct 1, 2013 at 10:38 PM, Dan Mons dm...@cuttingedge.com.au wrote:

 Hi folks,

 I've got CentOS6.4 with Samba 3.6.9 installed from the standard CentOS
 repos via yum.  I also have GlusterFS 3.4.0 GA installed from RPMs
 direct from gluster.org.

 I'm trying to build the glusterfs VFS module for Samba to take
 advantage of libgfapi for our Windows users, and migrate them off the
 current Samba-on-FUSE setup we have currently.

 I've downloaded the appropriate source trees for all projects
 (GlusterFS from gluster.org, Samba from the matching CentOS6 SRPM, and
 samba-glusterfs-vfs from the git repo), but am facing troubles early
 on just finding appropriate headers.

 [root@bne-gback000 samba-glusterfs-vfs]# find
 /usr/local/src/glusterfs-3.4.0 -type f -name glfs.h
 /usr/local/src/glusterfs-3.4.0/api/src/glfs.h

 [root@bne-gback000 samba-glusterfs-vfs]# ./configure
 --with-glusterfs=/usr/local/src/glusterfs-3.4.0
 *snip*
 checking api/glfs.h usability... no
 checking api/glfs.h presence... no
 checking for api/glfs.h... no
 Cannot find api/glfs.h. Please specify --with-glusterfs=dir if necessary

 If I install glusterfs-api-devel-3.4.0-8.el6.x86_64.rpm, I need to
 copy /usr/include/glusterfs/api/glfs.h to /usr/include for it to be
 found (even using --with-glusterfs= doesn't work), and then I get
 further errors about not being able to link to glfs_init:

 [root@bne-gback000 samba-glusterfs-vfs]# rpm -ivh
 /tmp/glusterfs-api-devel-3.4.0-8.el6.x86_64.rpm
 [root@bne-gback000 samba-glusterfs-vfs]# cp
 /usr/include/glusterfs/api/glfs.h /usr/include/
 [root@bne-gback000 samba-glusterfs-vfs]# ./configure
 *snip*
 checking api/glfs.h usability... yes
 checking api/glfs.h presence... yes
 checking for api/glfs.h... yes
 checking for glfs_init... no
 Cannot link to gfapi (glfs_init). Please specify --with-glusterfs=dir
 if necessary

 If anyone can point me in the right direction, that would be greatly
 appreciated.

 Cheers,

 -Dan

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replacing a failed brick

2013-08-18 Thread David Gibbons
Joe,

Now I understand what is going on here. Makes a lot more sense that it's a
bug in the sanity checking code. Thanks so much!

Dave


On Fri, Aug 16, 2013 at 11:19 AM, Joe Julian j...@julianfamily.org wrote:

 This tells you that this brick isn't running. That's probably because it
 was formatted and lost its volume-id extended attribute. See
 http://www.joejulian.name/blog/replacing-a-brick-on-glusterfs-340/
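 The gist of the procedure on that page, as I understand it, sketched out
 (the xattr value is illustrative and must come from a healthy brick of the
 same volume):

 # read the volume-id from the surviving replica of that brick...
 getfattr -n trusted.glusterfs.volume-id -e hex /localmnt/g1lv5

 # ...and stamp it onto the replaced brick before the start force below
 VOLID=0x...   # value printed by the getfattr above
 setfattr -n trusted.glusterfs.volume-id -v "$VOLID" /localmnt/g2lv5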

 Once that's fixed, on 10.250.4.65:


   gluster volume start test-a force


 On 08/16/2013 08:03 AM, David Gibbons wrote:

 Brick 10.250.4.65:/localmnt/g2lv5   N/A N   N/A



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replacing a failed brick

2013-08-16 Thread David Gibbons
Ok, it appears that the following worked. Thanks for the nudge in the right
direction:

volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5
10.250.4.65:/localmnt/g2lv6
commit force

then
volume heal test-a full

and monitor the progress with
volume heal test-a info

However, that does not solve my problem of what to do when a brick is
corrupted somehow, if I don't have enough space to first heal it and then
replace it.

That did get me thinking though: what if I replace the brick, forgo the
heal, replace it again and then do a heal? That seems to work.

So if I lose one brick, here is the process that I used to recover it:
1) create a directory that is just there to temporarily trick gluster and
allow us to maintain the correct replica count: mkdir /localmnt/garbage
2) replace the dead brick with our garbage directory: volume replace-brick
test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/garbage commit
force
3) fix our dead brick using whatever process is required. In this case, for
testing, we had to remove some gluster bits or it throws the "already part
of a volume" error:
setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5
setfattr -x trusted.gfid /localmnt/g2lv5
4) now that our dead brick is fixed, swap it for the garbage/temporary
brick: volume replace-brick test-a 10.250.4.65:/localmnt/garbage
10.250.4.65:/localmnt/g2lv5 commit force
5) now all that we have to do is let gluster heal the volume: volume heal
test-a full

Is there anything wrong with this procedure?

Cheers,
Dave




On Fri, Aug 16, 2013 at 11:03 AM, David Gibbons
david.c.gibb...@gmail.com wrote:

 Ravi,

 Thanks for the tips. When I run a volume status:
 gluster volume status test-a
 Status of volume: test-a
 Gluster process                            Port    Online  Pid
 ------------------------------------------------------------------
 Brick 10.250.4.63:/localmnt/g1lv2          49152   Y       8072
 Brick 10.250.4.65:/localmnt/g2lv2          49152   Y       3403
 Brick 10.250.4.63:/localmnt/g1lv3          49153   Y       8081
 Brick 10.250.4.65:/localmnt/g2lv3          49153   Y       3410
 Brick 10.250.4.63:/localmnt/g1lv4          49154   Y       8090
 Brick 10.250.4.65:/localmnt/g2lv4          49154   Y       3417
 Brick 10.250.4.63:/localmnt/g1lv5          49155   Y       8099
 Brick 10.250.4.65:/localmnt/g2lv5          N/A     N       N/A
 Brick 10.250.4.63:/localmnt/g1lv1          49156   Y       8576
 Brick 10.250.4.65:/localmnt/g2lv1          49156   Y       3431
 NFS Server on localhost                    2049    Y       3440
 Self-heal Daemon on localhost              N/A     Y       3445
 NFS Server on 10.250.4.63                  2049    Y       8586
 Self-heal Daemon on 10.250.4.63            N/A     Y       8593

 There are no active volume tasks
 --

 Attempting to start the volume results in:
 gluster volume start test-a force
 volume start: test-a: failed: Failed to get extended attribute
 trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data
 available
 --

 It doesn't like when I try to fire off a heal either:
 gluster volume heal test-a
 Launching Heal operation on volume test-a has been unsuccessful
 --

 Although that did lead me to this:
 gluster volume heal test-a info
 Gathering Heal info on volume test-a has been successful

 Brick 10.250.4.63:/localmnt/g1lv2
 Number of entries: 0

 Brick 10.250.4.65:/localmnt/g2lv2
 Number of entries: 0

 Brick 10.250.4.63:/localmnt/g1lv3
 Number of entries: 0

 Brick 10.250.4.65:/localmnt/g2lv3
 Number of entries: 0

 Brick 10.250.4.63:/localmnt/g1lv4
 Number of entries: 0

 Brick 10.250.4.65:/localmnt/g2lv4
 Number of entries: 0

 Brick 10.250.4.63:/localmnt/g1lv5
 Number of entries: 0

 Brick 10.250.4.65:/localmnt/g2lv5
 Status: Brick is Not connected
 Number of entries: 0

 Brick 10.250.4.63:/localmnt/g1lv1
 Number of entries: 0

 Brick 10.250.4.65:/localmnt/g2lv1
 Number of entries: 0
 --

 So perhaps I need to re-connect the brick?

 Cheers,
 Dave



 On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N ravishan...@redhat.com wrote:

  On 08/15/2013 10:05 PM, David Gibbons wrote:

 Hi There,

  I'm currently testing Gluster for possible production use. I haven't
 been able to find the answer to this question in the forum arch or in the
 public docs. It's possible that I don't know which keywords to search for.

  Here's the question (more details below): let's say that one of my
 bricks fails -- *not* a whole node failure but a single brick failure
 within the node. How do I replace a single brick on a node and force a sync
 from one of the replicas?

  I have two nodes with 5 bricks each:
  gluster volume info test-a

  Volume Name: test-a
 Type: Distributed-Replicate
 Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4
 Status: Started
 Number of Bricks: 5 x 2 = 10
 Transport-type: tcp
 Bricks:
 Brick1

[Gluster-users] Replacing a failed brick

2013-08-15 Thread David Gibbons
Hi There,

I'm currently testing Gluster for possible production use. I haven't been
able to find the answer to this question in the forum arch or in the public
docs. It's possible that I don't know which keywords to search for.

Here's the question (more details below): let's say that one of my bricks
fails -- *not* a whole node failure but a single brick failure within the
node. How do I replace a single brick on a node and force a sync from one
of the replicas?

I have two nodes with 5 bricks each:
gluster volume info test-a

Volume Name: test-a
Type: Distributed-Replicate
Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: 10.250.4.63:/localmnt/g1lv2
Brick2: 10.250.4.65:/localmnt/g2lv2
Brick3: 10.250.4.63:/localmnt/g1lv3
Brick4: 10.250.4.65:/localmnt/g2lv3
Brick5: 10.250.4.63:/localmnt/g1lv4
Brick6: 10.250.4.65:/localmnt/g2lv4
Brick7: 10.250.4.63:/localmnt/g1lv5
Brick8: 10.250.4.65:/localmnt/g2lv5
Brick9: 10.250.4.63:/localmnt/g1lv1
Brick10: 10.250.4.65:/localmnt/g2lv1

I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a failure). What is
the next step? I have tried various combinations of removing and re-adding
the brick, replacing the brick, etc. I read in a previous message to this
list that replace-brick was for planned changes which makes sense, so
that's probably not my next step.

Cheers,
Dave
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users