[Gluster-users] Geo-replication (v3.5.3)

2015-03-10 Thread John Gardeniers
Using Gluster v3.5.3 and trying to follow the geo-replication 
instructions 
(https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md), 
step by step, gets me nowhere.


The slave volume has been created and passwordless SSH is set up for 
root from the master to slave. Both master and slave volumes are running.


Running "gluster system:: execute gsec_create", no problem.
Running "gluster volume geo-replication  
:: create push-pem [force]" (with appropriate 
parameters, with and without "force") results in "Passwordless ssh login 
has not been setup with . geo-replication command failed"


As I said, passwordless SSH *is* set up. I can SSH from the master to 
the slave without a password just fine. What gives? More to the point, 
how do I make this work?
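
For reference, a minimal sketch of the full sequence from the linked 
guide, using hypothetical names (mastervol for the master volume, 
slavehost and slavevol for the slave):

# on one of the master nodes, as root
gluster system:: execute gsec_create
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start

One thing worth checking: the create step verifies passwordless *root* 
SSH to the exact slave host named on the command line, so a key set up 
for another user, or for a different alias of that host, may not pass 
the check.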


regards,
John


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Remove geo-replica

2015-03-10 Thread John Gardeniers
Please ignore this one. The problem appears to be resolved by deleting 
the backup volume.


On 11/03/15 09:16, John Gardeniers wrote:
As a result of some server changes I wish to remove a geo-replica from 
our Gluster volume, in readiness for creating a new geo-replica on a 
different server. We have 2 Gluster servers, named Jupiter and Rigel. 
Jupiter is CentOS 6.6 and Rigel is CentOS 7. Both are running Gluster 
v3.5.3.


This doesn't appear to be documented, at least not anywhere I've 
searched, so I fell back on Google. The results show 2 variations of a 
command:


1 - gluster volume geo-replication gluster-volume 
slave_server:/backup_volume delete
2 - gluster volume geo-replication gluster-volume 
gluster://slave_server:/backup_volume delete


I tried both variations on each of our servers. Not only did they fail, 
they failed with totally different error messages.


On Jupiter I received "unrecognized word: geo-replication (position 1)".
On Rigel I received "Staging failed on localhost. Please check the log 
file for more details."


I should mention that Rigel was not part of the cluster when the 
geo-replication was originally set up.


So the question is: what is the proper command to use to remove a 
geo-replica?


regards,
John

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Remove geo-replica

2015-03-10 Thread John Gardeniers
As a result of some server changes I wish to remove a geo-replica from 
our Gluster volume, in readiness for creating a new geo-replica on a 
different server. We have 2 Gluster servers, named Jupiter and Rigel. 
Jupiter is CentOS 6.6 and Rigel is CentOS 7. Both are running Gluster 
v3.5.3.


This doesn't appear to be documented, at least not anywhere I've 
searched, so I fell back on Google. The results show 2 variations of a 
command:


1 - gluster volume geo-replication gluster-volume 
slave_server:/backup_volume delete
2 - gluster volume geo-replication gluster-volume 
gluster://slave_server:/backup_volume delete


I tried both variations on each of our servers. Not only did they fail, 
they failed with totally different error messages.


On Jupiter I received "unrecognized word: geo-replication (position 1)".
On Rigel I received "Staging failed on localhost. Please check the log 
file for more details."


I should mention that Rigel was not part of the cluster when the 
geo-replication was originally set up.


So the question is: what is the proper command to use to remove a 
geo-replica?
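
For reference, the 3.5 admin guide only documents delete for a session 
that has first been stopped, and it uses a double colon between the 
slave host and the slave volume name. A minimal sketch, reusing the 
names from the commands above:

gluster volume geo-replication gluster-volume slave_server::backup_volume stop
gluster volume geo-replication gluster-volume slave_server::backup_volume delete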


regards,
John

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Peers not connecting after changing IP address

2015-03-10 Thread Alex Crow

Hi JF,

They are all hostnames, no IPs anywhere. The odd thing is that now the 
remote site says everything is up, whereas the local servers only show 
each other as connected.


Cheers

Alex

On 10/03/15 15:40, JF Le Fillâtre wrote:

On my setup: on the host from which I peer probed, it's all hostnames.
On the other hosts, it's all IPs.

Can you check if it's the case on your setup too?

Thanks,
JF


On 10/03/15 16:29, Alex Crow wrote:

Hi,

They only have the hostname:

uuid=22b88f85-0554-419f-a279-980fceaeaf49
state=3
hostname1=zalma

And pinging these hostnames gives the correct IP. Still no connection
though.

Thanks,

Alex

On 10/03/15 15:04, JF Le Fillâtre wrote:

Hello,

Check the files in the peer directory:

/var/lib/glusterd/peers

They contain the IP addresses of the peers.

I haven't done it but I assume that if you update those files on all
servers you should be back online.

Thanks,
JF


On 10/03/15 16:00, Alex Crow wrote:

Hi,

I've had a 4 node Dis/Rep cluster up and running for a while, but
recently moved two of the nodes (the replicas of the other 2) to a
nearby datacentre. The IP addresses of the moved two therefore changed,
but I updated the /etc/hosts file on all four hosts to reflect the
change (and the peers were all probed by name, not IP).

However at each site the other two peers show as disconnected, even
though the servers can all successfully talk to each other. Is there
some way I can kick this back into life?

Regards,

Alex



--
This message is intended only for the addressee and may contain
confidential information. Unless you are that person, you may not
disclose its contents or use it in any way and are requested to delete
the message along with any attachments and notify us immediately.
"Transact" is operated by Integrated Financial Arrangements plc. 29
Clement's Lane, London EC4N 7AE. Tel: (020) 7608 4900 Fax: (020) 7608
5300. (Registered office: as above; Registered in England and Wales
under number: 3727592). Authorised and regulated by the Financial
Conduct Authority (entered on the Financial Services Register; no. 190856).

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peers not connecting after changing IP address

2015-03-10 Thread JF Le Fillâtre

On my setup: on the host from which I peer probed, it's all hostnames.
On the other hosts, it's all IPs.

Can you check if it's the case on your setup too?

Thanks,
JF


On 10/03/15 16:29, Alex Crow wrote:
> Hi,
> 
> They only have the hostname:
> 
> uuid=22b88f85-0554-419f-a279-980fceaeaf49
> state=3
> hostname1=zalma
> 
> And pinging these hostnames gives the correct IP. Still no connection
> though.
> 
> Thanks,
> 
> Alex
> 
> On 10/03/15 15:04, JF Le Fillâtre wrote:
>> Hello,
>>
>> Check the files in the peer directory:
>>
>> /var/lib/glusterd/peers
>>
>> They contain the IP addresses of the peers.
>>
>> I haven't done it but I assume that if you update those files on all
>> servers you should be back online.
>>
>> Thanks,
>> JF
>>
>>
>> On 10/03/15 16:00, Alex Crow wrote:
>>> Hi,
>>>
>>> I've had a 4 node Dis/Rep cluster up and running for a while, but
>>> recently moved two of the nodes (the replicas of the other 2) to a
>>> nearby datacentre. The IP addresses of the moved two therefore changed,
>>> but I updated the /etc/hosts file on all four hosts to reflect the
>>> change (and the peers were all probed by name, not IP).
>>>
>>> However at each site the other two peers show as disconnected, even
>>> though the servers can all successfully talk to each other. Is there
>>> some way I can kick this back into life?
>>>
>>> Regards,
>>>
>>> Alex
>>>
> 

-- 

 Jean-François Le Fillâtre
 ---
 HPC Systems Administrator
 LCSB - University of Luxembourg
 ---
 PGP KeyID 0x134657C6
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peers not connecting after changing IP address

2015-03-10 Thread Alex Crow

Hi,

They only have the hostname:

uuid=22b88f85-0554-419f-a279-980fceaeaf49
state=3
hostname1=zalma

And pinging these hostnames gives the correct IP. Still no connection though.

Thanks,

Alex

On 10/03/15 15:04, JF Le Fillâtre wrote:

Hello,

Check the files in the peer directory:

/var/lib/glusterd/peers

They contain the IP addresses of the peers.

I haven't done it but I assume that if you update those files on all
servers you should be back online.

Thanks,
JF


On 10/03/15 16:00, Alex Crow wrote:

Hi,

I've had a 4 node Dis/Rep cluster up and running for a while, but
recently moved two of the nodes (the replicas of the other 2) to a
nearby datacentre. The IP addresses of the moved two therefore changed,
but I updated the /etc/hosts file on all four hosts to reflect the
change (and the peers were all probed by name, not IP).

However at each site the other two peers show as disconnected, even
though the servers can all successfully talk to each other. Is there
some way I can kick this back into life?

Regards,

Alex




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peers not connecting after changing IP address

2015-03-10 Thread JF Le Fillâtre

Hello,

Check the files in the peer directory:

/var/lib/glusterd/peers

They contain the IP addresses of the peers.

I haven't done it but I assume that if you update those files on all
servers you should be back online.
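
A minimal sketch of that check, assuming the default glusterd working
directory:

# on every server, dump what each peer file currently records
grep -H . /var/lib/glusterd/peers/*
# after fixing any stale hostname/IP entries, restart glusterd to re-read them
service glusterd restart

Back the files up before editing them by hand; restarting glusterd only
affects the management daemon, not the running brick processes.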

Thanks,
JF


On 10/03/15 16:00, Alex Crow wrote:
> Hi,
> 
> I've had a 4 node Dis/Rep cluster up and running for a while, but
> recently moved two of the nodes (the replicas of the other 2) to a
> nearby datacentre. The IP addresses of the moved two therefore changed,
> but I updated the /etc/hosts file on all four hosts to reflect the
> change (and the peers were all probed by name, not IP).
> 
> However at each site the other two peers show as disconnected, even
> though the servers can all successfully talk to each other. Is there
> some way I can kick this back into life?
> 
> Regards,
> 
> Alex
> 

-- 

 Jean-François Le Fillâtre
 ---
 HPC Systems Administrator
 LCSB - University of Luxembourg
 ---
 PGP KeyID 0x134657C6
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Peers not connecting after changing IP address

2015-03-10 Thread Alex Crow

Hi,

I've had a 4 node Dis/Rep cluster up and running for a while, but 
recently moved two of the nodes (the replicas of the other 2) to a 
nearby datacentre. The IP addresses of the moved two therefore changed, 
but I updated the /etc/hosts file on all four hosts to reflect the 
change (and the peers were all probed by name, not IP).


However at each site the other two peers show as disconnected, even 
though the servers can all successfully talk to each other. Is there 
some way I can kick this back into life?


Regards,

Alex


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Rebalance issue on 3.5.3

2015-03-10 Thread Alessandro Ipe
Hi,


A couple of days ago I launched a rebalance on my Gluster
distributed-replicated volume (see below) through its CLI, while allowing
my users to continue using the volume.

Yesterday, they managed to fill the volume completely. This now results in
unavailable files on the client (using FUSE), with the message "Transport
endpoint is not connected". Investigating the associated files on the
bricks, I noticed that these are displayed by ls -l as
---------T 2 user group 0 Jan 15 22:00 file
Performing a
ls -lR /data/glusterfs/home/brick1/* | grep -F -- "---------T"
on a single brick gave me a LOT of files in the above-mentioned state.

Why are the files in that state?

Did I lose all these files, or can they still be recovered from the replica
copy on another brick?
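
Those zero-length, mode-1000 entries look like DHT link files (pointers
left on the hashed brick when the data lives on another subvolume) rather
than ordinary files. A sketch of one way to check, with a hypothetical
path on the brick:

# a DHT link file carries a linkto xattr naming the subvolume holding the data
getfattr -m . -d -e hex /data/glusterfs/home/brick1/path/to/file
# look for trusted.glusterfs.dht.linkto in the output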


Regards,


Alessandro.


gluster volume info home output:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
features.default-soft-limit: 95%
cluster.ensure-durability: off
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: on
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: off
server.allow-insecure: on


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Quorum setup for 2+1

2015-03-10 Thread Jeff Darcy
> I would like to set up server-side quorum using the following setup:
> - 2x storage nodes (s-node-1, s-node-2)
> - 1x arbiter node (s-node-3)
> So the trusted storage pool has three peers.
> 
> This is my volume info:
> Volume Name: wp-vol-0
> Type: Replicate
> Volume ID: 8808ee87-b201-474f-83ae-6f08eb259b43
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: s-node-1:/gluster/gvol0/brick0/brick
> Brick2: s-node-2:/gluster/gvol0/brick0/brick
> 
> I would like to set up the server-side quorum so that any two nodes would
> have quorum.
> s-node-1, s-node-2 = quorum
> s-node-1, s-node-3 = quorum
> s-node-2, s-node-3 = quorum
> According to the Gluster guys at FOSDEM this should be possible.
> 
> I have been fiddling with the quorum options, but have not been able to
> achieve the desired setup.
> Theoretically I would do:
> # gluster volume set wp-vol-0 cluster.server-quorum-type server
> # gluster volume set wp-vol-0 cluster.server-quorum-ratio 60
> 
> But the cluster.server-quorum-ratio option produces an error:
> volume set: failed: Not a valid option for single volume
> 
> How would I achieve the desired setup?

Somewhat counter-intuitively, server-quorum-type is a *volume* option
but server-quorum-ratio is a *cluster-wide* option.  Therefore, instead
of specifying a volume name on that command, use this:

# gluster volume set all cluster.server-quorum-ratio 60
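
Put together, the working pair for this volume would then be:

# gluster volume set wp-vol-0 cluster.server-quorum-type server
# gluster volume set all cluster.server-quorum-ratio 60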
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Quorum setup for 2+1

2015-03-10 Thread Mitja Mihelič

Hi!

I would like to set up server-side quorum using the following setup:
- 2x storage nodes (s-node-1, s-node-2)
- 1x arbiter node (s-node-3)
So the trusted storage pool has three peers.

This is my volume info:
Volume Name: wp-vol-0
Type: Replicate
Volume ID: 8808ee87-b201-474f-83ae-6f08eb259b43
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: s-node-1:/gluster/gvol0/brick0/brick
Brick2: s-node-2:/gluster/gvol0/brick0/brick

I would like to set up the server-side quorum so that any two nodes would 
have quorum.

s-node-1, s-node-2 = quorum
s-node-1, s-node-3 = quorum
s-node-2, s-node-3 = quorum
According to the Gluster guys at FOSDEM this should be possible.

I have been fiddling with the quorum options, but have not been able to 
achieve the desired setup.

Theoretically I would do:
# gluster volume set wp-vol-0 cluster.server-quorum-type server
# gluster volume set wp-vol-0 cluster.server-quorum-ratio 60

But the cluster.server-quorum-ratio option produces an error:
volume set: failed: Not a valid option for single volume

How would I achieve the desired setup?

Kind regards,
Mitja

--
--
Mitja Mihelič
ARNES, Tehnološki park 18, p.p. 7, SI-1001 Ljubljana, Slovenia
tel: +386 1 479 8877, fax: +386 1 479 88 78

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Minutes of today's Gluster Community Bug Triage meeting

2015-03-10 Thread Niels de Vos
On Mon, Mar 09, 2015 at 11:09:51PM -0400, Niels de Vos wrote:
> Hi all,
> 
> This meeting is scheduled for anyone that is interested in learning more
> about, or assisting with the Bug Triage.
> 
> Meeting details:
> - location: #gluster-meeting on Freenode IRC
> - date: every Tuesday
> - time: 12:00 UTC, 13:00 CET (run: date -d "12:00 UTC")
> - agenda: https://public.pad.fsfe.org/p/gluster-bug-triage
> 
> Currently the following items are listed:
> * Roll Call
> * Status of last weeks action items
> * Group Triage
> * Open Floor
> 
> The last two topics have space for additions. If you have a suitable bug
> or topic to discuss, please add it to the agenda.


Minutes: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-03-10/gluster-meeting.2015-03-10-12.00.html
Minutes (text): 
http://meetbot.fedoraproject.org/gluster-meeting/2015-03-10/gluster-meeting.2015-03-10-12.00.txt
Log: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-03-10/gluster-meeting.2015-03-10-12.00.log.html


Meeting summary
---
* Agenda: https://public.pad.fsfe.org/p/gluster-bug-triage  (ndevos,
  12:00:37)
* Roll Call  (ndevos, 12:00:43)

* Last weeks action items  (ndevos, 12:03:05)
  * subtopic: lalatenduM will send a reminder to the users- and devel-
ML about (and how to) fixing Coverity defects  (ndevos, 12:03:42)
  * ACTION: lalatenduM 's automated Coverity setup in Jenkins need
assistance from an admin with more permissions  (ndevos, 12:07:09)
  * AGREED: Try to find someone (not lalatenduM) to send the Coverity
reminder/howto to the lists  (ndevos, 12:09:08)
  * subtopic lalatenduM initiate a discussion with the RH Gluster team
to triage their own bugs when they report them  (ndevos, 12:09:23)
  * AGREED: RH QE/dev people should now know how to triage their own
bugs, we'll talk to hagarth about it if they fail to do so  (ndevos,
12:12:09)
  * subtopic: ndevos  will send an email to gluster-devel with some
standard bugzilla  queries/links to encourage developers to take
NEW+Triaged bugs  (ndevos, 12:12:38)
  * subtopic: lalatenduM will look into using nightly builds for
automated testing, and will report issues/success to the mailinglist
(ndevos, 12:13:40)
  * ACTION: ndevos needs to look into building nightly debug rpms that
can be used for testing  (ndevos, 12:18:39)
  * ACTION: lalatenduM and ndevos need to think about and decide how to
provide/use debug builds  (ndevos, 12:21:08)
  * ACTION: lalatenduM provide a simple step/walk-through on how to
provide testcases for the nightly rpm tests  (ndevos, 12:25:55)
  * ACTION: ndevos to propose some test-cases for minimal libgfapi tests
(ndevos, 12:26:36)

* Group Triage  (ndevos, 12:28:43)

* Open Floor  (ndevos, 12:50:23)

Meeting ended at 12:51:52 UTC.




Action Items

* lalatenduM 's automated Coverity setup in Jenkins need assistance from
  an admin with more permissions
* ndevos needs to look into building nightly debug rpms that can be used
  for testing
* lalatenduM and ndevos need to think about and decide how to
  provide/use debug builds
* lalatenduM provide a simple step/walk-through on how to provide
  testcases for the nightly rpm tests
* ndevos to propose some test-cases for minimal libgfapi tests




Action Items, by person
---
* lalatenduM
  * lalatenduM 's automated Coverity setup in Jenkins need assistance
from an admin with more permissions
  * lalatenduM and ndevos need to think about and decide how to
provide/use debug builds
  * lalatenduM provide a simple step/walk-through on how to provide
testcases for the nightly rpm tests
* ndevos
  * ndevos needs to look into building nightly debug rpms that can be
used for testing
  * lalatenduM and ndevos need to think about and decide how to
provide/use debug builds
  * ndevos to propose some test-cases for minimal libgfapi tests
* **UNASSIGNED**
  * (none)




People Present (lines said)
---
* ndevos (69)
* lalatenduM (43)
* hchiramm (13)
* rafi (5)
* zodbot (2)
* hagarth (1)




Generated by `MeetBot`_ 0.1.4

.. _`MeetBot`: http://wiki.debian.org/MeetBot
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

2015-03-10 Thread Przemysław Mroczek
The versions were:
gluster client: 3.6.2
gluster server: 3.6.0

2015-03-08 18:17 GMT+01:00 Vijay Bellur:

> On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
>
>> I don't have the volfiles; as I said previously, they are not on our
>> machines and we have no control over the Gluster servers.
>>
>> I saw a graph in the logs that looks similar to a volume file. I will
>> paste it here, but we don't really have any influence on it. We are
>> just using the client to connect to Gluster servers that we do not
>> control.
>>
>>
> I would recommend not altering the default for frame timeout.
>
>
>> Btw, do you think that different versions of gluster client and gluster
>> server could be an issue here?
>>
>>
> It can potentially be. What versions are you using on the servers and the
> client?
>
> -Vijay
>
>  2015-03-08 1:29 GMT+01:00 Vijay Bellur:
>>
>>
>> On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>>
>> Hi guys,
>>
>> We have a Rails app which is using Gluster for our distributed file
>> system. The Gluster servers are hosted independently, as part of a deal
>> with another company; we have no control over them, and we connect to
>> them using the Gluster native client.
>>
>> We tried to resolve this issue with help from the admins of the
>> company that is hosting our Gluster servers, but they say it is a
>> client issue, and we have run out of ideas as to how that is possible
>> when we are not doing anything special here.
>>
>> Information about the independent Gluster servers:
>> - version: 3.6.0.42.1
>> - They are using Red Hat
>> - They are enterprise, so they are always using older versions
>>
>> Our servers:
>> System version: Ubuntu 14.04
>> Our gluster client version: 3.6.2
>>
>> The exact problem is that it often happens (a couple of times a week)
>> that errors in Gluster cause processes to become zombies. It happens
>> with our application server (unicorn), nginx, and our crawling script
>> that is run as a daemon.
>>
>> Our fstab file:
>>
>> 10.10.11.17:/drslk-prod /mnt/storage  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>> 10.10.11.17:/drslk-backup /mnt/backup  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>>
>> Logs from gluster:
>>
>> [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )
>> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
>> [2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is
>> not connected. Path: /system/posts/00/00/71/77/59.jpg
>> (2ad81c2b-a141-478d-9dd4-253345edbceb)
>> [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )
>> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
>> [2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk]
>> 0-drslk-prod-client-10: remote operation failed: Transport endpoint is
>> not connected. Path: /system/posts/00/00/08
>> (f5c33a99-719e-4ea2-ad1f-33b893af103d)
>> [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
>> 0-drslk-prod-client-10: not connected (priv