Re: [Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

2015-03-10 Thread Przemysław Mroczek
The versions were:
gluster client: 3.6.2
gluster server: 3.6.0
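
(In case it helps, the client-side version was read off the installed
packages with roughly the following; exact commands depend on the
distribution:)

  glusterfs --version
  dpkg -l | grep -i gluster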

2015-03-08 18:17 GMT+01:00 Vijay Bellur :

> On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
>
>> I don't have the volfiles; they are not on our machines. As I said
>> previously, we have no control over the gluster servers.
>>
>> I saw a graph in the logs that looks similar to a volume file. I will
>> paste it here, but we don't really have any influence over it. We are
>> just using the client to connect to gluster servers that we do not
>> control.
>>
>>
> I would recommend not altering the default frame-timeout.
>
>
>> Btw, do you think that different versions of gluster client and gluster
>> server could be an issue here?
>>
>>
> It can potentially be. What versions are you using on the servers and the
> client?
>
> -Vijay
>
>  2015-03-08 1:29 GMT+01:00 Vijay Bellur :
>>
>>
>> On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>>
>> Hi guys,
>>
>> We have a Rails app which uses gluster for our distributed file
>> system. The gluster servers are hosted independently as part of a deal
>> with another company; we have no control over them and connect to them
>> using the gluster native client.
>>
>> We tried to resolve this issue with help from the admins of the company
>> that is hosting our gluster servers, but they say it is a client issue,
>> and we have run out of ideas as to how that is possible, since we are
>> not doing anything special here.
>>
>> Information about the independent gluster servers:
>> - Version: 3.6.0.42.1
>> - They are running Red Hat
>> - They use enterprise releases, so they are always on older versions
>>
>> Our servers:
>> System version: Ubuntu 14.04
>> Our gluster client version: 3.6.2
>>
>> The exact problem is that it often happens (a couple of times a week)
>> that errors in gluster cause processes to become zombies. It happens
>> with our application server (unicorn), nginx, and our crawling script
>> that runs as a daemon.
>>
>> Our fstab file:
>>
>> 10.10.11.17:/drslk-prod /mnt/storage  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>> 10.10.11.17:/drslk-backup /mnt/backup  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>>
>> Logs from gluster:
>>
>> 2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] )
>> 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3)
>> op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
>> [2015-02-18 12:36:12.375765] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Path:
>> /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
>> [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib

Re: [Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

2015-03-08 Thread Przemysław Mroczek
volume drslk-prod-client-11
    type protocol/client
    option ping-timeout 20
    option remote-host brick24.gluster.iadm
    option remote-subvolume /GLUSTERFS/drslk-prod
    option transport-type socket
    option frame-timeout 60
    option send-gids true
end-volume

volume drslk-prod-replicate-3
    type cluster/replicate
    option read-hash-mode 2
    option data-self-heal-window-size 128
    option quorum-type auto
    subvolumes drslk-prod-client-9 drslk-prod-client-10 drslk-prod-client-11
end-volume

volume drslk-prod-dht
    type cluster/distribute
    option min-free-disk 10%
    option readdir-optimize on
    subvolumes drslk-prod-replicate-0 drslk-prod-replicate-1 drslk-prod-replicate-2 drslk-prod-replicate-3
end-volume

volume drslk-prod-write-behind
    type performance/write-behind
    option cache-size 1MB
    subvolumes drslk-prod-dht
end-volume

volume drslk-prod-read-ahead
    type performance/read-ahead
    subvolumes drslk-prod-write-behind
end-volume

volume drslk-prod-readdir-ahead
    type performance/readdir-ahead
    subvolumes drslk-prod-read-ahead
end-volume

volume drslk-prod-io-cache
    type performance/io-cache
    option cache-timeout 60
    option cache-size 512MB
    subvolumes drslk-prod-readdir-ahead
end-volume

volume drslk-prod-quick-read
    type performance/quick-read
    option cache-size 512MB
    subvolumes drslk-prod-io-cache
end-volume

volume drslk-prod-md-cache
    type performance/md-cache
    subvolumes drslk-prod-quick-read
end-volume

volume drslk-prod
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes drslk-prod-md-cache
end-volume

volume meta-autoload
    type meta
    subvolumes drslk-prod
end-volume
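
For what it's worth, the graph above shows frame-timeout 60 and
ping-timeout 20, both lower than the GlusterFS defaults. A rough sketch
of how the server admins could put those options back to their defaults
(assuming the volume is named drslk-prod; we cannot run this ourselves,
since we don't control the servers):

  # run on a gluster server node, not on our client
  gluster volume reset drslk-prod network.frame-timeout
  gluster volume reset drslk-prod network.ping-timeout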

Btw, do you think that different versions of gluster client and gluster
server could be an issue here?

2015-03-08 1:29 GMT+01:00 Vijay Bellur :

> On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
>
>> Hi guys,
>>
>> We have a Rails app which uses gluster for our distributed file
>> system. The gluster servers are hosted independently as part of a deal
>> with another company; we have no control over them and connect to them
>> using the gluster native client.
>>
>> We tried to resolve this issue with help from the admins of the company
>> that is hosting our gluster servers, but they say it is a client issue,
>> and we have run out of ideas as to how that is possible, since we are
>> not doing anything special here.
>>
>> Information about the independent gluster servers:
>> - Version: 3.6.0.42.1
>> - They are running Red Hat
>> - They use enterprise releases, so they are always on older versions
>>
>> Our servers:
>> System version: Ubuntu 14.04
>> Our gluster client version: 3.6.2
>>
>> The exact problem is that it often happens (a couple of times a week)
>> that errors in gluster cause processes to become zombies. It happens
>> with our application server (unicorn), nginx, and our crawling script
>> that runs as a daemon.
>>
>> Our fstab file:
>>
>> 10.10.11.17:/drslk-prod /mnt/storage  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>> 10.10.11.17:/drslk-backup /mnt/backup  glusterfs
>> defaults,_netdev,nobootwait,fetch-attempts=10 0 0
>>
>> Logs from gluster:
>>
>> 2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[
>> 0x7fb41ddeada6]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_
>> unwind+0x1de)[0x7fb41d
>> bc1c7e] (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_
>> destroy+0xe)[0x7fb41dbc1d8e]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_
>> connection_cleanup+0x82)[0x7fb41dbc3602]
>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc
>> _clnt_notify+0x48)[0x7fb41dbc3d98] ) 0-drslk-prod-client-10: forced
>> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
>> 12:36:12.361489 (xid=0x5d475da)
>> [2015-02-18 12:36:12.375765] W
>> [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
>> remote operation failed: Transport endpoint is not connected. Pat

[Gluster-users] Gluster errors create zombie processes [LOGS ATTACHED]

2015-03-07 Thread Przemysław Mroczek
Hi guys,

We have a Rails app which uses gluster for our distributed file system.
The gluster servers are hosted independently as part of a deal with
another company; we have no control over them and connect to them using
the gluster native client.

We tried to resolve this issue with help from the admins of the company
that is hosting our gluster servers, but they say it is a client issue,
and we have run out of ideas as to how that is possible, since we are not
doing anything special here.

Information about the independent gluster servers:
- Version: 3.6.0.42.1
- They are running Red Hat
- They use enterprise releases, so they are always on older versions

Our servers:
System version: Ubuntu 14.04
Our gluster client version: 3.6.2

The exact problem is that it often happens (a couple of times a week)
that errors in gluster cause processes to become zombies. It happens with
our application server (unicorn), nginx, and our crawling script that
runs as a daemon.
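
A rough way to see the affected processes when it happens (generic
diagnostics, nothing gluster-specific) is something like:

  # list zombie (defunct) processes together with their parent PIDs
  ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'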

Our fstab file:

10.10.11.17:/drslk-prod /mnt/storage  glusterfs
defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup /mnt/backup  glusterfs
defaults,_netdev,nobootwait,fetch-attempts=10 0 0
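
The same mounts done by hand, shown only to make the client setup
explicit (nobootwait is a boot-time fstab option, so it is left out
here), would look roughly like:

  mount -t glusterfs -o _netdev,fetch-attempts=10 10.10.11.17:/drslk-prod /mnt/storage
  mount -t glusterfs -o _netdev,fetch-attempts=10 10.10.11.17:/drslk-backup /mnt/backup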

Logs from gluster:

2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41d
bc1c7e] (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc
_clnt_notify+0x48)[0x7fb41dbc3d98] ) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbce
b)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41d
bc1c7e] (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc
_clnt_notify+0x48)[0x7fb41dbc3d98] ) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(----)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
process will keep trying to connect to glusterd until brick's port is
available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(----)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection
refused)
[2015-02-18 12:36:12.379296] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(----)
[2015-02-18 12:36:12.379700] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path: (null)
(----)
[2015-02-18 13:10:52.759736] E
[client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if br
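
Given the "Connection refused" to 10.10.11.23:24007 in the log above, a
basic reachability check from the client (assuming nc/netcat is
available) would be something like:

  # 24007 is the glusterd management port; brick ports are handed out by glusterd
  nc -zv 10.10.11.23 24007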