Re: [Gluster-users] transport endpoint not connected on just 2 files

2022-06-07 Thread Kingsley Tart
Hi,

Thanks - sorry for the late reply - I was suddenly swamped with other
work, and then it was a UK holiday.

I've tried rsync -A -X with the volume stopped, then restarted it. Will
see whether it heals.
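
For reference, a minimal sketch of that brick-level copy, assuming the
good copy lives on gluster9a and the brick path from the heal info quoted
below (run as root on the odd brick's node; -A carries ACLs, -X carries
xattrs, which rsync copies across all namespaces when run as superuser):

    # illustrative paths - adjust to your bricks
    rsync -avAX gluster9a:/data/brick/gw-runqueues/runners/gw3 \
        /data/brick/gw-runqueues/runners/
    gluster volume start gw-runqueues
    gluster volume heal gw-runqueues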

Cheers,
Kingsley.

On Mon, 2022-05-30 at 18:41 +, Strahil Nikolov wrote:
> Make a backup of all bricks. Based on the info, 2 of the bricks have
> the same copy while brick C has another copy (gfid mismatch).
> 
> I would use mtime to identify the latest version and use that, but I
> have no clue what kind of application you have.
> 
> Usually, it's not recommended to manipulate bricks directly, but in
> this case it might be necessary. The simplest way is to move the file
> on brick C (the only one that is different) away, but if you need
> exactly that one, you can rsync/scp it to the other 2 bricks.
> 
> 
> Best Regards,
> Strahil Nikolov
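
A brick-level sketch of that suggestion, assuming the copy on brick C is
the one to discard and that the affected file is runners/gw3 as the heal
info below lists (the gfid is the one brick C reports in the xattr dump
quoted below; its .glusterfs hard link must go too, or the heal will not
recreate the file):

    # on the brick C node, as root - paths illustrative
    cd /data/brick/gw-runqueues
    mv runners/gw3 /root/gw3.brickC.bak
    rm .glusterfs/d7/39/d73992ae-e03e-4021-824b-1baced973df3
    gluster volume heal gw-runqueues

Recent releases can also resolve a gfid mismatch from the CLI, e.g.
"gluster volume heal gw-runqueues split-brain latest-mtime /runners/gw3",
which matches the mtime-based pick suggested above; support depends on
the gluster version.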
> 
> > On Fri, May 27, 2022 at 11:45, Kingsley Tart
> >  wrote:
> > Hi, thanks.
> > 
> > OK that's interesting. Picking one of the files, on bricks A and B
> > I see this (and all of the values are identical between bricks A
> > and B):
> > 
> > trusted.afr.dirty=0x
> > trusted.afr.gw-runqueues-client-2=0x00010002
> > trusted.gfid=0xa40bb83ff3784ae09c997d272296a7a9
> > trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d626135303739646364372f677733
> > trusted.glusterfs.mdata=0x01628ec5770000007168bb628ec576628ec5760000
> > 
> > and on brick C I see this:
> > 
> > trusted.gfid=0xd73992aee03e4021824b1baced973df3
> > trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d626135303739646364372f677733
> > trusted.glusterfs.mdata=0x01628ec523000030136ca0628ec523628ec5230000
> > 
> > So brick C is missing the trusted.afr attributes and the
> > trusted.gfid and mdata differ.
> > 
> > What do I need to do to fix this?
> > 
> > Cheers,
> > Kingsley.
> > 
> > On Fri, 2022-05-27 at 03:59 +, Strahil Nikolov wrote:
> > > Check the file attributes on all bricks:
> > > 
> > > getfattr -d -e hex -m. /data/brick/gw-runqueues/
> > > 
> > > 
> > > Best Regards,
> > > Strahil Nikolov
> > > 
> > > > On Thu, May 26, 2022 at 16:05, Kingsley Tart
> > > >  wrote:
> > > > Hi,
> > > > 
> > > > I've got a strange issue: on all the clients I've tested (4 so
> > > > far), I get "transport endpoint is not connected" on two files in
> > > > a directory, whereas other files can be read fine.
> > > > 
> > > > Any ideas?
> > > > 
> > > > On one of the servers (all same version):
> > > > 
> > > > # gluster --version
> > > > glusterfs 9.1
> > > > 
> > > > On one of the clients (same thing with all of them) - problem
> > > > with
> > > > files "gw3" and "gw11":
> > > > 
> > > > [root@gw6 btl]# cd /mnt/runqueues/runners/
> > > > [root@gw6 runners]# ls -la
> > > > ls: cannot access gw11: Transport endpoint is not connected
> > > > ls: cannot access gw3: Transport endpoint is not connected
> > > > total 8
> > > > drwxr-xr-x  2 root root 4096 May 26 09:48 .
> > > > drwxr-xr-x 13 root root 4096 Apr 12  2021 ..
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw1
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw10
> > > > -?  ? ??  ?? gw11
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw2
> > > > -?  ? ??  ?? gw3
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw4
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw6
> > > > -rw-r--r--  1 root root0 May 26 09:49 gw7
> > > > [root@gw6 runners]# cat *
> > > > cat: gw11: Transport endpoint is not connected
> > > > cat: gw3: Transport endpoint is not connected
> > > > [root@gw6 runners]#
> > > > 
> > > > 
> > > > Querying on a server shows those two problematic files:
> > > > 
> > > > # gluster volume heal gw-runqueues info
> > > > Brick gluster9a:/data/brick/gw-runqueues
> > > > /runners
> > > > /runners/gw11
> > > > /runners/gw3
> > > > Status: Connected
> > > > Number of entries: 3
> > > > 
> > > > Brick gluster9b:/data/brick/gw-runqueues
> > > > /runners
> > > > /runners/gw11
> > > > /runners/gw3
> > > > Status: Connected
> > > > Number of entries: 3
> > > > 
> > > > Brick gluster9c:/data/brick/gw-runqueues
> > > > Status: Connected
> > > > Number of entries: 0
> > > > 
> > > > 
> > > > However several hours later there's no obvious change. The
> > > > servers have
> > > > hardly any load and the volume is tiny. From a client:
> > > > 
> > > > # find /mnt/runqueues | wc -l
> > > > 35
> > > > 
> > > > 
> > > > glfsheal-gw-runqueues.log from server gluster9a:
> > > > https://pastebin.com/7mPszBBM
> > > > 
> > > > glfsheal-gw-runqueues.log from server gluster9b:
> > > > https://pastebin.com/rxXs5Tcv
> > > > 
> > > > 
> > > > Any pointers would be much appreciated!
> > > > 
> > > > Cheers,

Re: [Gluster-users] transport endpoint not connected on just 2 files

2022-05-27 Thread Kingsley Tart
Hi, thanks.

OK that's interesting. Picking one of the files, on bricks A and B I
see this (and all of the values are identical between bricks A and B):

trusted.afr.dirty=0x
trusted.afr.gw-runqueues-client-2=0x00010002
trusted.gfid=0xa40bb83ff3784ae09c997d272296a7a9
trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d626135303739646364372f677733
trusted.glusterfs.mdata=0x01628ec577007168bb628ec576628ec576

and on brick C I see this:

trusted.gfid=0xd73992aee03e4021824b1baced973df3
trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d626135303739646364372f677733
trusted.glusterfs.mdata=0x01628ec52330136ca0628ec523628ec523

So brick C is missing the trusted.afr attributes and the trusted.gfid
and mdata differ.

What do I need to do to fix this?

Cheers,
Kingsley.

On Fri, 2022-05-27 at 03:59 +, Strahil Nikolov wrote:
> Check the file attributes on all bricks:
> 
> getfattr -d -e hex -m. /data/brick/gw-runqueues/
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> > On Thu, May 26, 2022 at 16:05, Kingsley Tart
> >  wrote:
> > Hi,
> > 
> > I've got a strange issue: on all the clients I've tested (4 so far),
> > I get "transport endpoint is not connected" on two files in a
> > directory, whereas other files can be read fine.
> > 
> > Any ideas?
> > 
> > On one of the servers (all same version):
> > 
> > # gluster --version
> > glusterfs 9.1
> > 
> > On one of the clients (same thing with all of them) - problem with
> > files "gw3" and "gw11":
> > 
> > [root@gw6 btl]# cd /mnt/runqueues/runners/
> > [root@gw6 runners]# ls -la
> > ls: cannot access gw11: Transport endpoint is not connected
> > ls: cannot access gw3: Transport endpoint is not connected
> > total 8
> > drwxr-xr-x  2 root root 4096 May 26 09:48 .
> > drwxr-xr-x 13 root root 4096 Apr 12  2021 ..
> > -rw-r--r--  1 root root0 May 26 09:49 gw1
> > -rw-r--r--  1 root root0 May 26 09:49 gw10
> > -?  ? ??  ?? gw11
> > -rw-r--r--  1 root root0 May 26 09:49 gw2
> > -?  ? ??  ?? gw3
> > -rw-r--r--  1 root root0 May 26 09:49 gw4
> > -rw-r--r--  1 root root0 May 26 09:49 gw6
> > -rw-r--r--  1 root root0 May 26 09:49 gw7
> > [root@gw6 runners]# cat *
> > cat: gw11: Transport endpoint is not connected
> > cat: gw3: Transport endpoint is not connected
> > [root@gw6 runners]#
> > 
> > 
> > Querying on a server shows those two problematic files:
> > 
> > # gluster volume heal gw-runqueues info
> > Brick gluster9a:/data/brick/gw-runqueues
> > /runners
> > /runners/gw11
> > /runners/gw3
> > Status: Connected
> > Number of entries: 3
> > 
> > Brick gluster9b:/data/brick/gw-runqueues
> > /runners
> > /runners/gw11
> > /runners/gw3
> > Status: Connected
> > Number of entries: 3
> > 
> > Brick gluster9c:/data/brick/gw-runqueues
> > Status: Connected
> > Number of entries: 0
> > 
> > 
> > However several hours later there's no obvious change. The servers
> > have
> > hardly any load and the volume is tiny. From a client:
> > 
> > # find /mnt/runqueues | wc -l
> > 35
> > 
> > 
> > glfsheal-gw-runqueues.log from server gluster9a:
> > https://pastebin.com/7mPszBBM
> > 
> > glfsheal-gw-runqueues.log from server gluster9b:
> > https://pastebin.com/rxXs5Tcv
> > 
> > 
> > Any pointers would be much appreciated!
> > 
> > Cheers,
> > Kingsley.
> > 
> > 
> > 
> > 
> > 
> > Community Meeting Calendar:
> > 
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
> 
> 
> Community Meeting Calendar:
> 
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport Endpoint Not Connected When Writing a Lot of Files

2019-11-08 Thread DUCARROZ Birgit

Hi again,

This time I seem to have more information:
I get the error even when there is not a lot of network traffic.
Actually, the following broadcast msg was sent:

root@nas20:/var/log/glusterfs#
Broadcast message from systemd-journald@nas20 (Fri 2019-11-08 12:20:25 CET):

bigdisk-brick1-vol-users[6115]: [2019-11-08 11:20:25.849956] M [MSGID: 
113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 
0-vol-users-posix: health-check failed, going down



Broadcast message from systemd-journald@dnas20 (Fri 2019-11-08 12:20:25 
CET):


bigdisk-brick1-vol-users[6115]: [2019-11-08 11:20:25.850170] M [MSGID: 
113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 
0-vol-users-posix: still alive! -> SIGTERM


The only thing that helps is to stop and restart the volume, followed by a mount -a.
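
If it is the posix health-check killing the brick (as the broadcast above
suggests), a lighter-touch sketch - assuming the volume is named
vol-users, as the brick log prefix suggests - would be to restart just
the failed brick and widen the check interval:

    # restart only offline bricks, without stopping the volume
    gluster volume start vol-users force
    # health-check interval in seconds (default 30; 0 disables it)
    gluster volume set vol-users storage.health-check-interval 60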

I already did an fsck on the underlying volume - there seem to be no errors.

Any ideas?

Kind regards,
Birgit


On 16/10/19 10:18, DUCARROZ Birgit wrote:

Thank you for your response!

It does not reply.
None of the other servers reply to ping / telnet 24007 either.
I tried this from each server to each server - no reply at all.
Even with the firewall disabled it does not reply.

root@diufnas22:/home/diuf-sysadmin# netstat -tulpe |grep 24007
tcp        0      0 *:24007      *:*      LISTEN      root       47716960    141881/glusterd


Firewall:
8458,24007,24008,49150,49151,49152,49153,49154,49155,49156,49157,49158/tcp 
(v6) on bond1 ALLOW   Anywhere (v6)
8458,24007,24008,49150,49151,49152,49153,49154,49155,49156,49157,49158/tcp 
(v6) on bond0 ALLOW   Anywhere (v6)


Kind regards,
Birgit


On 16/10/19 05:59, Amar Tumballi wrote:
I went through the design. In my opinion, this makes sense, i.e., as long
as you can use a better/faster network as an alternative to reach the
server, it is fine. And considering gluster servers are stateless, that
shouldn't cause a problem.


My suspicion of a network issue in this particular case comes from the
log, which mentions 'No route to host'. That particular message prints
the errno from the RPC layer's connect() system call.


Snippet from `man 3 connect`

        EHOSTUNREACH
               The destination host cannot be reached (probably
               because the host is down or a remote router cannot
               reach it).


It may be working from a few servers, but on the ones where you are
getting this error, can you check whether you can reach the server
(specific IP) using 'ping' or 'telnet  24007'?


Regards,
Amar




On Tue, Oct 15, 2019 at 5:00 PM DUCARROZ Birgit 
mailto:birgit.ducar...@unifr.ch>> wrote:


    Hi,

    I send you this mail without sending it to the gluster mailing list
    because of the pdf attachment (I do not want it to be published).

    Do you think it might be because of the different IP address of
    diufnas22 where the arbiter brick 3 is installed?

    2 hosts communicate with the internal 192.168.x.x network whereas the 3rd
    host diufnas22 is connected to the two other hosts via a switch, using
    ip address 134.21.x.x. Both networks have a speed of 20gb (2 x 10gb
    using lacp bond).

    (The scheme shows this).

    I would be able to remove the 192.168.x.x network, but my aim was to
    speed up the network using an internal communication between bricks
    1 and 2.

    If this is really the problem, why does the installation mostly work,
    and crash when there is heavy network usage with a lot of small
    files?


    Kind regards
    --     Birgit Ducarroz
    Unix Systems Administration
    Department of Informatics
    University of Fribourg Switzerland
    mailto:birgit.ducar...@unifr.ch 
    Phone: +41 (26) 300 8342
    https://diuf.unifr.ch/people/ducarroz/
    INTRANET / SECURITY NEWS: https://diuf-file.unifr.ch

    On 14/10/19 12:47, Amar Tumballi wrote:
 > One of the hosts (134.21.57.122) is not
 > reachable from your network. Also, looking at the IP, it may have
 > gotten resolved to something other than expected. Can you check if
 > 'diufnas22' is properly resolved?
 >
 > -Amar
 >
 > On Mon, Oct 14, 2019 at 3:44 PM DUCARROZ Birgit
 > mailto:birgit.ducar...@unifr.ch>
    >>
    wrote:
 >
 >     Thank you.
 >     I checked the logs but the information was not clear to me.
 >
 >     I add the logs of two different crashes. I will do an upgrade to
 >     GlusterFS 6 in some weeks. Actually I cannot interrupt user
 >     activity on these servers since we are in the middle of the
 >     uni-semester.
 >
 >     If these logfiles reveal something interesting to you, it would
 >     be nice to get a hint.
 >
 >
 >     ol-data-client-2. Client process will keep trying to 
connect to

 >     glusterd
 >     until brick's port is available
 >     [2019-09-16 19:05:34.028164] E
    

Re: [Gluster-users] Transport Endpoint Not Connected When Writing a Lot of Files

2019-10-16 Thread DUCARROZ Birgit

Thank you for your response!

It does not reply.
None of the other servers reply to ping / telnet 24007 either.
I tried this from each server to each server - no reply at all.
Even with the firewall disabled it does not reply.

root@diufnas22:/home/diuf-sysadmin# netstat -tulpe |grep 24007
tcp        0      0 *:24007      *:*      LISTEN      root       47716960    141881/glusterd


Firewall:
8458,24007,24008,49150,49151,49152,49153,49154,49155,49156,49157,49158/tcp 
(v6) on bond1 ALLOW   Anywhere (v6)
8458,24007,24008,49150,49151,49152,49153,49154,49155,49156,49157,49158/tcp 
(v6) on bond0 ALLOW   Anywhere (v6)


Kind regards,
Birgit


On 16/10/19 05:59, Amar Tumballi wrote:
I went through the design. In my opinion, this makes sense, i.e., as long as
you can use a better/faster network as an alternative to reach the server, it
is fine. And considering gluster servers are stateless, that shouldn't
cause a problem.


My suspicion of a network issue in this particular case comes from the log,
which mentions 'No route to host'. That particular message prints the errno
from the RPC layer's connect() system call.


Snippet from `man 3 connect`

        EHOSTUNREACH
               The destination host cannot be reached (probably
               because the host is down or a remote router cannot
               reach it).


It may be working from a few servers, but on the ones where you are getting
this error, can you check whether you can reach the server (specific IP)
using 'ping' or 'telnet  24007'?
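
The archived command lost its host argument; spelled out, the test would
look something like this (the server address is a placeholder):

    ping <server-ip>
    telnet <server-ip> 24007
    nc -zv <server-ip> 24007    # alternative if telnet is not installed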


Regards,
Amar




On Tue, Oct 15, 2019 at 5:00 PM DUCARROZ Birgit 
mailto:birgit.ducar...@unifr.ch>> wrote:


Hi,

I send you this mail without sending it to the gluster mailing list
because of the pdf attachment (I do not want it to be published).

Do you think it might be because of the different IP address of
diufnas22 where the arbiter brick 3 is installed?

2 hosts communicate with internal 192.168.x.x Network whereas the 3rd
host diufnas22 is connected to the two other hosts via a switch, using
ip address 134.21.x.x. Both networks have a speed of 20gb (2 x 10gb
using lacp bond).

(The scheme shows this).

I would be able to remove the 192.168.x.x network, but my aim was to
speed up the network using an internal communication between bricks
1 and 2.

If this is really the problem, why does the installation mostly work,
and crash when there is heavy network usage with a lot of small
files?


Kind regards
-- 
Birgit Ducarroz

Unix Systems Administration
Department of Informatics
University of Fribourg Switzerland
mailto:birgit.ducar...@unifr.ch 
Phone: +41 (26) 300 8342
https://diuf.unifr.ch/people/ducarroz/
INTRANET / SECURITY NEWS: https://diuf-file.unifr.ch

On 14/10/19 12:47, Amar Tumballi wrote:
 > One of the hosts (134.21.57.122) is not
 > reachable from your network. Also, looking at the IP, it may have
 > gotten resolved to something other than expected. Can you check if
 > 'diufnas22' is properly resolved?
 >
 > -Amar
 >
 > On Mon, Oct 14, 2019 at 3:44 PM DUCARROZ Birgit
 > mailto:birgit.ducar...@unifr.ch>
>>
wrote:
 >
 >     Thank you.
 >     I checked the logs but the information was not clear to me.
 >
 >     I add the logs of two different crashes. I will do an upgrade to
 >     GlusterFS 6 in some weeks. Actually I cannot interrupt user
 >     activity on these servers since we are in the middle of the
 >     uni-semester.
 >
 >     If these logfiles reveal something interesting to you, it would
 >     be nice to get a hint.
 >
 >
 >     ol-data-client-2. Client process will keep trying to connect to
 >     glusterd
 >     until brick's port is available
 >     [2019-09-16 19:05:34.028164] E
[rpc-clnt.c:348:saved_frames_unwind]
 >     (-->
 >   
  /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7ff167753ddb]

 >
 >     (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7ff167523021]
 >     (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7ff16752314e]
 >     (-->
 >   
  /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7ff1675246be]

 >
 >     (-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7ff167525268]
 >     ) 0-vol-data-client-2: forced unwinding frame
type(GlusterFS 4.x
 >     v1)
 >     op(FSTAT(25)) called at 2019-09-16 19:05:28.736873
(xid=0x113aecf)
 >     [2019-09-16 19:05:34.028206] W [MSGID: 114031]
 >     [client-rpc-fops_v2.c:1260:client4_0_fstat_cbk]
0-vol-data-client-2:
 >     remote operation failed [Transport endpoint is not connected]
 >     [2019-09-16 

Re: [Gluster-users] Transport Endpoint Not Connected When Writing a Lot of Files

2019-10-14 Thread Amar Tumballi
One of the hosts (134.21.57.122) is not
reachable from your network. Also, looking at the IP, it may have gotten
resolved to something other than expected. Can you check if 'diufnas22' is
properly resolved?

-Amar

On Mon, Oct 14, 2019 at 3:44 PM DUCARROZ Birgit 
wrote:

> Thank you.
> I checked the logs but the information was not clear to me.
>
> I add the log of two different crashes. I will do an upgrade to
> glusterFS 6 in some weeks. Actually I cannot interrupt user activity on
> these servers since we are in the middle of the uni-semester.
>
> If these logfiles reveal something interesting to you, it would be nice to
> get a hint.
>
>
> ol-data-client-2. Client process will keep trying to connect to glusterd
> until brick's port is available
> [2019-09-16 19:05:34.028164] E [rpc-clnt.c:348:saved_frames_unwind] (-->
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7ff167753ddb]
>
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7ff167523021]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7ff16752314e]
> (-->
> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7ff1675246be]
>
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7ff167525268]
> ) 0-vol-data-client-2: forced unwinding frame type(GlusterFS 4.x v1)
> op(FSTAT(25)) called at 2019-09-16 19:05:28.736873 (xid=0x113aecf)
> [2019-09-16 19:05:34.028206] W [MSGID: 114031]
> [client-rpc-fops_v2.c:1260:client4_0_fstat_cbk] 0-vol-data-client-2:
> remote operation failed [Transport endpoint is not connected]
> [2019-09-16 19:05:44.970828] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-vol-data-client-2: error returned while attempting to connect to
> host:(null), port:0
> [2019-09-16 19:05:44.971030] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-vol-data-client-2: error returned while attempting to connect to
> host:(null), port:0
> [2019-09-16 19:05:44.971165] E [MSGID: 114058]
> [client-handshake.c:1442:client_query_portmap_cbk] 0-vol-data-client-2:
> failed to get the port number for remote subvolume. Please run 'gluster
> volume status' on server to see if brick process is running.
> [2019-09-16 19:05:47.971375] W [rpc-clnt.c:1753:rpc_clnt_submit]
> 0-vol-data-client-2: error returned while attempting to connect to
> host:(null), port:0
>
> [2019-09-16 19:05:44.971200] I [MSGID: 114018]
> [client.c:2254:client_rpc_notify] 0-vol-data-client-2: disconnected from
> vol-data-client-2. Client process will keep trying to connect to
> glusterd until brick's port is available
>
>
>
> [2019-09-17 07:43:44.807182] E [MSGID: 114058]
> [client-handshake.c:1442:client_query_portmap_cbk] 0-vol-data-client-0:
> failed to get the port number for remote subvolume. Please run 'gluster
> volume status' on server to see if brick process is running.
> [2019-09-17 07:43:44.807217] I [MSGID: 114018]
> [client.c:2254:client_rpc_notify] 0-vol-data-client-0: disconnected from
> vol-data-client-0. Client process will keep trying to connect to
> glusterd until brick's port is available
> [2019-09-17 07:43:44.807228] E [MSGID: 108006]
> [afr-common.c:5413:__afr_handle_child_down_event]
> 0-vol-data-replicate-0: All subvolumes are down. Going offline until
> atleast one of them comes back up.
> Final graph:
>
> +------------------------------------------------------------------------------+
>1: volume vol-data-client-0
>2: type protocol/client
>3: option ping-timeout 42
>4: option remote-host diufnas20
>5: option remote-subvolume /bigdisk/brick1/vol-data
>6: option transport-type socket
>7: option transport.address-family inet
>8: option username a14ffa1b-b64e-410c-894d-435c18e81b2d
>9: option password 37ba4281-166d-40fd-9ef0-08a187d1107b
>   10: option transport.tcp-user-timeout 0
>   11: option transport.socket.keepalive-time 20
>   12: option transport.socket.keepalive-interval 2
>   13: option transport.socket.keepalive-count 9
>   14: option send-gids true
>   15: end-volume
>   16:
>   17: volume vol-data-client-1
>   18: type protocol/client
>   19: option ping-timeout 42
>   20: option remote-host diufnas21
>   21: option remote-subvolume /bigdisk/brick2/vol-data
>   22: option transport-type socket
>   23: option transport.address-family inet
>   24: option username a14ffa1b-b64e-410c-894d-435c18e81b2d
>   25: option password 37ba4281-166d-40fd-9ef0-08a187d1107b
>   26: option transport.tcp-user-timeout 0
>   27: option transport.socket.keepalive-time 20
> 29: option transport.socket.keepalive-count 9
>   30: option send-gids true
>   31: end-volume
>   32:
>   33: volume vol-data-client-2
>   34: type protocol/client
>   35: option ping-timeout 42
>   36: option remote-host diufnas22
>   37: option remote-subvolume /bigdisk/brick3/vol-data
>   38: option transport-type socket
>   39: option transport.address-family 

Re: [Gluster-users] Transport Endpoint Not Connected When Writing a Lot of Files

2019-10-14 Thread DUCARROZ Birgit

Thank you.
I checked the logs but the information was not clear to me.

I add the logs of two different crashes. I will do an upgrade to
GlusterFS 6 in some weeks. Actually I cannot interrupt user activity on
these servers since we are in the middle of the uni-semester.


If these logfiles reveal something interesting to you, it would be nice
to get a hint.



ol-data-client-2. Client process will keep trying to connect to glusterd 
until brick's port is available
[2019-09-16 19:05:34.028164] E [rpc-clnt.c:348:saved_frames_unwind] (--> 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7ff167753ddb] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc021)[0x7ff167523021] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xc14e)[0x7ff16752314e] 
(--> 
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x8e)[0x7ff1675246be] 
(--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xe268)[0x7ff167525268] 
) 0-vol-data-client-2: forced unwinding frame type(GlusterFS 4.x v1) 
op(FSTAT(25)) called at 2019-09-16 19:05:28.736873 (xid=0x113aecf)
[2019-09-16 19:05:34.028206] W [MSGID: 114031] 
[client-rpc-fops_v2.c:1260:client4_0_fstat_cbk] 0-vol-data-client-2: 
remote operation failed [Transport endpoint is not connected]
[2019-09-16 19:05:44.970828] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-vol-data-client-2: error returned while attempting to connect to 
host:(null), port:0
[2019-09-16 19:05:44.971030] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-vol-data-client-2: error returned while attempting to connect to 
host:(null), port:0
[2019-09-16 19:05:44.971165] E [MSGID: 114058] 
[client-handshake.c:1442:client_query_portmap_cbk] 0-vol-data-client-2: 
failed to get the port number for remote subvolume. Please run 'gluster 
volume status' on server to see if brick process is running.
[2019-09-16 19:05:47.971375] W [rpc-clnt.c:1753:rpc_clnt_submit] 
0-vol-data-client-2: error returned while attempting to connect to 
host:(null), port:0


[2019-09-16 19:05:44.971200] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-vol-data-client-2: disconnected from 
vol-data-client-2. Client process will keep trying to connect to 
glusterd until brick's port is available




[2019-09-17 07:43:44.807182] E [MSGID: 114058] 
[client-handshake.c:1442:client_query_portmap_cbk] 0-vol-data-client-0: 
failed to get the port number for remote subvolume. Please run 'gluster 
volume status' on server to see if brick process is running.
[2019-09-17 07:43:44.807217] I [MSGID: 114018] 
[client.c:2254:client_rpc_notify] 0-vol-data-client-0: disconnected from 
vol-data-client-0. Client process will keep trying to connect to 
glusterd until brick's port is available
[2019-09-17 07:43:44.807228] E [MSGID: 108006] 
[afr-common.c:5413:__afr_handle_child_down_event] 
0-vol-data-replicate-0: All subvolumes are down. Going offline until 
atleast one of them comes back up.

Final graph:
+------------------------------------------------------------------------------+
  1: volume vol-data-client-0
  2: type protocol/client
  3: option ping-timeout 42
  4: option remote-host diufnas20
  5: option remote-subvolume /bigdisk/brick1/vol-data
  6: option transport-type socket
  7: option transport.address-family inet
  8: option username a14ffa1b-b64e-410c-894d-435c18e81b2d
  9: option password 37ba4281-166d-40fd-9ef0-08a187d1107b
 10: option transport.tcp-user-timeout 0
 11: option transport.socket.keepalive-time 20
 12: option transport.socket.keepalive-interval 2
 13: option transport.socket.keepalive-count 9
 14: option send-gids true
 15: end-volume
 16:
 17: volume vol-data-client-1
 18: type protocol/client
 19: option ping-timeout 42
 20: option remote-host diufnas21
 21: option remote-subvolume /bigdisk/brick2/vol-data
 22: option transport-type socket
 23: option transport.address-family inet
 24: option username a14ffa1b-b64e-410c-894d-435c18e81b2d
 25: option password 37ba4281-166d-40fd-9ef0-08a187d1107b
 26: option transport.tcp-user-timeout 0
 27: option transport.socket.keepalive-time 20
29: option transport.socket.keepalive-count 9
 30: option send-gids true
 31: end-volume
 32:
 33: volume vol-data-client-2
 34: type protocol/client
 35: option ping-timeout 42
 36: option remote-host diufnas22
 37: option remote-subvolume /bigdisk/brick3/vol-data
 38: option transport-type socket
 39: option transport.address-family inet
 40: option username a14ffa1b-b64e-410c-894d-435c18e81b2d
 41: option password 37ba4281-166d-40fd-9ef0-08a187d1107b
 42: option transport.tcp-user-timeout 0
 43: option transport.socket.keepalive-time 20
 44: option transport.socket.keepalive-interval 2
 45: option transport.socket.keepalive-count 9
 46: option send-gids true
 47: end-volume
 48:
49: volume vol-data-replicate-0
 50: type cluster/replicate
 51: option afr-pending-xattr 

Re: [Gluster-users] Transport Endpoint Not Connected When Writing a Lot of Files

2019-10-13 Thread Amar Tumballi
'Transport endpoint not connected' (i.e., ENOTCONN) comes when the network
connection is not established between the client and the server. I recommend
checking the logs for the particular reason. Especially the brick (server-side)
logs will have some hints on this.
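
On the servers, the brick logs live under /var/log/glusterfs/bricks/,
named after the brick path with slashes turned into dashes - e.g., for
the /bigdisk/brick1/vol-data brick from this thread's volfile
(illustrative):

    less /var/log/glusterfs/bricks/bigdisk-brick1-vol-data.log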

About the crash, we treat it as a bug. Considering there is no specific
backtrace or log shared with the email, it is hard to tell whether it is
already fixed in a higher version or not.

Considering you are in 4.1.8 version, and there are many releases done
after that, upgrading also can be an option.

Regards,
Amar


On Fri, Oct 11, 2019 at 4:13 PM DUCARROZ Birgit 
wrote:

> Hi list,
>
> Does anyone know what I can do to avoid "Transport Endpoint not
> connected" (and a subsequently blocked server) when writing a lot of
> small files on a volume?
>
> I'm running glusterfs 4.1.8 on 6 servers. With 3 servers I never have
> problems, but the other 3 servers are acting as HA storage for people
> who sometimes write thousands of small files. This seems to provoke a
> crash of the gluster daemon.
>
> I have 3 bricks whereas the 3rd brick acts as arbiter.
>
>
> # Location of the bricks:
> # brick1 on $HOST1, brick2 on $HOST2, brick3 (= arbiter) on $HOST3
>
> Checked:
> The underlying ext4 filesystem and the HDs seem to be without errors.
> The firewall ports should not be the problem, since it also occurs
> when the firewall is disabled.
>
> Any help appreciated!
> Kind regards,
> Birgit
> 
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/118564314
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/118564314
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] transport endpoint not connected and sudden unmount

2018-06-28 Thread Nithya Balachandran
Hi,

There should be a coredump for the crash. Please open it in gdb and send us
the bt (after installing the debuginfo so we can see the symbols).
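
A sketch of that workflow on an RPM-based system (package names and the
core path are illustrative; adjust for your distro):

    debuginfo-install glusterfs glusterfs-fuse   # pull in the symbols
    gdb /usr/sbin/glusterfs /path/to/core        # open the coredump
    (gdb) thread apply all bt full               # the backtrace to send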

Thanks,
Nithya

On 27 June 2018 at 19:49, Brian Andrus  wrote:

> All,
>
> I have a gluster filesystem (glusterfs-4.0.2-1, Type:
> Distributed-Replicate, Number of Bricks: 5 x 3 = 15)
>
> I have one directory that is used for slurm statefiles, which seems to get
> out of sync fairly often. There are particular files that end up never
> healing.
>
> Since the files are ephemeral, I'm ok with losing them (for now).
> Following some advice, I deleted UUID files that were in
> /GLUSTER/brick1/.glusterfs/indices/xattrop/
>
> This makes gluster volume heal GDATA statistics heal-count show no issues,
> however the issue is still there. Even though nothing is showing up with
> gluster volume heal GDATA info, there are some files/directories that, if I
> try to access them at all, I get "Transport endpoint is not connected"
> There is even a directory, which is empty but if I try to 'rmdir' it, I
> get "rmdir: failed to remove ‘/DATA/slurmstate.old/slurm/’: Software caused
> connection abort" and the mount goes bad. I have to umount/mount it to get
> it back.
>
> There is a bit of info in the log file that has to do with the crash which
> is attached.
>
> How do I clean this up? And what is the 'proper' way to handle when you
> have a file that will not heal even in a 3-way replicate?
>
> Brian Andrus
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-15 Thread Ben Turner
- Original Message -
> From: "Ben Turner" <btur...@redhat.com>
> To: "Julio Guevara" <julioguevara...@gmail.com>
> Cc: gluster-users@gluster.org
> Sent: Thursday, June 15, 2017 6:10:58 PM
> Subject: Re: [Gluster-users] Transport Endpoint Not connected while running 
> sysbench on Gluster Volume
> 
> 
> 
> - Original Message -
> > From: "Julio Guevara" <julioguevara...@gmail.com>
> > To: "Ben Turner" <btur...@redhat.com>
> > Sent: Thursday, June 15, 2017 5:52:26 PM
> > Subject: Re: [Gluster-users] Transport Endpoint Not connected while running
> > sysbench on Gluster Volume
> > 
> > I stumbled upon the problem.
> > 
> > We are using deep security agent (da_agent) as our main antivirus. When the
> > antivirus gets activated it installs kernel modules:
> >   redirfs
> >   gsch
> > 
> > Apparently when these modules are present and loaded into the kernel, I see
> > all the issues that I have described here.
> > Once I uninstall the agent and reboot the system (To make sure modules are
> > unloaded) glusterfs works without any issue.
> > This is the software version that I'm using, if it is useful for anybody:
> > 
> >   CentOS 6.8
> >   kernel2.6.32-696.3.1.el6
> >   ds_agent   9.6.2-7723.el6 tried with ds_agent 9.6.2-7888.el6
> >  same issue.
> >   glusterfs-server  3.8.12-1.el6
> > 
> > @Ben the tail I sent before includes both server and client logs, even
> > bricks.
> 
> Hmm, maybe the security SW is killing / interfering somehow with the gluster
> stack?  Do you know the expected behavior of the antivirus when it sees
> binaries and / or behavior it doesn't recognize?  Maybe FUSE being in user
> space is tripping it up?  Is there any way to configure the antivirus to
> whitelist / not interfere with the components of the gluster stack?

I just did a quick google and saw:

http://docs.trendmicro.com/all/ent/ds/v9.5_sp1/en-us/DS_Agent-Linux_9.5_SP1_readme.txt

   - Anti-Malware is unable to scan fuse-based file-system if the 
 mount owner is not root, and the mount does not allow other users to 
 access. [26265]

So it would appear that there have been some issues with FUSE-based file
systems.  It may be worth reaching out to the vendor, if you have support, to
see if there are any known issues with FUSE-based systems.  In the meantime you
may want to try NFS if you NEED the antivirus; otherwise you could leave it
disabled until you get the issue sorted.
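
A hedged sketch of that interim NFS route (gluster 3.8 still ships the
built-in gNFS server, which speaks NFSv3; volume and host names are the
ones from this thread, and nfs.disable is set explicitly in case it is
off by default on this version):

    gluster volume set mariadb_gluster_volume nfs.disable off
    mount -t nfs -o vers=3 laeft-dccdb01p:/mariadb_gluster_volume \
        /var/lib/mysql_backups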

-b


> 
> -b
> 
> 
> > 
> > Thanks
> > Julio Guevara
> > 
> > On Wed, Jun 14, 2017 at 11:11 PM, Ben Turner <btur...@redhat.com> wrote:
> > 
> > > - Original Message -
> > > > From: "Julio Guevara" <julioguevara...@gmail.com>
> > > > To: gluster-users@gluster.org
> > > > Sent: Tuesday, June 13, 2017 4:43:06 PM
> > > > Subject: [Gluster-users] Transport Endpoint Not connected while running
> > >  sysbench on Gluster Volume
> > > >
> > > > I'm having a hard time trying to get a gluster volume up and
> > > > running. I have set up other gluster volumes on other systems
> > > > without many problems but this one is killing me.
> > > >
> > > > The gluster vol was created with the command:
> > > > gluster volume create mariadb_gluster_volume
> > > > laeft-dccdb01p:/export/mariadb/brick
> > > >
> > > > I had to lower frame-timeout since the system would become unresponsive
> > > until
> > > > the frame failed by timeout:
> > > > gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> > > >
> > > > running gluster version: glusterfs 3.8.12
> > > >
> > > > The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> > > > --file-num=64 prepare
> > > >
> > > > sysbench version: sysbench 0.4.12-5.el6
> > > >
> > > > kernel version: 2.6.32-696.1.1.el6
> > > >
> > > > centos: 6.8
> > > >
> > > > Issue: Whenever I run the sysbench over the mount
> > > > /var/lib/mysql_backups
> > > I
> > > > get the error that is shown on the log output.
> > > >
> > > > It is a constant issue, I can reproduce it when I start increasing the
> > > > --file-num for sysbench above 3.
> > >
> > > It looks like you may be seeing a crash.  If you look at
> > > /var/log/messages
> >

Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-15 Thread Ben Turner


- Original Message -
> From: "Julio Guevara" <julioguevara...@gmail.com>
> To: "Ben Turner" <btur...@redhat.com>
> Sent: Thursday, June 15, 2017 5:52:26 PM
> Subject: Re: [Gluster-users] Transport Endpoint Not connected while running 
> sysbench on Gluster Volume
> 
> I stumbled upon the problem.
> 
> We are using deep security agent (da_agent) as our main antivirus. When the
> antivirus gets activated it installs kernel modules:
>   redirfs
>   gsch
> 
> Apparently when these modules are present and loaded into the kernel, I see
> all the issues that I have described here.
> Once I uninstall the agent and reboot the system (To make sure modules are
> unloaded) glusterfs works without any issue.
> This is the software version that I'm using, if it is useful for anybody:
> 
>   CentOS 6.8
>   kernel2.6.32-696.3.1.el6
>   ds_agent   9.6.2-7723.el6 tried with ds_agent 9.6.2-7888.el6
>  same issue.
>   glusterfs-server  3.8.12-1.el6
> 
> @Ben the tail I sent before includes both server and client logs, even
> bricks.

Hmm, maybe the security SW is killing / interfering somehow with the gluster
stack?  Do you know the expected behavior of the antivirus when it sees
binaries and / or behavior it doesn't recognize?  Maybe FUSE being in user
space is tripping it up?  Is there any way to configure the antivirus to
whitelist / not interfere with the components of the gluster stack?

-b


> 
> Thanks
> Julio Guevara
> 
> On Wed, Jun 14, 2017 at 11:11 PM, Ben Turner <btur...@redhat.com> wrote:
> 
> > - Original Message -
> > > From: "Julio Guevara" <julioguevara...@gmail.com>
> > > To: gluster-users@gluster.org
> > > Sent: Tuesday, June 13, 2017 4:43:06 PM
> > > Subject: [Gluster-users] Transport Endpoint Not connected while running
> >  sysbench on Gluster Volume
> > >
> > > I'm having a hard time trying to get a gluster volume up and running. I
> > > have set up other gluster volumes on other systems without many problems
> > > but this one is killing me.
> > >
> > > The gluster vol was created with the command:
> > > gluster volume create mariadb_gluster_volume
> > > laeft-dccdb01p:/export/mariadb/brick
> > >
> > > I had to lower frame-timeout since the system would become unresponsive
> > until
> > > the frame failed by timeout:
> > > gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> > >
> > > running gluster version: glusterfs 3.8.12
> > >
> > > The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> > > --file-num=64 prepare
> > >
> > > sysbench version: sysbench 0.4.12-5.el6
> > >
> > > kernel version: 2.6.32-696.1.1.el6
> > >
> > > centos: 6.8
> > >
> > > Issue: Whenever I run the sysbench over the mount /var/lib/mysql_backups
> > I
> > > get the error that is shown on the log output.
> > >
> > > It is a constant issue, I can reproduce it when I start increasing the
> > > --file-num for sysbench above 3.
> >
> > It looks like you may be seeing a crash.  If you look at /var/log/messages
> > on all of the clients / servers do you see any crashes / seg faults / ABRT
> > messages in the log?  If so can you open a BZ with the core / other info
> > here?  Here is an example of a crash on one of the bricks:
> >
> > http://lists.gluster.org/pipermail/gluster-users/2016-February/025460.html
> >
> > My guess is something is happening client side, since we don't see anything
> > in the server logs; check the client mount
> > log (/var/log/glusterfs/.log)
> > and the messages file on your client.  Also check messages on the servers.
> > If you see anything, shoot us the info and let's get a BZ open; if not,
> > maybe someone else on the list has some other ideas.
> >
> > -b
> >
> > >
> > >
> > >
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-14 Thread Ben Turner
- Original Message -
> From: "Julio Guevara" 
> To: gluster-users@gluster.org
> Sent: Tuesday, June 13, 2017 4:43:06 PM
> Subject: [Gluster-users] Transport Endpoint Not connected while running   
> sysbench on Gluster Volume
> 
> I'm having a hard time trying to get a gluster volume up and running. I have
> set up other gluster volumes on other systems without many problems but this
> one is killing me.
> 
> The gluster vol was created with the command:
> gluster volume create mariadb_gluster_volume
> laeft-dccdb01p:/export/mariadb/brick
> 
> I had to lower frame-timeout since the system would become unresponsive until
> the frame failed by timeout:
> gluster volume set mariadb_gluster_volume networking.frame-timeout 5
> 
> running gluster version: glusterfs 3.8.12
> 
> The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> --file-num=64 prepare
> 
> sysbench version: sysbench 0.4.12-5.el6
> 
> kernel version: 2.6.32-696.1.1.el6
> 
> centos: 6.8
> 
> Issue: Whenever I run the sysbench over the mount /var/lib/mysql_backups I
> get the error that is shown on the log output.
> 
> It is a constant issue, I can reproduce it when I start increasing the
> --file-num for sysbench above 3.

It looks like you may be seeing a crash.  If you look at /var/log/messages on 
all of the clients / servers do you see any crashes / seg faults / ABRT 
messages in the log?  If so can you open a BZ with the core / other info here?  
Here is an example of a crash on one of the bricks:

http://lists.gluster.org/pipermail/gluster-users/2016-February/025460.html

My guess is something is happening client side, since we don't see anything in
the server logs; check the client mount
log (/var/log/glusterfs/.log) and the messages file on your client.
Also check messages on the servers.  If you see anything, shoot us the info
and let's get a BZ open; if not, maybe someone else on the list has some other
ideas.
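
A quick sweep for that evidence on each node might look like this
(a sketch; log locations are the usual defaults):

    grep -iE 'segfault|signal 11|abrt' /var/log/messages
    ls -lt /var/log/glusterfs/    # client mount logs live here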

-b

> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport Endpoint Not connected while running sysbench on Gluster Volume

2017-06-14 Thread Julio Guevara
Also, this is the profile output of this volume:

gluster> volume profile mariadb_gluster_volume info cumulative
Brick: laeft-dccdb01p.core.epay.us.loc:/export/mariadb_backup/brick
---
Cumulative Stats:
   Block Size:  16384b+   32768b+
65536b+
 No. of Reads:0 0
  0
No. of Writes:83391465750
 102911

   Block Size: 131072b+
 No. of Reads:   33
No. of Writes: 8551
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             26      FORGET
      0.00       0.00 us       0.00 us       0.00 us             33     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             92  RELEASEDIR
      0.00     102.25 us      70.00 us     155.00 us              4    SETXATTR
      0.00     159.80 us      97.00 us     207.00 us              5       RMDIR
      0.00     266.75 us     121.00 us     675.00 us              4     SETATTR
      0.00     100.62 us      29.00 us     700.00 us             16     INODELK
      0.00      84.33 us      29.00 us     477.00 us             33       FLUSH
      0.00      68.16 us      34.00 us     165.00 us             92     OPENDIR
      0.01      88.35 us      21.00 us     608.00 us             92        STAT
      0.01     754.04 us     262.00 us   10104.00 us             25      CREATE
      0.02     169.04 us      27.00 us     997.00 us            179    READDIRP
      0.04     150.90 us      43.00 us    1867.00 us            365      LOOKUP
      0.04   16330.75 us     297.00 us   46360.00 us              4       MKDIR
      0.18    7896.70 us      71.00 us  256814.00 us             33        READ
      0.68     466.93 us      19.00 us    1848.00 us           2119      STATFS
      2.39  151339.17 us     227.00 us  540998.00 us             23      UNLINK
      5.49  320155.28 us    2035.00 us 1273394.00 us             25       FSYNC
     31.28      69.00 us      30.00 us   11447.00 us         660603       WRITE
     59.84 10899586.88 us 5827342.00 us 13921169.00 us            8        OPEN

Duration: 71425 seconds
   Data Read: 4325376 bytes
Data Written: 29195534336 bytes


As you can see, OPEN FOPs take the most time and they normally
time out, even with the default value for networking.frame-timeout.
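
For reference, the profiling workflow that produces output like the above
(standard gluster CLI, volume name from this thread):

    gluster volume profile mariadb_gluster_volume start
    # ... run the sysbench workload ...
    gluster volume profile mariadb_gluster_volume info cumulative
    gluster volume profile mariadb_gluster_volume stop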

Thanks

On Tue, Jun 13, 2017 at 3:43 PM, Julio Guevara 
wrote:

> I'm having a hard time trying to get a gluster volume up and running. I
> have set up other gluster volumes on other systems without many problems but
> this one is killing me.
>
> The gluster vol was created with the command:
> gluster volume create mariadb_gluster_volume laeft-dccdb01p:/export/
> mariadb/brick
>
> I had to lower frame-timeout since the system would become unresponsive
> until the frame failed by timeout:
> gluster volume set mariadb_gluster_volume networking.frame-timeout 5
>
> running gluster version: glusterfs 3.8.12
>
> The workload i'm using is: sysbench --test=fileio --file-total-size=4G
> --file-num=64 prepare
>
> sysbench version: sysbench 0.4.12-5.el6
>
> kernel version: 2.6.32-696.1.1.el6
>
> centos: 6.8
>
> Issue: Whenever I run the sysbench over the mount /var/lib/mysql_backups I
> get the error that is shown on the log output.
>
> It is a constant issue, I can reproduce it when I start increasing the
> --file-num for sysbench above 3.
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 'Transport endpoint not connected'

2012-06-04 Thread Brian Candler
On Fri, May 04, 2012 at 01:27:35PM +0530, Amar Tumballi wrote:
 Are you sure the clients are not automatically remounted within 10
 seconds of the servers coming up? This has worked fine since the
 networking code was written.
 
 Internally, there is a timer thread which makes sure we
 automatically reconnect after 10seconds.
 
 Please see if you can repeat the operations 2-3 times before doing a
 umount/mount, it should have gotten reconnected.
 
 If not, please file a bug report with the glusterfs logs (of the
 client process).

OK this happened again, bug reported at
https://bugzilla.redhat.com/show_bug.cgi?id=828509

Nothing happened in the client log for each of the attempts to 'ls' the
affected directly, however the client log does have evidence of what looks
like a client-side crash of some sort (sig 11)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] 'Transport endpoint not connected'

2012-06-04 Thread Activepage Gmail


I have the same problem with 3.2.6; from time to time, on a random basis, some
server gives me the Transport endpoint not connected error.


I have to reboot the server to make it connect again.

I run Fedora 16 and Gluster 3.2.6-2

- Original Message - 
From: Brian Candler b.cand...@pobox.com

To: Amar Tumballi ama...@redhat.com
Cc: gluster-users@gluster.org
Sent: Monday, June 04, 2012 5:07 PM
Subject: Re: [Gluster-users] 'Transport endpoint not connected'



On Fri, May 04, 2012 at 01:27:35PM +0530, Amar Tumballi wrote:

Are you sure the clients are not automatically remounted within 10
seconds of the servers coming up? This has worked fine since the
networking code was written.

Internally, there is a timer thread which makes sure we
automatically reconnect after 10seconds.

Please see if you can repeat the operations 2-3 times before doing a
umount/mount, it should have gotten reconnected.

If not, please file a bug report with the glusterfs logs (of the
client process).


OK this happened again, bug reported at
https://bugzilla.redhat.com/show_bug.cgi?id=828509

Nothing happened in the client log for each of the attempts to 'ls' the
affected directly, however the client log does have evidence of what looks
like a client-side crash of some sort (sig 11)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] 'Transport endpoint not connected'

2012-05-04 Thread Amar Tumballi

On 05/04/2012 01:05 PM, Brian Candler wrote:

This should be a pretty easy issue to reproduce, at least it seems to happen
to me very often. (gluster-3.2.5)

After storage backend(s) have been rebooted, the client mounts are often
broken until you unmount and remount.  Example from this morning: I had
rebooted storage servers to upgrade them to ubuntu 12.04.  Now at the client
side:

$ ls /gluster/scratch
ls: cannot access /gluster/scratch: Transport endpoint is not connected
$ ls /gluster/scratch3
dbbuild  DBS
$ sudo umount /gluster/scratch
$ sudo mount /gluster/scratch
$ ls /gluster/scratch
dbbuild
$

Note that /gluster/scratch is a distributed volume (spread across servers
'storage2' and 'storage3'), whereas /gluster/scratch3 is a single brick
(server 'storage3' only).

So *some* of the mounts do seem to automatically reconnect - not all are
affected.

But in future, I think it would be good if the FUSE client could
automatically attempt to reconnect under whatever circumstance causes
'Transport endpoint is not connected'; clearly it *can* reconnect if forced.



Are you sure the clients are not automatically remounted within 10
seconds of the servers coming up? This has worked fine since the
networking code was written.


Internally, there is a timer thread which makes sure we automatically 
reconnect after 10seconds.


Please see if you can repeat the operations 2-3 times before doing a 
umount/mount, it should have gotten reconnected.


If not, please file a bug report with the glusterfs logs (of the client 
process).


Regards,
Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] 'Transport endpoint not connected'

2012-05-04 Thread Brian Candler
On Fri, May 04, 2012 at 01:27:35PM +0530, Amar Tumballi wrote:
 Are you sure the clients are not automatically remounted within 10
 seconds of the servers coming up? This has worked fine since the
 networking code was written.
 
 Internally, there is a timer thread which makes sure we
 automatically reconnect after 10seconds.
 
 Please see if you can repeat the operations 2-3 times before doing a
 umount/mount, it should have gotten reconnected.

OK, I'll do that next time. Thanks.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint not connected

2010-04-28 Thread Joe Warren-Meeks
Hey guys,

Any clues or pointers with this problem? It's occurring every 6 hours or
so. Is there anything else I can do to help debug it?

Kind regards

 -- joe.


 -Original Message-
 From: gluster-users-boun...@gluster.org [mailto:gluster-users-
 boun...@gluster.org] On Behalf Of Joe Warren-Meeks
 Sent: 26 April 2010 12:31
 To: Vijay Bellur
 Cc: gluster-users@gluster.org
 Subject: Re: [Gluster-users] Transport endpoint not connected
 
 Here is the relevant crash section:
 
 patchset: v3.0.4
 signal received: 11
 time of crash: 2010-04-23 21:40:40
 configuration details:
 argp 1
 backtrace 1
 dlfcn 1
 fdatasync 1
 libpthread 1
 llistxattr 1
 setfsid 1
 spinlock 1
 epoll.h 1
 xattr.h 1
 st_atim.tv_nsec 1
 package-string: glusterfs 3.0.4
 /lib/libc.so.6[0x7ffd0d809100]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/read-
 ahead.so(ra_fstat
 +0x82)[0
 x7ffd0c968d22]
 /usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7ffd0df7411b]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/quick-
 read.so(qr_fstat
 +0x113)[
 0x7ffd0c5570a3]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/write-
 behind.so(wb_fst
 at_helpe
 r+0xcb)[0x7ffd0c346adb]
 /usr/local/lib/libglusterfs.so.0(call_resume+0x390)[0x7ffd0df7cf60]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/write-
 behind.so(wb_res
 ume_othe
 r_requests+0x58)[0x7ffd0c349938]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/write-
 behind.so(wb_pro
 cess_que
 ue+0xe1)[0x7ffd0c348251]
 /usr/local/lib/glusterfs/3.0.4/xlator/performance/write-
 behind.so(wb_fst
 at+0x20a
 )[0x7ffd0c34a87a]
 /usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7ffd0df7411b]
 /usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7ffd0bf23a36]
 /usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7ffd0bf246b6]
 /lib/libpthread.so.0[0x7ffd0db3f3f7]
 /lib/libc.so.6(clone+0x6d)[0x7ffd0d8aeb4d]
 
 And Startup section:
 
 ========================================================================
 Version  : glusterfs 3.0.4 built on Apr 19 2010 16:37:50
 git: v3.0.4
 Starting Time: 2010-04-26 10:00:59
 Command line : /usr/local/sbin/glusterfs --log-level=NORMAL
 --volfile=/etc/glust
 erfs/repstore1-tcp.vol /data/import
 PID  : 5910
 System name  : Linux
 Nodename : w2
 Kernel Release : 2.6.24-27-server
 Hardware Identifier: x86_64
 
 Given volfile:

 +------------------------------------------------------------------------------+
   1: ## file auto generated by /usr/local/bin/glusterfs-volgen
 (mount.vol)
   2: # Cmd line:
   3: # $ /usr/local/bin/glusterfs-volgen --name repstore1 --raid 1
 10.10.130.11:/data/export 10.10.130.12:/data/export
   4:
   5: # RAID 1
   6: # TRANSPORT-TYPE tcp
   7: volume 10.10.130.12-1
   8: type protocol/client
   9: option transport-type tcp
  10: option remote-host 10.10.130.12
  11: option transport.socket.nodelay on
  12: option transport.remote-port 6996
  13: option remote-subvolume brick1
  14: end-volume
  15:
  16: volume 10.10.130.11-1
  17: type protocol/client
  18: option transport-type tcp
  19: option remote-host 10.10.130.11
  20: option transport.socket.nodelay on
  21: option transport.remote-port 6996
  22: option remote-subvolume brick1
  23: end-volume
  24:
  25: volume mirror-0
  26: type cluster/replicate
  27: subvolumes 10.10.130.11-1 10.10.130.12-1
  28: end-volume
  29:
  30: volume readahead
  31: type performance/read-ahead
  32: option page-count 4
  33: subvolumes mirror-0
  34: end-volume
  35:
  36: volume iocache
  37: type performance/io-cache
  38: option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo |
 sed 's/[^0-9]//g') / 5120 ))`MB
  39: option cache-timeout 1
  40: subvolumes readahead
 41: end-volume
  42:
  43: volume quickread
  44: type performance/quick-read
  45: option cache-timeout 1
  46: option max-file-size 64kB
  47: subvolumes iocache
  48: end-volume
  49:
  50: volume writebehind
  51: type performance/write-behind
  52: option cache-size 4MB
  53: subvolumes quickread
  54: end-volume
  55:
  56: volume statprefetch
  57: type performance/stat-prefetch
  58: subvolumes writebehind
  59: end-volume
  60:
 
  -Original Message-
  From: Vijay Bellur [mailto:vi...@gluster.com]
  Sent: 22 April 2010 18:40
  To: Joe Warren-Meeks
  Cc: gluster-users@gluster.org
  Subject: Re: [Gluster-users] Transport endpoint not connected
 
  Hi Joe,
 
  Can you please share the complete client log file?
 
  Thanks,
  Vijay
 
 
  Joe Warren-Meeks wrote:
   Hey guys,
  
  
  
   I've recently implemented gluster to share webcontent read-write
  between
   two servers.
  
  
  
   Version  : glusterfs 3.0.4 built on Apr 19 2010 16:37:50
  
   Fuse: 2.7.2-1ubuntu2.1
  
   Platform: ubuntu 8.04LTS
  
  
  
   I used the following command to generate my configs:
  
   /usr/local/bin/glusterfs-volgen --name repstore1

Re: [Gluster-users] Transport endpoint not connected

2010-04-28 Thread Anand Avati
Joe,
  Do you have access to the core dump from the crash? If you do,
please post the output of 'thread apply all bt full' within gdb on the
core.
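
Spelled out, with the binary path taken from the startup log quoted
below and an illustrative core path:

    gdb /usr/local/sbin/glusterfs /path/to/core
    (gdb) thread apply all bt full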

Thanks,
Avati

On Wed, Apr 28, 2010 at 2:26 PM, Joe Warren-Meeks
j...@encoretickets.co.uk wrote:
 Hey guys,

 Any clues or pointers with this problem? It's occurring every 6 hours or
 so. Is there anything else I can do to help debug it?

 Kind regards

  -- joe.



Re: [Gluster-users] Transport endpoint not connected

2010-04-28 Thread Anand Avati
 Here you go!

 Anything else I can do?

Joe, can you please rerun the gdb command as:

# gdb /usr/local/sbin/glusterfs -c /core.13560

Without the glusterfs binary given as a parameter, the backtrace is missing
all the symbols, and the bare numerical addresses are not of much use.

Thanks,
Avati
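
Put together, the session being asked for would look roughly like this; the
core file name is taken from this thread and will differ for each crash:

  gdb /usr/local/sbin/glusterfs -c /core.13560
  (gdb) thread apply all bt full
  (gdb) quit

Loading the binary alongside the core is what lets gdb resolve the raw
addresses back into symbol names.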


Re: [Gluster-users] Transport endpoint not connected

2010-04-28 Thread Joe Warren-Meeks
 = 0x7ffcfb0c, iov_len = 131072}}
msg = (void *) 0x0
ret = <value optimized out>
now = {tv_sec = 1271949306, tv_usec = 169347}
timeout = {tv_sec = 1271949307, tv_nsec = 169347000}
__FUNCTION__ = "fuse_thread_proc"
#11 0x7ffd0db3f3f7 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#12 0x7ffd0d8aeb4d in clone () from /lib/libc.so.6
No symbol table info available.
#13 0x in ?? ()
No symbol table info available.








Re: [Gluster-users] Transport endpoint not connected

2010-04-28 Thread Anand Avati
On Wed, Apr 28, 2010 at 11:15 PM, Joe Warren-Meeks
j...@encoretickets.co.uk wrote:
 Oops, I'm an idiot, sorry about that. Here you go!


Thanks! We have a good understanding of the issue now. Please add
yourself to the CC list at
http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=868 to get
fix updates.

Avati


Re: [Gluster-users] Transport endpoint not connected

2010-04-26 Thread Joe Warren-Meeks
Here is the relevant crash section:

patchset: v3.0.4
signal received: 11
time of crash: 2010-04-23 21:40:40
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.4
/lib/libc.so.6[0x7ffd0d809100]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/read-ahead.so(ra_fstat+0x82)[0x7ffd0c968d22]
/usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7ffd0df7411b]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/quick-read.so(qr_fstat+0x113)[0x7ffd0c5570a3]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_fstat_helper+0xcb)[0x7ffd0c346adb]
/usr/local/lib/libglusterfs.so.0(call_resume+0x390)[0x7ffd0df7cf60]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_resume_other_requests+0x58)[0x7ffd0c349938]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_process_queue+0xe1)[0x7ffd0c348251]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_fstat+0x20a)[0x7ffd0c34a87a]
/usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7ffd0df7411b]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7ffd0bf23a36]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7ffd0bf246b6]
/lib/libpthread.so.0[0x7ffd0db3f3f7]
/lib/libc.so.6(clone+0x6d)[0x7ffd0d8aeb4d]

And Startup section:

================================================================================

Version  : glusterfs 3.0.4 built on Apr 19 2010 16:37:50
git: v3.0.4
Starting Time: 2010-04-26 10:00:59
Command line : /usr/local/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/repstore1-tcp.vol /data/import
PID  : 5910
System name  : Linux
Nodename : w2
Kernel Release : 2.6.24-27-server
Hardware Identifier: x86_64

Given volfile:
+------------------------------------------------------------------------------+
  1: ## file auto generated by /usr/local/bin/glusterfs-volgen
(mount.vol)
  2: # Cmd line:
  3: # $ /usr/local/bin/glusterfs-volgen --name repstore1 --raid 1
10.10.130.11:/data/export 10.10.130.12:/data/export
  4: 
  5: # RAID 1
  6: # TRANSPORT-TYPE tcp
  7: volume 10.10.130.12-1
  8: type protocol/client
  9: option transport-type tcp
 10: option remote-host 10.10.130.12
 11: option transport.socket.nodelay on
 12: option transport.remote-port 6996
 13: option remote-subvolume brick1
 14: end-volume
 15: 
 16: volume 10.10.130.11-1
 17: type protocol/client
 18: option transport-type tcp
 19: option remote-host 10.10.130.11
 20: option transport.socket.nodelay on
 21: option transport.remote-port 6996
 22: option remote-subvolume brick1
 23: end-volume
 24: 
 25: volume mirror-0
 26: type cluster/replicate
 27: subvolumes 10.10.130.11-1 10.10.130.12-1
 28: end-volume
 29: 
 30: volume readahead
 31: type performance/read-ahead
 32: option page-count 4
 33: subvolumes mirror-0
 34: end-volume
 35: 
 36: volume iocache
 37: type performance/io-cache
 38: option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
 39: option cache-timeout 1
 40: subvolumes readahead
 41: end-volume
 42: 
 43: volume quickread
 44: type performance/quick-read
 45: option cache-timeout 1
 46: option max-file-size 64kB
 47: subvolumes iocache
 48: end-volume
 49: 
 50: volume writebehind
 51: type performance/write-behind
 52: option cache-size 4MB
 53: subvolumes quickread
 54: end-volume
 55: 
 56: volume statprefetch
 57: type performance/stat-prefetch
 58: subvolumes writebehind
 59: end-volume
 60:


Re: [Gluster-users] Transport endpoint not connected

2010-04-22 Thread Vijay Bellur

Hi Joe,

Can you please share the complete client log file?

Thanks,
Vijay
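
As a point of reference, the client log for a FUSE mount normally lives in
the glusterfs log directory, named after the mount point with '/' replaced
by '-'. The exact location depends on the build prefix; for a source build
under /usr/local it would be something like:

  # illustrative path for the /data/import mount discussed below
  less /usr/local/var/log/glusterfs/data-import.log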


Joe Warren-Meeks wrote:

Hey guys,

I've recently implemented gluster to share webcontent read-write between
two servers.

Version  : glusterfs 3.0.4 built on Apr 19 2010 16:37:50
Fuse: 2.7.2-1ubuntu2.1
Platform: ubuntu 8.04LTS

I used the following command to generate my configs:

/usr/local/bin/glusterfs-volgen --name repstore1 --raid 1
10.10.130.11:/data/export 10.10.130.12:/data/export

And mount them on each of the servers like so:

/etc/fstab:

/etc/glusterfs/repstore1-tcp.vol  /data/import  glusterfs  defaults  0 0
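
For reference, bringing an fstab entry like that up by hand looks roughly
like this (a sketch; it assumes the mount point still needs creating):

  mkdir -p /data/import
  mount /data/import    # resolves the volfile through /etc/fstab
  # or, bypassing fstab entirely:
  glusterfs --volfile=/etc/glusterfs/repstore1-tcp.vol /data/import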

 

 


Every 12 hours or so, one or other of the servers will lose the mount
and error with:

df: `/data/import': Transport endpoint is not connected

And I get the following in my logfile:

patchset: v3.0.4
signal received: 11
time of crash: 2010-04-22 11:41:10
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.4

/lib/libc.so.6[0x7f2eca39a100]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/read-ahead.so(ra_fstat+0x82)[0x7f2ec94f9d22]
/usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7f2ecab0511b]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/quick-read.so(qr_fstat+0x113)[0x7f2ec90e80a3]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_fstat_helper+0xcb)[0x7f2ec8ed7adb]
/usr/local/lib/libglusterfs.so.0(call_resume+0x390)[0x7f2ecab0df60]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_resume_other_requests+0x58)[0x7f2ec8eda938]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_process_queue+0xe1)[0x7f2ec8ed9251]
/usr/local/lib/glusterfs/3.0.4/xlator/performance/write-behind.so(wb_fstat+0x20a)[0x7f2ec8edb87a]
/usr/local/lib/libglusterfs.so.0(default_fstat+0xcb)[0x7f2ecab0511b]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7f2ec8ab4a36]
/usr/local/lib/glusterfs/3.0.4/xlator/mount/fuse.so[0x7f2ec8ab56b6]
/lib/libpthread.so.0[0x7f2eca6d03f7]
/lib/libc.so.6(clone+0x6d)[0x7f2eca43fb4d]

 

 


If I umount and remount, things work again, but it isn't ideal.

Any clues, pointers, hints?

Kind regards

 -- joe.
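
The recovery described above amounts to cycling the dead FUSE mount. A sketch
of the usual sequence; a plain umount can itself fail with "Transport
endpoint is not connected", in which case a lazy unmount is the common
fallback:

  umount /data/import || umount -l /data/import   # -l detaches the dead mount lazily
  mount /data/import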

 


Joe Warren-Meeks
Director Of Systems Development
ENCORE TICKETS LTD
Encore House, 50-51 Bedford Row, London WC1R 4LR
Direct line:  +44 (0)20 7492 1506
Reservations: +44 (0)20 7492 1500
Fax: +44 (0)20 7831 4410
Email: j...@encoretickets.co.uk
web: www.encoretickets.co.uk


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users