[Gluster-users] Transport endpoint is not connected

2021-03-10 Thread Pat Haley



Hi,

We had a hardware error in one of our switches which cut off the 
communications between one of our gluster brick nodes and the client 
nodes. By the time we had identified the problem and replaced the bad 
part, one of our clients started throwing up "Transport endpoint is not 
connected" errors. We are still getting these errors even though we have 
re-established the connection. Is there a simple way to clear this error 
besides rebooting the client system?
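
For reference, a stale FUSE mount can often be cleared by remounting instead of
rebooting; a minimal sketch, assuming a hypothetical mount point /gluster-mnt and
a volume named gdata served from server1 (all placeholders):

# lazy-unmount the stale mount point, then mount it again
umount -l /gluster-mnt
mount -t glusterfs server1:/gdata /gluster-mnt

# or simply, if the mount is already defined in /etc/fstab
mount /gluster-mnt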


Thanks

--

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley  Email:  pha...@mit.edu
Center for Ocean Engineering   Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301





Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] "Transport endpoint is not connected" error + long list of files to be healed

2019-11-13 Thread Ashish Pandey
Hi Mauro, 

Yes, it will take time to heal these files; the time depends on the number of 
files and directories you created and the amount of data you wrote while the 
bricks were down. 

You can just run the following command and keep observing whether the count is 
changing or not - 

gluster volume heal tier2 info | grep entries 
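
A minimal sketch of keeping an eye on that count over time (assuming the watch
utility is installed; the 60-second interval is arbitrary):

# re-run the heal-info summary every minute and watch whether the count drops
watch -n 60 'gluster volume heal tier2 info | grep entries'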

--- 
Ashish 

- Original Message -

From: "Mauro Tridici"  
To: "Gluster Devel"  
Cc: "Gluster-users"  
Sent: Wednesday, November 13, 2019 7:00:37 PM 
Subject: [Gluster-users] "Transport endpoint is not connected" error + long 
list of files to be healed 

Dear All, 

our GlusterFS file system was showing some problems during simple user 
actions (for example, during directory or file creation). 


mkdir -p test 
mkdir: cannot create directory `test': Transport endpoint is not 
connected 


After receiving some user notifications, I investigated the issue and 
found that 3 bricks (each one on a separate gluster server) were down. 
So I forced the bricks back up using “gluster vol start tier force” and 
the bricks came back successfully. All the bricks are up. 

Anyway, I saw from the “gluster vol status” output that 2 self-heal 
daemons were also down, and I had to restart them to fix the problem. 
Now everything seems to be OK in the output of “gluster vol status”, and 
I can create a test directory on the file system. 

But during the last check, made using “gluster volume heal tier2 info”, I saw a 
long list of files and directories that need to be healed. 
The list is very long and the command output is still scrolling in my 
terminal. 

What can I do to fix this issue? Does the self-heal feature automatically fix 
each file that needs to be healed? 
Could you please help me understand what I need to do in this case? 

You can find below some information about our GlusterFS configuration: 

Volume Name: tier2 
Type: Distributed-Disperse 
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 12 x (4 + 2) = 72 
Transport-type: tcp 

Thank you in advance. 
Regards, 
Mauro 

 

Community Meeting Calendar: 

APAC Schedule - 
Every 2nd and 4th Tuesday at 11:30 AM IST 
Bridge: https://bluejeans.com/118564314 

NA/EMEA Schedule - 
Every 1st and 3rd Tuesday at 01:00 PM EDT 
Bridge: https://bluejeans.com/118564314 

Gluster-users mailing list 
Gluster-users@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-users 





[Gluster-users] "Transport endpoint is not connected" error + long list of files to be healed

2019-11-13 Thread Mauro Tridici
Dear All,

our GlusterFS file system was showing some problems during simple user 
actions (for example, during directory or file creation).

> mkdir -p test
> mkdir: cannot create directory `test': Transport endpoint is not 
> connected


After receiving some user notifications, I investigated the issue and 
found that 3 bricks (each one on a separate gluster server) were down.
So I forced the bricks back up using “gluster vol start tier force” and 
the bricks came back successfully. All the bricks are up.

Anyway, I saw from the “gluster vol status” output that 2 self-heal 
daemons were also down, and I had to restart them to fix the problem.
Now everything seems to be OK in the output of “gluster vol status”, and 
I can create a test directory on the file system.

But during the last check, made using “gluster volume heal tier2 info”, I saw a 
long list of files and directories that need to be healed.
The list is very long and the command output is still scrolling in my 
terminal.

What can I do to fix this issue? Does the self-heal feature automatically fix 
each file that needs to be healed?
Could you please help me understand what I need to do in this case?
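
For reference, a hedged sketch of what can be done while waiting (volume name
tier2 taken from the volume info below; exact behaviour can vary by Gluster
release). The self-heal daemon normally heals the listed entries on its own once
all bricks and daemons are up; a full heal can also be requested explicitly:

# confirm all bricks and self-heal daemons are online
gluster volume status tier2

# optionally ask the self-heal daemons to crawl and heal everything
gluster volume heal tier2 full

# re-check how many entries are still pending
gluster volume heal tier2 info | grep "Number of entries"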

You can find below some information about our GlusterFS configuration:

Volume Name: tier2
Type: Distributed-Disperse
Volume ID: a28d88c5-3295-4e35-98d4-210b3af9358c
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp

Thank you in advance.
Regards,
Mauro

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint is not connected

2019-06-09 Thread David Cunningham
Thank you Strahil.


On Tue, 4 Jun 2019 at 23:48, Strahil Nikolov  wrote:

> Hi David,
>
> You can ensure that 49152-49160 are opened in advance...
> You never know when you will need to deploy another Gluster Volume.
>
> best Regards,
> Strahil Nikolov
>
> On Monday, 3 June 2019 at 18:16:00 GMT-4, David Cunningham <
> dcunning...@voisonics.com> wrote:
>
>
> Hello all,
>
> We confirmed that the network provider blocking port 49152 was the issue.
> Thanks for all the help.
>
>
> On Thu, 30 May 2019 at 16:11, Strahil  wrote:
>
> You can try to run a ncat from gfs3:
>
> ncat -z -v gfs1 49152
> ncat -z -v gfs2 49152
>
> If ncat fails to connect ->  it's definitely a firewall.
>
> Best Regards,
> Strahil Nikolov
> On May 30, 2019 01:33, David Cunningham  wrote:
>
> Hi Ravi,
>
> I think it probably is a firewall issue with the network provider. I was
> hoping to see a specific connection failure message we could send to them,
> but will take it up with them anyway.
>
> Thanks for your help.
>
>
> On Wed, 29 May 2019 at 23:10, Ravishankar N 
> wrote:
>
> I don't see a "Connected to gvol0-client-1" in the log.  Perhaps a
> firewall issue like the last time? Even in the earlier add-brick log from
> the other email thread, connection to the 2nd brick was not established.
>
> -Ravi
> On 29/05/19 2:26 PM, David Cunningham wrote:
>
> Hi Ravi and Joe,
>
> The command "gluster volume status gvol0" shows all 3 nodes as being
> online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in
> which I can't see anything like a connection error. Would you have any
> further suggestions? Thank you.
>
> [root@gfs3 glusterfs]# gluster volume status gvol0
> Status of volume: gvol0
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> --
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7625
> Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7307
> Self-heal Daemon on localhost   N/A   N/AY
> 7316
> Self-heal Daemon on gfs1N/A   N/AY
> 40591
> Self-heal Daemon on gfs2N/A   N/AY
> 7634
>
> Task Status of Volume gvol0
>
> --
> There are no active volume tasks
>
>
> On Wed, 29 May 2019 at 16:26, Ravishankar N 
> wrote:
>
>
> On 29/05/19 6:21 AM, David Cunningham wrote:
>
>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected

2019-06-03 Thread David Cunningham
Hello all,

We confirmed that the network provider blocking port 49152 was the issue.
Thanks for all the help.
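
For anyone hitting the same symptom, a hedged sketch of checking and opening the
brick port range (the 49152-49160 range follows the suggestion made elsewhere in
this thread; firewall-cmd assumes a firewalld-based host, so adjust for your
environment):

# from a client or peer, verify the brick port is reachable
ncat -z -v gfs1 49152

# on each server, open the brick port range permanently (firewalld)
firewall-cmd --permanent --add-port=49152-49160/tcp
firewall-cmd --reload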


On Thu, 30 May 2019 at 16:11, Strahil  wrote:

> You can try to run a ncat from gfs3:
>
> ncat -z -v gfs1 49152
> ncat -z -v gfs2 49152
>
> If ncat fails to connect ->  it's definitely a firewall.
>
> Best Regards,
> Strahil Nikolov
> On May 30, 2019 01:33, David Cunningham  wrote:
>
> Hi Ravi,
>
> I think it probably is a firewall issue with the network provider. I was
> hoping to see a specific connection failure message we could send to them,
> but will take it up with them anyway.
>
> Thanks for your help.
>
>
> On Wed, 29 May 2019 at 23:10, Ravishankar N 
> wrote:
>
> I don't see a "Connected to gvol0-client-1" in the log.  Perhaps a
> firewall issue like the last time? Even in the earlier add-brick log from
> the other email thread, connection to the 2nd brick was not established.
>
> -Ravi
> On 29/05/19 2:26 PM, David Cunningham wrote:
>
> Hi Ravi and Joe,
>
> The command "gluster volume status gvol0" shows all 3 nodes as being
> online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in
> which I can't see anything like a connection error. Would you have any
> further suggestions? Thank you.
>
> [root@gfs3 glusterfs]# gluster volume status gvol0
> Status of volume: gvol0
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> --
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7625
> Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7307
> Self-heal Daemon on localhost   N/A   N/AY
> 7316
> Self-heal Daemon on gfs1N/A   N/AY
> 40591
> Self-heal Daemon on gfs2N/A   N/AY
> 7634
>
> Task Status of Volume gvol0
>
> --
> There are no active volume tasks
>
>
> On Wed, 29 May 2019 at 16:26, Ravishankar N 
> wrote:
>
>
> On 29/05/19 6:21 AM, David Cunningham wrote:
>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected

2019-05-29 Thread David Cunningham
Hi Ravi,

I think it probably is a firewall issue with the network provider. I was
hoping to see a specific connection failure message we could send to them,
but will take it up with them anyway.

Thanks for your help.


On Wed, 29 May 2019 at 23:10, Ravishankar N  wrote:

> I don't see a "Connected to gvol0-client-1" in the log.  Perhaps a
> firewall issue like the last time? Even in the earlier add-brick log from
> the other email thread, connection to the 2nd brick was not established.
>
> -Ravi
> On 29/05/19 2:26 PM, David Cunningham wrote:
>
> Hi Ravi and Joe,
>
> The command "gluster volume status gvol0" shows all 3 nodes as being
> online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in
> which I can't see anything like a connection error. Would you have any
> further suggestions? Thank you.
>
> [root@gfs3 glusterfs]# gluster volume status gvol0
> Status of volume: gvol0
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
> --
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7625
> Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0  Y
> 7307
> Self-heal Daemon on localhost   N/A   N/AY
> 7316
> Self-heal Daemon on gfs1N/A   N/AY
> 40591
> Self-heal Daemon on gfs2N/A   N/AY
> 7634
>
> Task Status of Volume gvol0
>
> --
> There are no active volume tasks
>
>
> On Wed, 29 May 2019 at 16:26, Ravishankar N 
> wrote:
>
>>
>> On 29/05/19 6:21 AM, David Cunningham wrote:
>>
>> Hello all,
>>
>> We are seeing a strange issue where a new node gfs3 shows another node
>> gfs2 as not connected on the "gluster volume heal" info:
>>
>> [root@gfs3 bricks]# gluster volume heal gvol0 info
>> Brick gfs1:/nodirectwritedata/gluster/gvol0
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gfs2:/nodirectwritedata/gluster/gvol0
>> Status: Transport endpoint is not connected
>> Number of entries: -
>>
>> Brick gfs3:/nodirectwritedata/gluster/gvol0
>> Status: Connected
>> Number of entries: 0
>>
>>
>> However it does show the same node connected on "gluster peer status".
>> Does anyone know why this would be?
>>
>> [root@gfs3 bricks]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gfs2
>> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gfs1
>> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
>> State: Peer in Cluster (Connected)
>>
>>
>> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with
>> regards to gfs2:
>>
>> You need to check glfsheal-$volname.log on the node where you ran the
>> command and check for any connection related errors.
>>
>> -Ravi
>>
>>
>> [2019-05-29 00:17:50.646360] I [MSGID: 115029]
>> [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client
>> from
>> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
>> (version: 5.6)
>> [2019-05-29 00:17:50.761120] I [MSGID: 115036]
>> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection
>> from
>> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
>> [2019-05-29 00:17:50.761352] I [MSGID: 101055]
>> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection
>> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
>>
>> Thanks in advance for any assistance.
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected

2019-05-29 Thread Ravishankar N
I don't see a "Connected to gvol0-client-1" in the log.  Perhaps a 
firewall issue like the last time? Even in the earlier add-brick log 
from the other email thread, connection to the 2nd brick was not 
established.


-Ravi

On 29/05/19 2:26 PM, David Cunningham wrote:

Hi Ravi and Joe,

The command "gluster volume status gvol0" shows all 3 nodes as being 
online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, 
in which I can't see anything like a connection error. Would you have 
any further suggestions? Thank you.


[root@gfs3 glusterfs]# gluster volume status gvol0
Status of volume: gvol0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7625
Brick gfs3:/nodirectwritedata/gluster/gvol0 49152     0          Y       7307
Self-heal Daemon on localhost               N/A       N/A        Y       7316
Self-heal Daemon on gfs1                    N/A       N/A        Y       40591
Self-heal Daemon on gfs2                    N/A       N/A        Y       7634

Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks


On Wed, 29 May 2019 at 16:26, Ravishankar N wrote:



On 29/05/19 6:21 AM, David Cunningham wrote:

Hello all,

We are seeing a strange issue where a new node gfs3 shows another
node gfs2 as not connected on the "gluster volume heal" info:

[root@gfs3 bricks]# gluster volume heal gvol0 info
Brick gfs1:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0

Brick gfs2:/nodirectwritedata/gluster/gvol0
Status: Transport endpoint is not connected
Number of entries: -

Brick gfs3:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0


However it does show the same node connected on "gluster peer
status". Does anyone know why this would be?

[root@gfs3 bricks]# gluster peer status
Number of Peers: 2

Hostname: gfs2
Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
State: Peer in Cluster (Connected)


In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged
with regards to gfs2:


You need to check glfsheal-$volname.log on the node where you ran
the command and check for any connection related errors.

-Ravi



[2019-05-29 00:17:50.646360] I [MSGID: 115029]
[server-handshake.c:537:server_setvolume] 0-gvol0-server:
accepted client from

CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
(version: 5.6)
[2019-05-29 00:17:50.761120] I [MSGID: 115036]
[server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting
connection from

CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
[2019-05-29 00:17:50.761352] I [MSGID: 101055]
[client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down
connection

CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0

Thanks in advance for any assistance.

-- 
David Cunningham, Voisonics Limited

http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

___
Gluster-users mailing list
Gluster-users@gluster.org  
https://lists.gluster.org/mailman/listinfo/gluster-users




--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected

2019-05-29 Thread David Cunningham
Hi Ravi and Joe,

The command "gluster volume status gvol0" shows all 3 nodes as being
online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in
which I can't see anything like a connection error. Would you have any
further suggestions? Thank you.

[root@gfs3 glusterfs]# gluster volume status gvol0
Status of volume: gvol0
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7625
Brick gfs3:/nodirectwritedata/gluster/gvol0 49152     0          Y       7307
Self-heal Daemon on localhost               N/A       N/A        Y       7316
Self-heal Daemon on gfs1                    N/A       N/A        Y       40591
Self-heal Daemon on gfs2                    N/A       N/A        Y       7634

Task Status of Volume gvol0
--
There are no active volume tasks


On Wed, 29 May 2019 at 16:26, Ravishankar N  wrote:

>
> On 29/05/19 6:21 AM, David Cunningham wrote:
>
> Hello all,
>
> We are seeing a strange issue where a new node gfs3 shows another node
> gfs2 as not connected on the "gluster volume heal" info:
>
> [root@gfs3 bricks]# gluster volume heal gvol0 info
> Brick gfs1:/nodirectwritedata/gluster/gvol0
> Status: Connected
> Number of entries: 0
>
> Brick gfs2:/nodirectwritedata/gluster/gvol0
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick gfs3:/nodirectwritedata/gluster/gvol0
> Status: Connected
> Number of entries: 0
>
>
> However it does show the same node connected on "gluster peer status".
> Does anyone know why this would be?
>
> [root@gfs3 bricks]# gluster peer status
> Number of Peers: 2
>
> Hostname: gfs2
> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
> State: Peer in Cluster (Connected)
>
> Hostname: gfs1
> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
> State: Peer in Cluster (Connected)
>
>
> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with
> regards to gfs2:
>
> You need to check glfsheal-$volname.log on the node where you ran the
> command and check for any connection related errors.
>
> -Ravi
>
>
> [2019-05-29 00:17:50.646360] I [MSGID: 115029]
> [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client
> from
> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
> (version: 5.6)
> [2019-05-29 00:17:50.761120] I [MSGID: 115036]
> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection
> from
> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
> [2019-05-29 00:17:50.761352] I [MSGID: 101055]
> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection
> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
>
> Thanks in advance for any assistance.
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
[2019-05-29 08:44:11.435439] I [MSGID: 104045] [glfs-master.c:86:notify] 0-gfapi: New graph 67667333-2e74-656c-6562-726f61642e63 (0) coming up
[2019-05-29 08:44:11.435532] I [MSGID: 114020] [client.c:2358:notify] 0-gvol0-client-0: parent translators are ready, attempting connect on transport
[2019-05-29 08:44:11.441023] I [MSGID: 114020] [client.c:2358:notify] 0-gvol0-client-1: parent translators are ready, attempting connect on transport
[2019-05-29 08:44:11.444160] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-gvol0-client-0: changing port to 49152 (from 0)
[2019-05-29 08:44:11.445679] I [MSGID: 114020] [client.c:2358:notify] 0-gvol0-client-2: parent translators are ready, attempting connect on transport
[2019-05-29 08:44:11.455002] I [MSGID: 114046] [client-handshake.c:1106:client_setvolume_cbk] 0-gvol0-client-0: Connected to gvol0-client-0, attached to remote volume '/nodirectwritedata/gluster/gvol0'.
Final graph:
+--+
  1: volume gvol0-client-0
  2: type protocol/client
  3: option opversion 50400
  4: option clnt-lk-version 1
  5: option volfile-checksum 0
  6: option volfile-key gvol0
  7: option client-version 5.6
  8: option process-name gfapi.glfsheal
  9: option process-uuid CTX_ID:bda2caab-106e-4097-9b8a-b2c66fbce168-GRAPH_ID:0-PID:14552-HOST:gfs3.example.com-PC_NAME:gvol0-client-0-RECON_NO:-0
[2019-05-29 08:44:11.455154] I [MSGID: 108005] [afr-common

Re: [Gluster-users] Transport endpoint is not connected

2019-05-28 Thread Joe Julian

Check

gluster volume status gvol0

and make sure your bricks are all running.
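
If a brick shows N or N/A in the Online column, a hedged sketch of the usual
next step (volume name gvol0 from this thread) is to restart only the missing
brick processes and re-check:

# start any brick processes that are not currently running for this volume
gluster volume start gvol0 force

# confirm every brick now shows Online = Y
gluster volume status gvol0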

On 5/29/19 2:51 AM, David Cunningham wrote:

Hello all,

We are seeing a strange issue where a new node gfs3 shows another node 
gfs2 as not connected on the "gluster volume heal" info:


[root@gfs3 bricks]# gluster volume heal gvol0 info
Brick gfs1:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0

Brick gfs2:/nodirectwritedata/gluster/gvol0
Status: Transport endpoint is not connected
Number of entries: -

Brick gfs3:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0


However it does show the same node connected on "gluster peer status". 
Does anyone know why this would be?


[root@gfs3 bricks]# gluster peer status
Number of Peers: 2

Hostname: gfs2
Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
State: Peer in Cluster (Connected)


In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with 
regards to gfs2:


[2019-05-29 00:17:50.646360] I [MSGID: 115029] 
[server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted 
client from 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 
(version: 5.6)
[2019-05-29 00:17:50.761120] I [MSGID: 115036] 
[server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting 
connection from 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
[2019-05-29 00:17:50.761352] I [MSGID: 101055] 
[client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down 
connection 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0


Thanks in advance for any assistance.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected

2019-05-28 Thread Ravishankar N


On 29/05/19 6:21 AM, David Cunningham wrote:

Hello all,

We are seeing a strange issue where a new node gfs3 shows another node 
gfs2 as not connected on the "gluster volume heal" info:


[root@gfs3 bricks]# gluster volume heal gvol0 info
Brick gfs1:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0

Brick gfs2:/nodirectwritedata/gluster/gvol0
Status: Transport endpoint is not connected
Number of entries: -

Brick gfs3:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0


However it does show the same node connected on "gluster peer status". 
Does anyone know why this would be?


[root@gfs3 bricks]# gluster peer status
Number of Peers: 2

Hostname: gfs2
Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
State: Peer in Cluster (Connected)


In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with 
regards to gfs2:


You need to check glfsheal-$volname.log on the node where you ran the 
command and check for any connection related errors.
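
A rough way to scan that log for connection problems (the path assumes the
default /var/log/glusterfs location and the gvol0 volume name used in this
thread; the grep patterns are only illustrative):

# show recent connection-related lines from the heal-info helper log
grep -Ei "failed to connect|connection refused|not connected|disconnected" \
    /var/log/glusterfs/glfsheal-gvol0.log | tail -n 20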


-Ravi



[2019-05-29 00:17:50.646360] I [MSGID: 115029] 
[server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted 
client from 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 
(version: 5.6)
[2019-05-29 00:17:50.761120] I [MSGID: 115036] 
[server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting 
connection from 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
[2019-05-29 00:17:50.761352] I [MSGID: 101055] 
[client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down 
connection 
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0


Thanks in advance for any assistance.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Transport endpoint is not connected

2019-05-28 Thread David Cunningham
Hello all,

We are seeing a strange issue where a new node gfs3 shows another node gfs2
as not connected on the "gluster volume heal" info:

[root@gfs3 bricks]# gluster volume heal gvol0 info
Brick gfs1:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0

Brick gfs2:/nodirectwritedata/gluster/gvol0
Status: Transport endpoint is not connected
Number of entries: -

Brick gfs3:/nodirectwritedata/gluster/gvol0
Status: Connected
Number of entries: 0


However it does show the same node connected on "gluster peer status". Does
anyone know why this would be?

[root@gfs3 bricks]# gluster peer status
Number of Peers: 2

Hostname: gfs2
Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b
State: Peer in Cluster (Connected)


In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with
regards to gfs2:

[2019-05-29 00:17:50.646360] I [MSGID: 115029]
[server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client
from
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
(version: 5.6)
[2019-05-29 00:17:50.761120] I [MSGID: 115036]
[server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection
from
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0
[2019-05-29 00:17:50.761352] I [MSGID: 101055]
[client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection
CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0

Thanks in advance for any assistance.

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-30 Thread Raghavendra Gowdappa
On Sat, Mar 30, 2019 at 1:18 AM  wrote:

> Hello,
>
>
>
> Yes I did find some hits on this in the following logs. We started seeing
> failures after upgrading to 5.3 from 4.6.
>

There are no fixes relevant to ping-timer expiry between 5.3 and 5.5. So I
attribute the failures no longer being seen to the increase of
client.event-threads and server.event-threads to 8 in the current setup, from
lower values earlier.
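
For reference, those thread counts are ordinary volume options; a hedged sketch
of raising them (volume name volbackups taken from this thread):

# raise the epoll worker thread counts on the client and brick sides
gluster volume set volbackups client.event-threads 8
gluster volume set volbackups server.event-threads 8

# confirm the current value of an option
gluster volume get volbackups client.event-threads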

> If you want me to check for something else let me know.   Thank you all on
> the gluster team for finding and fixing that problem whatever it was!
>
>
>
> [root@lonbaknode3 glusterfs]# zgrep ping_timer
> /var/log/glusterfs/home-volbackups*
>
> 
>
> /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16
> 10:34:44.419605] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired]
> 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last
> 42 seconds, disconnecting.
>
> /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16
> 10:34:44.419672] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired]
> 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last
> 42 seconds, disconnecting.
>
> /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16
> 10:34:57.425211] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired]
> 0-volbackups-client-9: server 1.2.3.4:49153 has not responded in the last
> 42 seconds, disconnecting.
>
> /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16
> 11:46:25.768650] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired]
> 0-volbackups-client-6: server 1.2.3.4:49153 has not responded in the last
> 42 seconds, disconnecting.
>
> /var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16
> 16:02:29.921450] C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired]
> 0-volbackups-client-3: server 1.2.3.4:49153 has not responded in the last
> 42 seconds, disconnecting.
>
> 
>
>
>
> -
>
> What was the version you saw failures in? Were there any logs matching
> with the pattern "ping_timer_expired" earlier?
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-29 Thread brandon
Hello,

 

Yes I did find some hits on this in the following logs. We started seeing 
failures after upgrading to 5.3 from 4.6.  If you want me to check for 
something else let me know.   Thank you all on the gluster team for finding and 
fixing that problem whatever it was!

 

[root@lonbaknode3 glusterfs]# zgrep ping_timer 
/var/log/glusterfs/home-volbackups*



/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:44.419605] 
C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-3: 
server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting.

/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:44.419672] 
C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-6: 
server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting.

/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 10:34:57.425211] 
C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-9: 
server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting.

/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 11:46:25.768650] 
C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-6: 
server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting.

/var/log/glusterfs/home-volbackups.log-20190317.gz:[2019-03-16 16:02:29.921450] 
C [rpc-clnt-ping.c:162:rpc_clnt_ping_timer_expired] 0-volbackups-client-3: 
server 1.2.3.4:49153 has not responded in the last 42 seconds, disconnecting.



 

-

What was the version you saw failures in? Were there any logs matching with the 
pattern "ping_timer_expired" earlier? 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-29 Thread brandon
Hello Nithya,

 

I removed several options that I admit I didn't quite understand and had 
added from Google searches. It was unwise of me to add them in the first 
place without understanding them.

 

One of these options was apparently causing directory listings to take about 7 
seconds; after cutting back to more minimal volume settings they take 1-2 
seconds. That is with about 14,000 files in the largest directory.

 

Before:

 

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

cluster.min-free-disk: 1%

performance.cache-size: 8GB

performance.cache-max-file-size: 128MB

diagnostics.brick-log-level: WARNING

diagnostics.brick-sys-log-level: WARNING

client.event-threads: 3

performance.client-io-threads: on

performance.io-thread-count: 24

network.inode-lru-limit: 1048576

performance.parallel-readdir: on

performance.cache-invalidation: on

performance.md-cache-timeout: 600

features.cache-invalidation: on

features.cache-invalidation-timeout: 600

 

After:

 

Options Reconfigured:

performance.io-thread-count: 32

performance.client-io-threads: on

client.event-threads: 8

diagnostics.brick-sys-log-level: WARNING

diagnostics.brick-log-level: WARNING

performance.cache-max-file-size: 2MB

performance.cache-size: 256MB

cluster.min-free-disk: 1%

nfs.disable: on

transport.address-family: inet

server.event-threads: 8
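
A hedged sketch of how options like the ones dropped above can be cleared
(gluster volume reset returns an option to its default; the volume name
volbackups is assumed, as elsewhere in the thread):

# clear a single option back to its default
gluster volume reset volbackups performance.parallel-readdir

# or clear every reconfigured option on the volume
gluster volume reset volbackups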

 

--

Hi Brandon,

 

Which options were removed?

 

Thanks,

Nithya  

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-27 Thread Nithya Balachandran
On Wed, 27 Mar 2019 at 21:47,  wrote:

> Hello Amar and list,
>
>
>
> I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the
> “Transport endpoint is not connected failures” for us.
>
>
>
> We did not have any of these failures in this past weekend backups cycle.
>
>
>
> Thank you very much for fixing whatever was the problem.
>
>
>
> I also removed some volume config options.  One or more of the settings
> was contributing to the slow directory listing.
>

Hi Brandon,

Which options were removed?

Thanks,
Nithya

>
>
> Here is our current volume info.
>
>
>
> [root@lonbaknode3 ~]# gluster volume info
>
>
>
> Volume Name: volbackups
>
> Type: Distribute
>
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 8
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
>
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
>
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
>
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
>
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
>
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
>
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
>
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
>
> Options Reconfigured:
>
> performance.io-thread-count: 32
>
> performance.client-io-threads: on
>
> client.event-threads: 8
>
> diagnostics.brick-sys-log-level: WARNING
>
> diagnostics.brick-log-level: WARNING
>
> performance.cache-max-file-size: 2MB
>
> performance.cache-size: 256MB
>
> cluster.min-free-disk: 1%
>
> nfs.disable: on
>
> transport.address-family: inet
>
> server.event-threads: 8
>
> [root@lonbaknode3 ~]#
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 9:46 PM  wrote:

> Hello Amar and list,
>
>
>
> I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the
> “Transport endpoint is not connected failures” for us.
>

What was the version you saw failures in? Were there any logs matching with
the pattern "ping_timer_expired" earlier?


>
> We did not have any of these failures in this past weekend backups cycle.
>
>
>
> Thank you very much for fixing whatever was the problem.
>
>
>
> I also removed some volume config options.  One or more of the settings
> was contributing to the slow directory listing.
>
>
>
> Here is our current volume info.
>
>
>
> [root@lonbaknode3 ~]# gluster volume info
>
>
>
> Volume Name: volbackups
>
> Type: Distribute
>
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 8
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
>
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
>
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
>
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
>
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
>
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
>
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
>
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
>
> Options Reconfigured:
>
> performance.io-thread-count: 32
>
> performance.client-io-threads: on
>
> client.event-threads: 8
>
> diagnostics.brick-sys-log-level: WARNING
>
> diagnostics.brick-log-level: WARNING
>
> performance.cache-max-file-size: 2MB
>
> performance.cache-size: 256MB
>
> cluster.min-free-disk: 1%
>
> nfs.disable: on
>
> transport.address-family: inet
>
> server.event-threads: 8
>
> [root@lonbaknode3 ~]#
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-27 Thread Thing
I have had this issue for a few days with my new setup.   I will have to get
back to you on versions, but it was CentOS 7.6 patched yesterday (27/3/2019).



On Thu, 28 Mar 2019 at 12:58, Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> On Wed, Mar 27, 2019 at 9:46 PM  wrote:
> >
> > Hello Amar and list,
> >
> >
> >
> > I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the
> “Transport endpoint is not connected failures” for us.
> >
> >
> >
> > We did not have any of these failures in this past weekend backups cycle.
> >
> >
> >
> > Thank you very much for fixing whatever was the problem.
>
> As always, thank you for circling back to the list and sharing that
> the issues have been addressed.
> >
> > I also removed some volume config options.  One or more of the settings
> was contributing to the slow directory listing.
> >
> >
> >
> > Here is our current volume info.
> >
>
> This is very useful!
>
> >
> > [root@lonbaknode3 ~]# gluster volume info
> >
> >
> >
> > Volume Name: volbackups
> >
> > Type: Distribute
> >
> > Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
> >
> > Status: Started
> >
> > Snapshot Count: 0
> >
> > Number of Bricks: 8
> >
> > Transport-type: tcp
> >
> > Bricks:
> >
> > Brick1: lonbaknode3.domain.net:/lvbackups/brick
> >
> > Brick2: lonbaknode4.domain.net:/lvbackups/brick
> >
> > Brick3: lonbaknode5.domain.net:/lvbackups/brick
> >
> > Brick4: lonbaknode6.domain.net:/lvbackups/brick
> >
> > Brick5: lonbaknode7.domain.net:/lvbackups/brick
> >
> > Brick6: lonbaknode8.domain.net:/lvbackups/brick
> >
> > Brick7: lonbaknode9.domain.net:/lvbackups/brick
> >
> > Brick8: lonbaknode10.domain.net:/lvbackups/brick
> >
> > Options Reconfigured:
> >
> > performance.io-thread-count: 32
> >
> > performance.client-io-threads: on
> >
> > client.event-threads: 8
> >
> > diagnostics.brick-sys-log-level: WARNING
> >
> > diagnostics.brick-log-level: WARNING
> >
> > performance.cache-max-file-size: 2MB
> >
> > performance.cache-size: 256MB
> >
> > cluster.min-free-disk: 1%
> >
> > nfs.disable: on
> >
> > transport.address-family: inet
> >
> > server.event-threads: 8
> >
> > [root@lonbaknode3 ~]#
> >
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-27 Thread Sankarshan Mukhopadhyay
On Wed, Mar 27, 2019 at 9:46 PM  wrote:
>
> Hello Amar and list,
>
>
>
> I wanted to follow-up to confirm that upgrading to 5.5 seem to fix the 
> “Transport endpoint is not connected failures” for us.
>
>
>
> We did not have any of these failures in this past weekend backups cycle.
>
>
>
> Thank you very much for fixing whatever was the problem.

As always, thank you for circling back to the list and sharing that
the issues have been addressed.
>
> I also removed some volume config options.  One or more of the settings was 
> contributing to the slow directory listing.
>
>
>
> Here is our current volume info.
>

This is very useful!

>
> [root@lonbaknode3 ~]# gluster volume info
>
>
>
> Volume Name: volbackups
>
> Type: Distribute
>
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 8
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
>
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
>
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
>
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
>
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
>
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
>
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
>
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
>
> Options Reconfigured:
>
> performance.io-thread-count: 32
>
> performance.client-io-threads: on
>
> client.event-threads: 8
>
> diagnostics.brick-sys-log-level: WARNING
>
> diagnostics.brick-log-level: WARNING
>
> performance.cache-max-file-size: 2MB
>
> performance.cache-size: 256MB
>
> cluster.min-free-disk: 1%
>
> nfs.disable: on
>
> transport.address-family: inet
>
> server.event-threads: 8
>
> [root@lonbaknode3 ~]#
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in

2019-03-27 Thread brandon
Hello Amar and list,

 

I wanted to follow up to confirm that upgrading to 5.5 seems to have fixed the
"Transport endpoint is not connected" failures for us.  

 

We did not have any of these failures in this past weekend backups cycle.

 

Thank you very much for fixing whatever was the problem.

 

I also removed some volume config options.  One or more of the settings was
contributing to the slow directory listing.

 

Here is our current volume info.

 

[root@lonbaknode3 ~]# gluster volume info

 

Volume Name: volbackups

Type: Distribute

Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa

Status: Started

Snapshot Count: 0

Number of Bricks: 8

Transport-type: tcp

Bricks:

Brick1: lonbaknode3.domain.net:/lvbackups/brick

Brick2: lonbaknode4.domain.net:/lvbackups/brick

Brick3: lonbaknode5.domain.net:/lvbackups/brick

Brick4: lonbaknode6.domain.net:/lvbackups/brick

Brick5: lonbaknode7.domain.net:/lvbackups/brick

Brick6: lonbaknode8.domain.net:/lvbackups/brick

Brick7: lonbaknode9.domain.net:/lvbackups/brick

Brick8: lonbaknode10.domain.net:/lvbackups/brick

Options Reconfigured:

performance.io-thread-count: 32

performance.client-io-threads: on

client.event-threads: 8

diagnostics.brick-sys-log-level: WARNING

diagnostics.brick-log-level: WARNING

performance.cache-max-file-size: 2MB

performance.cache-size: 256MB

cluster.min-free-disk: 1%

nfs.disable: on

transport.address-family: inet

server.event-threads: 8

[root@lonbaknode3 ~]#

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in 5.3 under high I/O load

2019-03-19 Thread Amar Tumballi Suryanarayan
Hi Brandon,

There were a few concerns raised about 5.3 issues recently, and we fixed some
of them in 5.5 (with 5.4 we faced an upgrade issue, so 5.5 is the
recommended upgrade version).

Can you please upgrade to version 5.5?
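
A hedged sketch of the package upgrade on CentOS 7 (this assumes the
centos-release-gluster5 repository already shown in this thread and omits the
per-node stop/start choreography described in the official upgrade guide):

# on each server, one node at a time
systemctl stop glusterd
yum update 'glusterfs*'
systemctl start glusterd
gluster --version    # should now report 5.5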

-Amar


On Mon, Mar 18, 2019 at 10:16 PM  wrote:

> Hello list,
>
>
>
> We are having critical failures under load of CentOS7 glusterfs 5.3 with
> our servers losing their local mount point with the issue - "Transport
> endpoint is not connected"
>
>
>
> Not sure if it is related but the logs are full of the following message.
>
>
>
> [2019-03-18 14:00:02.656876] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
>
>
>
> We operate multiple separate glusterfs distributed clusters of about 6-8
> nodes.  Our 2 biggest, separate, and most I/O active glusterfs clusters are
> both having the issues.
>
>
>
> We are trying to use glusterfs as a unified file system for pureftpd
> backup services for a VPS service.  We have a relatively small backup
> window of the weekend where all our servers backup at the same time.  When
> backups start early on Saturday it causes a sustained massive amount of FTP
> file upload I/O for around 48 hours with all the compressed backup files
> being uploaded.   For our london 8 node cluster for example there is about
> 45 TB of uploads in ~48 hours currently.
>
>
>
> We do have some other smaller issues with directory listing under this
> load too but, it has been working for a couple years since 3.x until we've
> updated recently and randomly now all servers are losing their glusterfs
> mount with the "Transport endpoint is not connected" issue.
>
>
>
> Our glusterfs servers are all mostly the same with small variations.
> Mostly they are supermicro E3 cpu, 16 gb ram, LSI raid10 hdd (with and
> without bbu).  Drive arrays vary between 4-16 sata3 hdd drives each node
> depending on if the servers are older or newer. Firmware is kept up-to-date
> as well as running the latest LSI compiled driver.  the newer 16 drive
> backup servers have 2 x 1Gbit LACP teamed interfaces also.
>
>
>
> [root@lonbaknode3 ~]# uname -r
>
> 3.10.0-957.5.1.el7.x86_64
>
>
>
> [root@lonbaknode3 ~]# rpm -qa |grep gluster
>
> centos-release-gluster5-1.0-1.el7.centos.noarch
>
> glusterfs-libs-5.3-2.el7.x86_64
>
> glusterfs-api-5.3-2.el7.x86_64
>
> glusterfs-5.3-2.el7.x86_64
>
> glusterfs-cli-5.3-2.el7.x86_64
>
> glusterfs-client-xlators-5.3-2.el7.x86_64
>
> glusterfs-server-5.3-2.el7.x86_64
>
> glusterfs-fuse-5.3-2.el7.x86_64
>
> [root@lonbaknode3 ~]#
>
>
>
> [root@lonbaknode3 ~]# gluster volume info all
>
>
>
> Volume Name: volbackups
>
> Type: Distribute
>
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 8
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
>
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
>
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
>
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
>
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
>
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
>
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
>
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
>
> Options Reconfigured:
>
> transport.address-family: inet
>
> nfs.disable: on
>
> cluster.min-free-disk: 1%
>
> performance.cache-size: 8GB
>
> performance.cache-max-file-size: 128MB
>
> diagnostics.brick-log-level: WARNING
>
> diagnostics.brick-sys-log-level: WARNING
>
> client.event-threads: 3
>
> performance.client-io-threads: on
>
> performance.io-thread-count: 24
>
> network.inode-lru-limit: 1048576
>
> performance.parallel-readdir: on
>
> performance.cache-invalidation: on
>
> performance.md-cache-timeout: 600
>
> features.cache-invalidation: on
>
> features.cache-invalidation-timeout: 600
>
> [root@lonbaknode3 ~]#
>
>
>
> Mount output shows the following:
>
>
>
> lonbaknode3.domain.net:/volbackups on /home/volbackups type
> fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
>
> If you notice anything in our volume or mount settings above missing or
> otherwise bad feel free to let us know.  Still learning this glusterfs.  I
> tried searching for any recommended performance settings but, it's not
> always clear which setting is most applicable or beneficial to our workload.
>
>
>
> I have just found this post that looks like it is the same issues.
>
>
>
> https://lists.gluster.org/pipermail/gluster-users/2019-March/035958.html
>
>
>
> We have not yet tried the suggestion of "performance.write-behind: off"
> but, we will do so if that is recommended.
>
>
>
> Could someone knowledgeable advise anything for these issues?
>
>
>
> If any more information is needed do let us know.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (ama

[Gluster-users] Transport endpoint is not connected failures in 5.3 under high I/O load

2019-03-18 Thread brandon
Hello list,

 

We are having critical failures under load with CentOS 7 glusterfs 5.3: our
servers lose their local mount point with the error "Transport endpoint
is not connected".

 

Not sure if it is related but the logs are full of the following message.

 

[2019-03-18 14:00:02.656876] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler

 

We operate multiple separate glusterfs distributed clusters of about 6-8
nodes.  Our 2 biggest, separate, and most I/O active glusterfs clusters are
both having the issues. 

 

We are trying to use glusterfs as a unified file system for pureftpd backup
services for a VPS service.  We have a relatively small backup window over the
weekend where all our servers back up at the same time.  When backups start
early on Saturday they cause a sustained, massive amount of FTP file upload
I/O for around 48 hours while all the compressed backup files are uploaded.
For our London 8-node cluster, for example, there is currently about 45 TB of
uploads in ~48 hours.

 

We do have some other, smaller issues with directory listing under this load
too, but it had been working for a couple of years since 3.x. Since we
updated recently, all servers now randomly lose their glusterfs
mount with the "Transport endpoint is not connected" issue.

 

Our glusterfs servers are all mostly the same, with small variations.  Mostly
they are Supermicro E3 CPUs, 16 GB RAM, and LSI RAID10 HDD arrays (with and
without BBU).  Drive arrays vary between 4 and 16 SATA3 HDDs per node depending
on whether the servers are older or newer. Firmware is kept up to date, and we
run the latest LSI compiled driver.  The newer 16-drive backup servers also
have 2 x 1 Gbit LACP-teamed interfaces.

 

[root@lonbaknode3 ~]# uname -r

3.10.0-957.5.1.el7.x86_64

 

[root@lonbaknode3 ~]# rpm -qa |grep gluster

centos-release-gluster5-1.0-1.el7.centos.noarch

glusterfs-libs-5.3-2.el7.x86_64

glusterfs-api-5.3-2.el7.x86_64

glusterfs-5.3-2.el7.x86_64

glusterfs-cli-5.3-2.el7.x86_64

glusterfs-client-xlators-5.3-2.el7.x86_64

glusterfs-server-5.3-2.el7.x86_64

glusterfs-fuse-5.3-2.el7.x86_64

[root@lonbaknode3 ~]#

 

[root@lonbaknode3 ~]# gluster volume info all

 

Volume Name: volbackups

Type: Distribute

Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa

Status: Started

Snapshot Count: 0

Number of Bricks: 8

Transport-type: tcp

Bricks:

Brick1: lonbaknode3.domain.net:/lvbackups/brick

Brick2: lonbaknode4.domain.net:/lvbackups/brick

Brick3: lonbaknode5.domain.net:/lvbackups/brick

Brick4: lonbaknode6.domain.net:/lvbackups/brick

Brick5: lonbaknode7.domain.net:/lvbackups/brick

Brick6: lonbaknode8.domain.net:/lvbackups/brick

Brick7: lonbaknode9.domain.net:/lvbackups/brick

Brick8: lonbaknode10.domain.net:/lvbackups/brick

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

cluster.min-free-disk: 1%

performance.cache-size: 8GB

performance.cache-max-file-size: 128MB

diagnostics.brick-log-level: WARNING

diagnostics.brick-sys-log-level: WARNING

client.event-threads: 3

performance.client-io-threads: on

performance.io-thread-count: 24

network.inode-lru-limit: 1048576

performance.parallel-readdir: on

performance.cache-invalidation: on

performance.md-cache-timeout: 600

features.cache-invalidation: on

features.cache-invalidation-timeout: 600

[root@lonbaknode3 ~]#

 

Mount output shows the following:

 

lonbaknode3.domain.net:/volbackups on /home/volbackups type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

 

If you notice anything missing or otherwise bad in our volume or mount settings
above, feel free to let us know.  We are still learning glusterfs.  I
tried searching for recommended performance settings, but it's not
always clear which setting is most applicable or beneficial to our workload.

 

I have just found this post that looks like it is the same issues.

 

https://lists.gluster.org/pipermail/gluster-users/2019-March/035958.html

 

We have not yet tried the suggestion of "performance.write-behind: off", but
we will do so if that is recommended.
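
For reference, that suggestion corresponds to a single volume option; a hedged
sketch (volume name volbackups from this post):

# disable the write-behind performance translator
gluster volume set volbackups performance.write-behind off

# and to return to the default later
gluster volume reset volbackups performance.write-behind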

 

Could someone knowledgeable advise anything for these issues?   

 

If any more information is needed do let us know.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected : issue

2018-09-04 Thread Johnson, Tim
Not sure if this is a good depiction of the issue, as after shutting down 
all the hosts (all three: 2 data, 1 arbiter) we were able to get the double 
processes per volume to stop.
But anyway, here is the output of ps. Thanks again:






ps aux |grep gluster
root   3412  0.0  0.3 3870120 205064 ?  Ssl  Aug30   5:47 
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root   5521  1.9  0.0 3169256 63580 ?   Ssl  Aug30 136:01 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ovirt_engine.fs1-tier3.rrc.local.bricks-brick0-ovirt_engine -p 
/var/run/gluster/vols/ovirt_engine/fs1-tier3.rrc.local-bricks-brick0-ovirt_engine.pid
 -S /var/run/gluster/51a5a80d87661c2c4f9479e59a19b7cc.socket --brick-name 
/bricks/brick0/ovirt_engine -l 
/var/log/glusterfs/bricks/bricks-brick0-ovirt_engine.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49152 
--xlator-option ovirt_engine-server.listen-port=49152
root   5528  0.1  0.0 2182576 46092 ?   Ssl  Aug30   8:02 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ovirt_export.fs1-tier3.rrc.local.bricks-brick1-ovirt_export -p 
/var/run/gluster/vols/ovirt_export/fs1-tier3.rrc.local-bricks-brick1-ovirt_export.pid
 -S /var/run/gluster/ea5558bf22be5fae3d6168a3d07415ba.socket --brick-name 
/bricks/brick1/ovirt_export -l 
/var/log/glusterfs/bricks/bricks-brick1-ovirt_export.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49153 
--xlator-option ovirt_export-server.listen-port=49153
root   5538  0.1  0.0 2314168 50512 ?   Ssl  Aug30   8:04 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ovirt_isos.fs1-tier3.rrc.local.bricks-brick1-ovirt_isos -p 
/var/run/gluster/vols/ovirt_isos/fs1-tier3.rrc.local-bricks-brick1-ovirt_isos.pid
 -S /var/run/gluster/25acf05d530c8e041298c362b1589a51.socket --brick-name 
/bricks/brick1/ovirt_isos -l 
/var/log/glusterfs/bricks/bricks-brick1-ovirt_isos.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49154 
--xlator-option ovirt_isos-server.listen-port=49154
root   5549  0.0  0.0 1895584 47136 ?   Ssl  Aug30   1:20 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ovirt_mmpf_samba.fs1-tier3.rrc.local.bricks-brick2-ovirt_mmpf_samba -p 
/var/run/gluster/vols/ovirt_mmpf_samba/fs1-tier3.rrc.local-bricks-brick2-ovirt_mmpf_samba.pid
 -S /var/run/gluster/a65c4e775a4fb7bbaccb8807de3e1413.socket --brick-name 
/bricks/brick2/ovirt_mmpf_samba -l 
/var/log/glusterfs/bricks/bricks-brick2-ovirt_mmpf_samba.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49155 
--xlator-option ovirt_mmpf_samba-server.listen-port=49155
root   5559 19.9  0.0 3169256 63020 ?   Ssl  Aug30 1375:48 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ovirt_vms.fs1-tier3.rrc.local.bricks-brick1-ovirt_vms -p 
/var/run/gluster/vols/ovirt_vms/fs1-tier3.rrc.local-bricks-brick1-ovirt_vms.pid 
-S /var/run/gluster/8bd29ece67b8bb364fa9038d630c5a26.socket --brick-name 
/bricks/brick1/ovirt_vms -l 
/var/log/glusterfs/bricks/bricks-brick1-ovirt_vms.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49156 
--xlator-option ovirt_vms-server.listen-port=49156
root 190876 28.4  0.1 1454264 83836 ?   Ssl  07:59  17:44 
/usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id 
ccts_oracle.fs1-tier3.rrc.local.bricks-brick3-ccts_oracle -p 
/var/run/gluster/vols/ccts_oracle/fs1-tier3.rrc.local-bricks-brick3-ccts_oracle.pid
 -S /var/run/gluster/db718796520ecaf218d3889c9af2d3a5.socket --brick-name 
/bricks/brick3/ccts_oracle -l 
/var/log/glusterfs/bricks/bricks-brick3-ccts_oracle.log --xlator-option 
*-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49157 
--xlator-option ccts_oracle-server.listen-port=49157
root 190901  0.0  0.0 2318040 21636 ?   Ssl  07:59   0:01 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log 
-S /var/run/gluster/1e9d2979118671294386bfac399847c5.socket --xlator-option 
*replicate*.node-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f
root 195210  0.0  0.0 112708   964 pts/0S+   09:01   0:00 grep 
--color=auto gluster


From: Karthik Subrahmanya 
Date: Monday, September 3, 2018 at 6:36 AM
To: "Johnson, Tim" 
Cc: Atin Mukherjee , Ravishankar N 
, gluster-users , "Chlipala, 
George Edward" 
Subject: Re: [Gluster-users] Transport endpoint is not connected : issue


On Mon, Sep 3, 2018 at 11:17 AM Karthik Subrahmanya <ksubr...@redhat.com> wrote:
Hey,

We need some more information to debug this.
I think you missed to send the output of 'gluster volume info '.
Can you also provide the bricks, shd and glfsheal logs as well?
In the setup how many peers are present? You also mentioned that "one of

Re: [Gluster-users] Transport endpoint is not connected : issue

2018-09-03 Thread Karthik Subrahmanya
On Mon, Sep 3, 2018 at 11:17 AM Karthik Subrahmanya 
wrote:

> Hey,
>
> We need some more information to debug this.
> I think you missed to send the output of 'gluster volume info '.
> Can you also provide the bricks, shd and glfsheal logs as well?
> In the setup how many peers are present? You also mentioned that "one of
> the file servers have two processes for each of the volumes instead of one
> per volume", which process are you talking about here?
>
Also provide the "ps aux | grep gluster" output.
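
(On a default install those logs live under /var/log/glusterfs/; something
like the following, run on each server, should collect them. This is only a
sketch, and the volume name is just one of the volumes mentioned in this
thread:)

tar czf /tmp/gluster-logs-$(hostname -s).tar.gz \
    /var/log/glusterfs/bricks/*.log \
    /var/log/glusterfs/glustershd.log \
    /var/log/glusterfs/glfsheal-ovirt_engine.log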

>
> Regards,
> Karthik
>
> On Sat, Sep 1, 2018 at 12:10 AM Johnson, Tim  wrote:
>
>> Thanks for the reply.
>>
>>
>>
>>I have attached the gluster.log file from the host that it is
>> happening to at this time.
>>
>> It does change which host it does this on.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> *From: *Atin Mukherjee 
>> *Date: *Friday, August 31, 2018 at 1:03 PM
>> *To: *"Johnson, Tim" 
>> *Cc: *Karthik Subrahmanya , Ravishankar N <
>> ravishan...@redhat.com>, "gluster-users@gluster.org" <
>> gluster-users@gluster.org>
>> *Subject: *Re: [Gluster-users] Transport endpoint is not connected :
>> issue
>>
>>
>>
>> Can you please pass all the gluster log files from the server where the
>> transport end point not connected error is reported? As restarting glusterd
>> didn’t solve this issue, I believe this isn’t a stale port problem but
>> something else. Also please provide the output of ‘gluster v info ’
>>
>>
>>
>> (@cc Ravi, Karthik)
>>
>>
>>
>> On Fri, 31 Aug 2018 at 23:24, Johnson, Tim  wrote:
>>
>> Hello all,
>>
>>
>>
>>   We have a gluster replicate (with arbiter)  volumes that we are
>> getting “Transport endpoint is not connected” with on a rotating basis
>>  from each of the two file servers, and a third host that has the arbiter
>> bricks on.
>>
>> This is happening when trying to run a heal on all the volumes on the
>> gluster hosts   When I get the status of all the volumes all looks good.
>>
>>This behavior seems to be a forshadowing of the gluster volumes
>> becoming unresponsive to our vm cluster.  As well as one of the file
>> servers have two processes for each of the volumes instead of one per
>> volume. Eventually the affected file server
>>
>> will drop off the listed peers. Restarting glusterd/glusterfsd on the
>> affected file server does not take care of the issue, we have to bring down
>> both file
>>
>> Servers due to the volumes not being seen by the vm cluster after the
>> errors start occurring. I had seen that there were bug reports about the
>> “Transport endpoint is not connected” on earlier versions of Gluster
>> however had thought that
>>
>> It had been addressed.
>>
>>  Dmesg did have some entries for “a possible syn flood on port *”
>> which we changed the  sysctl to “net.ipv4.tcp_max_syn_backlog = 2048” which
>> seemed to help the syn flood messages but not the underlying volume issues.
>>
>> I have put the versions of all the Gluster packages installed below
>> as well as the   “Heal” and “Status” commands showing the volumes are
>>
>>
>>
>>This has just started happening but cannot definitively say if
>> this started occurring after an update or not.
>>
>>
>>
>>
>>
>> Thanks for any assistance.
>>
>>
>>
>>
>>
>> Running Heal  :
>>
>>
>>
>> gluster volume heal ovirt_engine info
>>
>> Brick 1.rrc.local:/bricks/brick0/ovirt_engine
>>
>> Status: Connected
>>
>> Number of entries: 0
>>
>>
>>
>> Brick 3.rrc.local:/bricks/brick0/ovirt_engine
>>
>> Status: Transport endpoint is not connected
>>
>> Number of entries: -
>>
>>
>>
>> Brick *3.rrc.local:/bricks/arb-brick/ovirt_engine
>>
>> Status: Transport endpoint is not connected
>>
>> Number of entries: -
>>
>>
>>
>>
>>
>> Running status :
>>
>>
>>
>> gluster volume status ovirt_engine
>>
>> Status of volume: ovirt_engine
>>
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>>
>>
>> --
>>
>> Brick*.rrc.local:/bricks/brick0/ov
>>
>> irt_engine  

Re: [Gluster-users] Transport endpoint is not connected : issue

2018-09-02 Thread Karthik Subrahmanya
Hey,

We need some more information to debug this.
I think you missed sending the output of 'gluster volume info '.
Can you also provide the bricks, shd and glfsheal logs as well?
In the setup how many peers are present? You also mentioned that "one of
the file servers have two processes for each of the volumes instead of one
per volume", which process are you talking about here?

Regards,
Karthik

On Sat, Sep 1, 2018 at 12:10 AM Johnson, Tim  wrote:

> Thanks for the reply.
>
>
>
>I have attached the gluster.log file from the host that it is happening
> to at this time.
>
> It does change which host it does this on.
>
>
>
> Thanks.
>
>
>
> *From: *Atin Mukherjee 
> *Date: *Friday, August 31, 2018 at 1:03 PM
> *To: *"Johnson, Tim" 
> *Cc: *Karthik Subrahmanya , Ravishankar N <
> ravishan...@redhat.com>, "gluster-users@gluster.org" <
> gluster-users@gluster.org>
> *Subject: *Re: [Gluster-users] Transport endpoint is not connected : issue
>
>
>
> Can you please pass all the gluster log files from the server where the
> transport end point not connected error is reported? As restarting glusterd
> didn’t solve this issue, I believe this isn’t a stale port problem but
> something else. Also please provide the output of ‘gluster v info ’
>
>
>
> (@cc Ravi, Karthik)
>
>
>
> On Fri, 31 Aug 2018 at 23:24, Johnson, Tim  wrote:
>
> Hello all,
>
>
>
>   We have a gluster replicate (with arbiter)  volumes that we are
> getting “Transport endpoint is not connected” with on a rotating basis
>  from each of the two file servers, and a third host that has the arbiter
> bricks on.
>
> This is happening when trying to run a heal on all the volumes on the
> gluster hosts   When I get the status of all the volumes all looks good.
>
>This behavior seems to be a forshadowing of the gluster volumes
> becoming unresponsive to our vm cluster.  As well as one of the file
> servers have two processes for each of the volumes instead of one per
> volume. Eventually the affected file server
>
> will drop off the listed peers. Restarting glusterd/glusterfsd on the
> affected file server does not take care of the issue, we have to bring down
> both file
>
> Servers due to the volumes not being seen by the vm cluster after the
> errors start occurring. I had seen that there were bug reports about the
> “Transport endpoint is not connected” on earlier versions of Gluster
> however had thought that
>
> It had been addressed.
>
>  Dmesg did have some entries for “a possible syn flood on port *”
> which we changed the  sysctl to “net.ipv4.tcp_max_syn_backlog = 2048” which
> seemed to help the syn flood messages but not the underlying volume issues.
>
> I have put the versions of all the Gluster packages installed below as
> well as the   “Heal” and “Status” commands showing the volumes are
>
>
>
>This has just started happening but cannot definitively say if this
> started occurring after an update or not.
>
>
>
>
>
> Thanks for any assistance.
>
>
>
>
>
> Running Heal  :
>
>
>
> gluster volume heal ovirt_engine info
>
> Brick 1.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick 3.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
> Brick *3.rrc.local:/bricks/arb-brick/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
>
>
> Running status :
>
>
>
> gluster volume status ovirt_engine
>
> Status of volume: ovirt_engine
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
>
> --
>
> Brick*.rrc.local:/bricks/brick0/ov
>
> irt_engine  49152 0  Y
> 5521
>
> Brick fs2-tier3.rrc.local:/bricks/brick0/ov
>
> irt_engine  49152 0  Y
> 6245
>
> Brick .rrc.local:/bricks/arb-b
>
> rick/ovirt_engine   49152 0  Y
> 3526
>
> Self-heal Daemon on localhost   N/A   N/AY
> 5509
>
> Self-heal Daemon on ***.rrc.local N/A   N/AY   6218
>
> Self-heal Daemon on ***.rrc.local   N/A   N/AY   3501
>
> Self-heal Daemon on .rrc.local N/A   N/AY   3657
>
> Self-heal Daemon on *.rrc.local   N/A   N/AY   3753
>

Re: [Gluster-users] Transport endpoint is not connected : issue

2018-08-31 Thread Atin Mukherjee
Can you please pass all the gluster log files from the server where the
transport end point not connected error is reported? As restarting glusterd
didn’t solve this issue, I believe this isn’t a stale port problem but
something else. Also please provide the output of ‘gluster v info ’

(@cc Ravi, Karthik)

On Fri, 31 Aug 2018 at 23:24, Johnson, Tim  wrote:

> Hello all,
>
>
>
>   We have a gluster replicate (with arbiter)  volumes that we are
> getting “Transport endpoint is not connected” with on a rotating basis
>  from each of the two file servers, and a third host that has the arbiter
> bricks on.
>
> This is happening when trying to run a heal on all the volumes on the
> gluster hosts   When I get the status of all the volumes all looks good.
>
>This behavior seems to be a forshadowing of the gluster volumes
> becoming unresponsive to our vm cluster.  As well as one of the file
> servers have two processes for each of the volumes instead of one per
> volume. Eventually the affected file server
>
> will drop off the listed peers. Restarting glusterd/glusterfsd on the
> affected file server does not take care of the issue, we have to bring down
> both file
>
> Servers due to the volumes not being seen by the vm cluster after the
> errors start occurring. I had seen that there were bug reports about the
> “Transport endpoint is not connected” on earlier versions of Gluster
> however had thought that
>
> It had been addressed.
>
>  Dmesg did have some entries for “a possible syn flood on port *”
> which we changed the  sysctl to “net.ipv4.tcp_max_syn_backlog = 2048” which
> seemed to help the syn flood messages but not the underlying volume issues.
>
> I have put the versions of all the Gluster packages installed below as
> well as the   “Heal” and “Status” commands showing the volumes are
>
>
>
>This has just started happening but cannot definitively say if this
> started occurring after an update or not.
>
>
>
>
>
> Thanks for any assistance.
>
>
>
>
>
> Running Heal  :
>
>
>
> gluster volume heal ovirt_engine info
>
> Brick 1.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick 3.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
> Brick *3.rrc.local:/bricks/arb-brick/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
>
>
> Running status :
>
>
>
> gluster volume status ovirt_engine
>
> Status of volume: ovirt_engine
>
> Gluster process TCP Port  RDMA Port  Online
> Pid
>
>
> --
>
> Brick*.rrc.local:/bricks/brick0/ov
>
> irt_engine  49152 0  Y
> 5521
>
> Brick fs2-tier3.rrc.local:/bricks/brick0/ov
>
> irt_engine  49152 0  Y
> 6245
>
> Brick .rrc.local:/bricks/arb-b
>
> rick/ovirt_engine   49152 0  Y
> 3526
>
> Self-heal Daemon on localhost   N/A   N/AY
> 5509
>
> Self-heal Daemon on ***.rrc.local N/A   N/AY   6218
>
> Self-heal Daemon on ***.rrc.local   N/A   N/AY   3501
>
> Self-heal Daemon on .rrc.local N/A   N/AY   3657
>
> Self-heal Daemon on *.rrc.local   N/A   N/AY   3753
>
> Self-heal Daemon on .rrc.local N/A   N/AY   17284
>
>
>
> Task Status of Volume ovirt_engine
>
>
> --
>
> There are no active volume tasks
>
>
>
>
>
>
>
>
>
> /etc/glusterd.vol.   :
>
>
>
>
>
> volume management
>
> type mgmt/glusterd
>
> option working-directory /var/lib/glusterd
>
> option transport-type socket,rdma
>
> option transport.socket.keepalive-time 10
>
> option transport.socket.keepalive-interval 2
>
> option transport.socket.read-fail-log off
>
> option ping-timeout 0
>
> option event-threads 1
>
> option rpc-auth-allow-insecure on
>
> #   option transport.address-family inet6
>
> #   option base-port 49152
>
> end-volume
>
>
>
>
>
>
>
>
>
>
>
> rpm -qa |grep gluster
>
> glusterfs-3.12.13-1.el7.x86_64
>
> glusterfs-gnfs-3.12.13-1.el7.x86_64
>
> glusterfs-api-3.12.13-1.el7.x86_64
>
> glusterfs-cli-3.12.13-1.el7.x86_64
>
> glusterfs-client-xlators-3.12.13-1.el7.x86_64
>
> glusterfs-fuse-3.12.13-1.el7.x86_64
>
> centos-release-gluster312-1.0-2.el7.centos.noarch
>
> glusterfs-rdma-3.12.13-1.el7.x86_64
>
> glusterfs-libs-3.12.13-1.el7.x86_64
>
> glusterfs-server-3.12.13-1.el7.x86_64
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
- Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.or

[Gluster-users] Transport endpoint is not connected : issue

2018-08-31 Thread Johnson, Tim
Hello all,

  We have gluster replicate (with arbiter) volumes on which we are getting 
“Transport endpoint is not connected” on a rotating basis from each of 
the two file servers, and from a third host that has the arbiter bricks on it.
This happens when trying to run a heal on all the volumes on the gluster 
hosts. When I get the status of all the volumes, all looks good.
   This behavior seems to be a foreshadowing of the gluster volumes becoming 
unresponsive to our vm cluster. In addition, one of the file servers has two 
processes for each of the volumes instead of one per volume. Eventually the 
affected file server will drop off the listed peers. Restarting 
glusterd/glusterfsd on the affected file server does not take care of the 
issue; we have to bring down both file servers because the volumes are no 
longer seen by the vm cluster after the errors start occurring. I had seen 
bug reports about “Transport endpoint is not connected” on earlier versions 
of Gluster, but had thought that it had been addressed.
 Dmesg did have some entries for “a possible syn flood on port *”, so we 
changed the sysctl “net.ipv4.tcp_max_syn_backlog” to 2048, which seemed to 
help the syn flood messages but not the underlying volume issues.
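(For anyone following along, that sysctl change can be applied and persisted
like this; a sketch, using the value mentioned above:)

sysctl -w net.ipv4.tcp_max_syn_backlog=2048
echo 'net.ipv4.tcp_max_syn_backlog = 2048' >> /etc/sysctl.conf   # persist across reboots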
I have put the versions of all the Gluster packages installed below, as well 
as the “Heal” and “Status” commands showing the volumes are

   This has just started happening, but I cannot definitively say whether it 
started occurring after an update or not.


Thanks for any assistance.


Running Heal  :

gluster volume heal ovirt_engine info
Brick 1.rrc.local:/bricks/brick0/ovirt_engine
Status: Connected
Number of entries: 0

Brick 3.rrc.local:/bricks/brick0/ovirt_engine
Status: Transport endpoint is not connected
Number of entries: -

Brick *3.rrc.local:/bricks/arb-brick/ovirt_engine
Status: Transport endpoint is not connected
Number of entries: -


Running status :

gluster volume status ovirt_engine
Status of volume: ovirt_engine
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick*.rrc.local:/bricks/brick0/ov
irt_engine  49152 0  Y   5521
Brick fs2-tier3.rrc.local:/bricks/brick0/ov
irt_engine  49152 0  Y   6245
Brick .rrc.local:/bricks/arb-b
rick/ovirt_engine   49152 0  Y   3526
Self-heal Daemon on localhost   N/A   N/AY   5509
Self-heal Daemon on ***.rrc.local N/A   N/AY   6218
Self-heal Daemon on ***.rrc.local   N/A   N/AY   3501
Self-heal Daemon on .rrc.local N/A   N/AY   3657
Self-heal Daemon on *.rrc.local   N/A   N/AY   3753
Self-heal Daemon on .rrc.local N/A   N/AY   17284

Task Status of Volume ovirt_engine
--
There are no active volume tasks




/etc/glusterd.vol.   :


volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,rdma
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option ping-timeout 0
option event-threads 1
option rpc-auth-allow-insecure on
#   option transport.address-family inet6
#   option base-port 49152
end-volume





rpm -qa |grep gluster
glusterfs-3.12.13-1.el7.x86_64
glusterfs-gnfs-3.12.13-1.el7.x86_64
glusterfs-api-3.12.13-1.el7.x86_64
glusterfs-cli-3.12.13-1.el7.x86_64
glusterfs-client-xlators-3.12.13-1.el7.x86_64
glusterfs-fuse-3.12.13-1.el7.x86_64
centos-release-gluster312-1.0-2.el7.centos.noarch
glusterfs-rdma-3.12.13-1.el7.x86_64
glusterfs-libs-3.12.13-1.el7.x86_64
glusterfs-server-3.12.13-1.el7.x86_64
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Transport endpoint is not connected with fuse

2017-05-23 Thread Flemming Frandsen
I'm getting Transport endpoint is not connected on two client nodes, but I
don't see any errors on the server, any idea where to look next?

I'm using gluster 3.10 from the ppa for ubuntu-server, any idea if this
version is known to be broken?

I'm mounting the file systems using fuse, is that supposed to work?

I'm using tcp on a 40 Gb/s Infiniband with ipoib and so far it has seemed
to be reliable enough when I tested it, does anybody use IB?


+--+
[2017-05-23 12:54:47.716519] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
0-prism-client-0: changing port to 49158 (from 0)
[2017-05-23 12:54:47.717202] I [MSGID: 114057]
[client-handshake.c:1451:select_server_supported_programs]
0-prism-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-05-23 12:54:47.717703] I [MSGID: 114046]
[client-handshake.c:1216:client_setvolume_cbk] 0-prism-client-0: Connected
to prism-client-0, attached to remote volume '/bricks/prism/glenlivet'.
[2017-05-23 12:54:47.717729] I [MSGID: 114047]
[client-handshake.c:1227:client_setvolume_cbk] 0-prism-client-0: Server and
Client lk-version numbers are not same, reopening the fds
[2017-05-23 12:54:47.718513] I [MSGID: 114035]
[client-handshake.c:202:client_set_lk_version_cbk] 0-prism-client-0: Server
lk version = 1
[2017-05-23 12:54:47.718623] I [fuse-bridge.c:4146:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
7.23
[2017-05-23 12:54:47.718643] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse:
switched to graph 0
[2017-05-23 13:04:26.767636] C
[rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-prism-client-0: server
192.168.42.118:49158 has not responded in the last 42 seconds,
disconnecting.
[2017-05-23 13:04:26.767821] I [MSGID: 114018]
[client.c:2276:client_rpc_notify] 0-prism-client-0: disconnected from
prism-client-0. Client process will keep trying to connect to glusterd
until brick's port is available
[2017-05-23 13:04:26.768308] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f950c64962b]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f950c416eb1]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f950c416fce]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x94)[0x7f950c418654]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7f950c419138]
) 0-prism-client-0: forced unwinding frame type(GlusterFS 3.3)
op(READDIRP(40)) called at 2017-05-23 13:03:43.07 (xid=0x63a7)
[2017-05-23 13:04:26.768337] W [MSGID: 114031]
[client-rpc-fops.c:2640:client3_3_readdirp_cbk] 0-prism-client-0: remote
operation failed [Transport endpoint is not connected]
[2017-05-23 13:04:26.768557] E [rpc-clnt.c:365:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f950c64962b]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f950c416eb1]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f950c416fce]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x94)[0x7f950c418654]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7f950c419138]
) 0-prism-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2))
called at 2017-05-23 13:03:44.763233 (xid=0x63a8)
[2017-05-23 13:04:26.768581] W [rpc-clnt-ping.c:202:rpc_clnt_ping_cbk]
0-prism-client-0: socket disconnected
[2017-05-23 13:04:26.768625] W [MSGID: 114031]
[client-rpc-fops.c:503:client3_3_stat_cbk] 0-prism-client-0: remote
operation failed [Transport endpoint is not connected]
[2017-05-23 13:04:26.768682] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30770: STAT() /gradle-metrics/hulo/36 => -1 (Transport
endpoint is not connected)
[2017-05-23 13:04:26.779347] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30813: STAT() /gradle-metrics/hulo => -1 (Transport
endpoint is not connected)
[2017-05-23 13:04:26.788802] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30855: STAT() /gradle-metrics/hulo => -1 (Transport
endpoint is not connected)
[2017-05-23 13:04:26.792743] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30897: STAT() /gradle-metrics/hulo => -1 (Transport
endpoint is not connected)
[2017-05-23 13:04:26.797661] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30940: STAT() /gradle-metrics => -1 (Transport endpoint
is not connected)
[2017-05-23 13:04:26.800631] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 30982: STAT() /gradle-metrics => -1 (Transport endpoint
is not connected)
[2017-05-23 13:04:26.802215] W [MSGID: 114031]
[client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-prism-client-0: remote
operation failed. Path: / (----0001) [Transport
endpoint is not connected]
[2017-05-23 13:04:26.804198] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 31025: L

Re: [Gluster-users] Transport endpoint is not connected

2016-11-10 Thread Joe Julian
Your first step is to look at your client logs.
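The FUSE client logs to a file named after the mount point, e.g.
/var/log/glusterfs/mnt-pve-machines0.log for the mount shown below. Once the
cause is clear, a stale mount like that can usually be cleared with a lazy
unmount and a remount from a surviving peer (a sketch; node1 below is only an
assumption for a remaining server):

umount -l /mnt/pve/machines0
mount -t glusterfs node1:machines0 /mnt/pve/machines0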

On November 10, 2016 2:31:02 AM PST, Cory Sanders 
 wrote:
>We removed a server from our cluster: node4
>
>
>Now, on node1,  when I type df -h
>I get this:
>
>root@node1:/mnt/pve/machines# df -h
>
>df: `/mnt/pve/machines0': Transport endpoint is not connected
>
>typing # mount
>
>Produced this:
>
>node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs
>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
>I did this: # umount /mnt/pve/machines0
>
>And now a df -h produces nothing.  The screen just hangs there with no
>information.
>
>The same on node0 and node3.  On node0 I did not unmount anything and I
>get this:
>
>root@node0:/mnt/pve/machines1# df -h
>df: `/mnt/pve/machines0': Transport endpoint is not connected
>
>node0 mount entry is this:
>
>
>node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs
>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>node3 mount entry is this:
>
>
>node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs
>(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
>
>My Load Averages are in the 8s and should be in the 1s
>
>Thanks.
>
>
>
>
>
>
>
>
>
>
>
>
>___
>Gluster-users mailing list
>Gluster-users@gluster.org
>http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Transport endpoint is not connected

2016-11-10 Thread Cory Sanders
We removed a server from our cluster: node4


Now, on node1,  when I type df -h
I get this:

root@node1:/mnt/pve/machines# df -h

df: `/mnt/pve/machines0': Transport endpoint is not connected

typing # mount

Produced this:

node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


I did this: # umount /mnt/pve/machines0

And now a df -h produces nothing.  The screen just hangs there with no 
information.

The same on node0 and node3.  On node0 I did not unmount anything and I get 
this:

root@node0:/mnt/pve/machines1# df -h
df: `/mnt/pve/machines0': Transport endpoint is not connected

node0 mount entry is this:


node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

node3 mount entry is this:


node4:machines0 on /mnt/pve/machines0 type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)



My Load Averages are in the 8s and should be in the 1s

Thanks.








___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Transport endpoint is not connected

2012-07-12 Thread Ivan Dimitrov

Hi group,
I've been in production with gluster for the last 2 weeks, with no problems 
until today.
As of today I'm getting the "Transport endpoint is not connected" problem 
on the client, maybe once every hour.

df: `/services/users/6': Transport endpoint is not connected

Here is my setup:
I have 1 Client and 2 Servers with 2 Disks each for bricks. Glusterfs 
3.3 compiled from source.


# gluster volume info

Volume Name: freecloud
Type: Distributed-Replicate
Volume ID: 1cf4804f-12aa-4cd1-a892-cec69fc2cf22
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15
Brick2: YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886
Brick3: XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72
Brick4: YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e
Options Reconfigured:
cluster.self-heal-daemon: on
performance.cache-size: 256MB
performance.io-thread-count: 32
features.quota: on

Quota is ON but not used
-

# gluster volume status all detail
Status of volume: freecloud
--
Brick: Brick 
XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15

Port : 24009
Online   : Y
Pid  : 29221
File System  : xfs
Device   : /dev/sdd1
Mount Options: rw
Inode Size   : 256
Disk Space Free  : 659.7GB
Total Disk Space : 698.3GB
Inode Count  : 732571968
Free Inodes  : 730418928
--
Brick: Brick 
YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886

Port : 24009
Online   : Y
Pid  : 15496
File System  : xfs
Device   : /dev/sdc1
Mount Options: rw
Inode Size   : 256
Disk Space Free  : 659.7GB
Total Disk Space : 698.3GB
Inode Count  : 732571968
Free Inodes  : 730410396
--
Brick: Brick 
XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72

Port : 24010
Online   : Y
Pid  : 29227
File System  : xfs
Device   : /dev/sdc1
Mount Options: rw
Inode Size   : 256
Disk Space Free  : 659.9GB
Total Disk Space : 698.3GB
Inode Count  : 732571968
Free Inodes  : 730417864
--
Brick: Brick 
YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e

Port : 24010
Online   : Y
Pid  : 15502
File System  : xfs
Device   : /dev/sdb1
Mount Options: rw
Inode Size   : 256
Disk Space Free  : 659.9GB
Total Disk Space : 698.3GB
Inode Count  : 732571968
Free Inodes  : 730409337


On server1 I mount the volume and start copying files to it. Server1 is 
used like storage.


209.25.137.252:freecloud  1.4T   78G  1.3T   6% 
/home/freecloud


One thing to mention is that I have a large list of subdirectories in 
the main directory and the list keeps getting bigger.

client1# ls | wc -l
42424

---
I have one client server that mounts glusterfs and uses the files 
directly as the files are for low traffic web sites. On the client, 
there is no gluster daemon, just the mount.


client1# mount -t glusterfs rscloud1.domain.net:/freecloud 
/services/users/6/


This all worked fine for the last 2-3 weeks. Here is a log from the 
crash client1:/var/log/glusterfs/services-users-6-.log


pending frames:
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-07-12 14:51:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0
/lib/x86_64-linux-gnu/libc.so.6(+0x32480)[0x7f1e0e9f0480]
/services/glusterfs//lib/libglusterfs.so.0(uuid_unpack+0x0)[0x7f1e0f79d760]
/services/glusterfs//lib/libglusterfs.so.0(+0x4c526)[0x7f1e0f79d526]
/services/glusterfs//lib/libglusterfs.so.0(uuid_utoa+0x26)[0x7f1e0f77ca66]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/features/quota.so(quota_rename_cbk+0x308)[0x7f1e09b940c8]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_rename_unlink_cbk+0x454)[0x7f1e09dad264]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_unwind+0xf7)[0x7f1e09ff23c7]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_wind_cbk+0xb6)[0x7f1e09ff43d6]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/protocol/cli

[Gluster-users] Transport endpoint is not connected

2012-03-20 Thread Holger Steinhaus
I am using the rpms from the epel repository (currently 3.2.5-8.el6) on
Scientific Linux 6.2. My test setup consists of two machines, running
several volumes with replica=2. I am experiencing a lot of trouble
currently, especially I/O errors on random files if using the fuse
client after one machine was temporarily down.
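(A note for context: on 3.2 there is no "gluster volume heal" command yet; the
usual way to re-sync replicas after one machine has been down is to walk the
FUSE mount so self-heal is triggered on each file. A sketch, with the mount
point as a placeholder:)

find /mnt/<volume-mount> -noleaf -print0 | xargs --null stat > /dev/null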

The first strange thing in the logfiles is a bunch of the following
messages (>500 MB/week) in etc-glusterfs-glusterd.vol.log on one node
(192.168.2.9). There are no messages on the other node.

Firewall and SELinux are disabled.

Any suggestions?

Regards,
  Holger

[2012-03-20 19:26:46.347926] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.347958] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.347987] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.348016] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.348045] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.348073] E [socket.c:2080:socket_connect]
0-management: connection attempt failed (Connection refused)
[2012-03-20 19:26:46.758118] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.4:988)
[2012-03-20 19:26:46.759507] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.4:987)
[2012-03-20 19:26:47.117369] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:780)
[2012-03-20 19:26:47.122469] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:761)
[2012-03-20 19:26:47.359062] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.8:997)
[2012-03-20 19:26:47.651679] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:984)
[2012-03-20 19:26:48.37491] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1017)
[2012-03-20 19:26:48.127588] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:977)
[2012-03-20 19:26:48.133402] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:976)
[2012-03-20 19:26:48.170200] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:975)
[2012-03-20 19:26:48.187941] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1016)
[2012-03-20 19:26:48.193975] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1015)
[2012-03-20 19:26:48.198601] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1013)
[2012-03-20 19:26:48.203602] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1012)
[2012-03-20 19:26:48.208457] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1006)
[2012-03-20 19:26:48.278537] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.7:1005)
[2012-03-20 19:26:48.794218] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.4:986)
[2012-03-20 19:26:48.861831] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket failed. Error (Transport endpoint is not connected),
peer (192.168.2.9:974)
[2012-03-20 19:26:48.947367] W
[socket.c:1494:__socket_proto_state_machine] 0-socket.management:
reading from socket 

Re: [Gluster-users] Transport endpoint is not connected

2011-12-07 Thread William L. Sebok
On Wed, Dec 07, 2011 at 11:16:54PM +0530, Pranith Kumar K wrote:
> William,
>In which log do you see the messages?.
> 
> Pranith

/var/log/etc-glusterfs-glusterd.vol.log on any of the servers.

I have now determined that I need to add nfsvers=3 to the mount options on the
client I was using as a test.  The client is running Scientific Linux 6.1 and
probably was defaulting to NFSv4.  That was the source of my NFS timeouts.
Sorry about that.
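
(i.e. something like the following; the server and volume names are
placeholders, and Gluster's built-in NFS server only speaks NFSv3 over TCP:)

mount -t nfs -o nfsvers=3,tcp server:/volname /mnt/volname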
-
Bill Sebok  Computer Software Manager, Univ. of Maryland, Astronomy
Internet: w...@astro.umd.eduURL: http://furo.astro.umd.edu/
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint is not connected

2011-12-07 Thread Pranith Kumar K

On 12/07/2011 11:07 PM, William L. Sebok wrote:

This looks like the messages that were worrying me.  I'm still getting timeouts
on nfs mounts.

Bill Sebok  Computer Software Manager, Univ. of Maryland, Astronomy
Internet: w...@astro.umd.eduURL: http://furo.astro.umd.edu/

On Wed, Dec 07, 2011 at 10:33:58AM -0600, Matt Weil wrote:

All,

Is this normal?  Can this be corrected?

Thanks in advance for your responses.


[2011-12-06 17:56:59.48153] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
[2011-12-06 17:56:59.48811] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
[2011-12-06 17:56:59.49073] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
[2011-12-06 17:56:59.49137] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
[2011-12-06 17:56:59.49567] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
[2011-12-06 17:56:59.49803] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d

--- etc.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

William,
   In which log do you see the messages?.

Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint is not connected

2011-12-07 Thread William L. Sebok
This looks like the messages that were worrying me.  I'm still getting timeouts
on nfs mounts.

Bill Sebok  Computer Software Manager, Univ. of Maryland, Astronomy
Internet: w...@astro.umd.eduURL: http://furo.astro.umd.edu/

On Wed, Dec 07, 2011 at 10:33:58AM -0600, Matt Weil wrote:
> 
> All,
> 
> Is this normal?  Can this be corrected?
> 
> Thanks in advance for your responses.
> 
> >[2011-12-06 17:56:59.48153] I 
> >[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
> >from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
> >[2011-12-06 17:56:59.48811] I 
> >[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
> >from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
> >[2011-12-06 17:56:59.49073] I 
> >[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
> >from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
> >[2011-12-06 17:56:59.49137] I 
> >[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
> >from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
> >[2011-12-06 17:56:59.49567] I 
> >[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
> >from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
> >[2011-12-06 17:56:59.49803] I 
> >[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
> >ACC from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d

--- etc.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint is not connected

2011-12-07 Thread Pranith Kumar K

On 12/07/2011 10:03 PM, Matt Weil wrote:


All,

Is this normal?  Can this be corrected?

Thanks in advance for your responses.

[2011-12-06 17:56:59.48153] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: 
Received ACC from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
[2011-12-06 17:56:59.48811] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: 
Received ACC from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
[2011-12-06 17:56:59.49073] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: 
Received ACC from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
[2011-12-06 17:56:59.49137] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: 
Received ACC from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
[2011-12-06 17:56:59.49567] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: 
Received ACC from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
[2011-12-06 17:56:59.49803] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: 
Received ACC from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
[2011-12-06 17:56:59.49850] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: 
Received ACC from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
[2011-12-06 17:56:59.50228] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: 
Received ACC from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
[2011-12-06 17:56:59.50285] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: 
Received ACC from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
[2011-12-06 17:56:59.50346] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: 
Received ACC from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
[2011-12-06 17:56:59.50375] I 
[glusterd-op-sm.c:7250:glusterd_op_txn_complete] 0-glusterd: Cleared 
local lock
[2011-12-06 17:56:59.52105] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (127.0.0.1:694)
[2011-12-06 17:56:59.168257] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:730)
[2011-12-06 17:56:59.168357] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:728)
[2011-12-06 17:56:59.168441] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:726)
[2011-12-06 17:56:59.168503] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:724)
[2011-12-06 17:56:59.168591] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:722)
[2011-12-06 17:56:59.168672] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.11:720)
[2011-12-06 17:56:59.169287] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.15:730)
[2011-12-06 17:56:59.169359] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.15:728)
[2011-12-06 17:56:59.169398] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.13:730)
[2011-12-06 17:56:59.169438] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.13:728)
[2011-12-06 17:56:59.169476] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.15:726)
[2011-12-06 17:56:59.169529] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.13:726)
[2011-12-06 17:56:59.169581] W 
[socket.c:1494:__socket_proto_state_machine] 0-socket.management: 
reading from socket failed. Error (Transport endpoint is not 
connected), peer (10.0.30.13:724)


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Don't worry about it. We fixed this message in glusterd for future releases.

Pranith.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Transport endpoint is not connected

2011-12-07 Thread Matt Weil


All,

Is this normal?  Can this be corrected?

Thanks in advance for your responses.


[2011-12-06 17:56:59.48153] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
[2011-12-06 17:56:59.48811] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
[2011-12-06 17:56:59.49073] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
[2011-12-06 17:56:59.49137] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
[2011-12-06 17:56:59.49567] I 
[glusterd-rpc-ops.c:1243:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC 
from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
[2011-12-06 17:56:59.49803] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: fc5e6659-a90a-4e25-a3a7-11de9a7de81d
[2011-12-06 17:56:59.49850] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: d1216f43-2ae6-42bd-a597-c0ab6a101d6b
[2011-12-06 17:56:59.50228] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: 4bf94e6e-69ca-4d51-9a85-c1d98a95325d
[2011-12-06 17:56:59.50285] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: 154cdbb2-6a53-449d-b6e3-bfd84091d90c
[2011-12-06 17:56:59.50346] I 
[glusterd-rpc-ops.c:818:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received 
ACC from uuid: 4c9d68d6-d573-43d0-aec5-07173c1699d0
[2011-12-06 17:56:59.50375] I [glusterd-op-sm.c:7250:glusterd_op_txn_complete] 
0-glusterd: Cleared local lock
[2011-12-06 17:56:59.52105] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (127.0.0.1:694)
[2011-12-06 17:56:59.168257] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:730)
[2011-12-06 17:56:59.168357] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:728)
[2011-12-06 17:56:59.168441] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:726)
[2011-12-06 17:56:59.168503] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:724)
[2011-12-06 17:56:59.168591] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:722)
[2011-12-06 17:56:59.168672] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.11:720)
[2011-12-06 17:56:59.169287] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.15:730)
[2011-12-06 17:56:59.169359] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.15:728)
[2011-12-06 17:56:59.169398] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.13:730)
[2011-12-06 17:56:59.169438] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.13:728)
[2011-12-06 17:56:59.169476] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.15:726)
[2011-12-06 17:56:59.169529] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.13:726)
[2011-12-06 17:56:59.169581] W [socket.c:1494:__socket_proto_state_machine] 
0-socket.management: reading from socket failed. Error (Transport endpoint is 
not connected), peer (10.0.30.13:724)


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] TRansport endpoint is not connected.

2011-07-06 Thread Dheeraj Kv

 
Hi
 
I am facing an issue while mounting GlusterFS 3.2 on a client.
It gives "Transport endpoint is not connected" while doing df.
From the logs I am getting the following error.


[2011-07-06 16:07:41.936275] E [rdma.c:3414:rdma_handle_failed_send_completion] 
0-rpc-transport/rdma: send work request on `mthca1' returned error wc.status = 
12, wc.vendor_err = 129, post->buf = 0x32bc000, wc.byte_len = 0, post->reused = 
328
[2011-07-06 16:07:41.936315] E [rdma.c:3422:rdma_handle_failed_send_completion] 
0-rdma: connection between client and server not working. check by running 
'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or 
check if rdma port is valid (or active) by running 'ibv_devinfo'. contact 
Gluster Support Team if the problem persists.
[2011-07-06 16:07:41.936731] E [rpc-clnt.c:338:saved_frames_unwind] 
(-->/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) 
[0x7fc383f9e5f9] 
(-->/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) 
[0x7fc383f9dd9e] 
(-->/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) 
[0x7fc383f9dd0e]))) 0-crlgfs1-client-1: forced unwinding frame type(GF-DUMP) 
op(DUMP(1)) called at 2011-07-06 16:07:40.883471
[2011-07-06 16:07:41.936760] W 
[client-handshake.c:1242:client_dump_version_cbk] 0-crlgfs1-client-1: received 
RPC status error
[2011-07-06 16:07:41.936779] I [client.c:1883:client_rpc_notify] 
0-crlgfs1-client-1: disconnected


OS: FC12: 
kernel version : 2.6.32.21-168.fc12.x86_64
ofed- 1.5.1


Thanks
Dheeraj K V
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] 'Transport endpoint is not connected' occurs while running long jobs

2011-04-08 Thread phil cryer
I'm having failures with long-running processes. I'm running glusterfs
3.1.2 (glusterfs 3.1.2 built on Jan 16 2011 18:14:56 - Repository
revision: v3.1.1-64-gf2a067c) on Debian 6 (squeeze) and it has been
stable for use, serving images via http - but when I issue a long-running
task (for example, last night I ran a google sitemap generator; other
times it was a chmod -R across a section of directories), it will
eventually crash with the errors below, "Transport endpoint is not
connected". Then I have to stop glusterfsd, killall the remaining
glusterfs/glusterfsd processes, unmount the gluster share, restart
glusterfsd and then remount the share. What can I do to fix this? I
wanted to have the sitemap run once a week, and since you can throttle
it I thought it wouldn't be as heavy-handed as chown or chmod would
be, but no, it crashes the same way.
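
(For reference, the recovery sequence described above looks roughly like this
on our setup; the mount point matches the log below, the mount source is a
placeholder, and restarting glusterfsd is left to your init scripts:)

killall glusterfs glusterfsd          # clear any hung gluster processes
umount -l /mnt/glusterfs              # lazy-unmount the stale FUSE mount
# restart glusterfsd via your init scripts, then remount:
mount -t glusterfs <server>:<volume> /mnt/glusterfs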

[...]
2011/04/08 09:14:15 [crit] 24137#0: *8533 open()
"/mnt/glusterfs/www/r/recordsofgeneral04lond/recordso
fgeneral04lond_bw.pdf" failed (107: Transport endpoint is not
connected), client: 128.128.164.174, ser
ver: cluster.biodiversitylibrary.org, request: "GET
/r/recordsofgeneral04lond/recordsofgeneral04lond_b
w.pdf HTTP/1.1", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:14:23 [crit] 24137#0: *8534 open()
"/mnt/glusterfs/www/d/dieumbelliferenu00liro/dieumbel
liferenu00liro_djvu.txt" failed (107: Transport endpoint is not
connected), client: 128.128.164.174, s
erver: cluster.biodiversitylibrary.org, request: "GET
/d/dieumbelliferenu00liro/dieumbelliferenu00liro
_djvu.txt HTTP/1.0", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:14:33 [crit] 24137#0: *8535 open()
"/mnt/glusterfs/www/j/justsbotanischer4601berl/justsb
otanischer4601berl_metasource.xml" failed (107: Transport endpoint is
not connected), client: 128.128.
164.174, server: cluster.biodiversitylibrary.org, request: "GET
/j/justsbotanischer4601berl/justsbotan
ischer4601berl_metasource.xml HTTP/1.1", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:14:39 [crit] 24137#0: *8537 open()
"/mnt/glusterfs/www/r/recordsofindianm21indi/recordso
findianm21indi.gif" failed (107: Transport endpoint is not connected),
client: 128.128.164.174, server
: cluster.biodiversitylibrary.org, request: "GET
/r/recordsofindianm21indi/recordsofindianm21indi.gif
HTTP/1.1", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:14:51 [crit] 24137#0: *8538 open()
"/mnt/glusterfs/www/r/recherchesdemorp00ameg/recherch
esdemorp00ameg_dc.xml" failed (107: Transport endpoint is not
connected), client: 128.128.164.174, ser
ver: cluster.biodiversitylibrary.org, request: "GET
/r/recherchesdemorp00ameg/recherchesdemorp00ameg_d
c.xml HTTP/1.1", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:15:06 [crit] 24137#0: *8539 open()
"/mnt/glusterfs/www/j/journalfrdiega56stut/journalfrd
iega56stut.pdf" failed (107: Transport endpoint is not connected),
client: 128.128.164.174, server: cl
uster.biodiversitylibrary.org, request: "GET
/j/journalfrdiega56stut/journalfrdiega56stut.pdf HTTP/1.1
", host: "cluster.biodiversitylibrary.org"
2011/04/08 09:15:17 [crit] 24137#0: *8540 stat()
"/mnt/glusterfs/www/r/reisenundforschu32schr" failed
(107: Transport endpoint is not connected), client: 128.128.164.174,
server: cluster.biodiversitylibra
ry.org, request: "GET /r/reisenundforschu32schr/ HTTP/1.1", host:
"cluster.biodiversitylibrary.org"
2011/04/08 09:15:27 [crit] 24137#0: *8541 open()
"/mnt/glusterfs/www/r/recreativescienc01lond/recreati
vescienc01lond_bw.pdf" failed (107: Transport endpoint is not
connected), client: 128.128.164.174, ser
ver: cluster.biodiversitylibrary.org, request: "GET
/r/recreativescienc01lond/recreativescienc01lond_b
w.pdf HTTP/1.1", host: "cluster.biodiversitylibrary.org"
[...]

Thanks

P
-- 
http://philcryer.com
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Transport endpoint is not connected - getfattr

2010-06-18 Thread phil cryer
So I'm working on this today; maybe I can simplify my issue: in
GlusterFS 3.0.4 I can create files and directories fine, and I can delete
files, but not directories.  I'm running the server in DEBUG and it's
not saying anything.  For example, I want to delete
/mnt/glusterfs/www/new:

[23:12:03] [r...@clustr-01 /mnt]# mount -t glusterfs
/etc/glusterfs/glusterfs.vol /mnt/glusterfs -o log-level=DEBUG
[23:12:13] [r...@clustr-01 /mnt]# ls -al /mnt/glusterfs/www/ | grep new
drwxrwxrwx   3 www-data www-data 196608 2010-06-18 23:10 new
[23:12:26] [r...@clustr-01 /mnt]# rm -rf /mnt/glusterfs/www/new/
rm: cannot remove directory `/mnt/glusterfs/www/new/__MACOSX':
Transport endpoint is not connected

I'm running glusterfsd in another window in DEBUG, and it doesn't log
anything when this happens.  I've already deleted the files in that
directory; I just can't remove the two remaining directories, new and
__MACOSX.  Again, I created these yesterday, and I haven't made any
config changes between then and now, so how can I figure out why this is
failing?
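
In case it helps, the checks I know to try next are roughly these (a sketch
only; the client log path and the /mnt/data* backend brick paths are guesses
based on my layout, and since 3.0.x has no management CLI everything has to
be run per node):

# 1. watch the FUSE client log while reproducing the failure
tail -f /var/log/glusterfs/mnt-glusterfs.log &
rm -rf /mnt/glusterfs/www/new/

# 2. on each brick server, see what is actually left under the directory
ls -la /mnt/data*/www/new /mnt/data*/www/new/__MACOSX

# 3. dump the xattrs of the stubborn directory on each brick
getfattr -d -m . -e hex /mnt/data04/www/new/__MACOSX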

Thanks

P



On Thu, Jun 17, 2010 at 4:23 PM, phil cryer  wrote:
> I'm having problems removing directories; if I do a mv or an rm
> I get an error like this:
>
> [00:57:57] [r...@clustr-01 /]# rm -rf /mnt/glusterfs/bhl/
> rm: cannot remove directory `/mnt/glusterfs/bhl': Transport endpoint
> is not connected
>
> EdWyse on IRC suggested I run getfattr -m "" on a few bricks; when I
> did, I got various results (see below).  Is this a case where I can run
> something like backend-cleanup.sh or backend-xattr-sanitize.sh to fix
> it, or is there a manual command?  We're around 45TB, so I don't have
> anywhere to copy the files off.  Thanks!
>
> [16:30:25] [r...@clustr-04 /root/bin]# getfattr -m "" /mnt/data04
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/data04
> trusted.afr.clustr-04-1
> trusted.afr.clustr-04-10
> trusted.afr.clustr-04-11
> trusted.afr.clustr-04-12
> trusted.afr.clustr-04-13
> trusted.afr.clustr-04-14
> trusted.afr.clustr-04-15
> trusted.afr.clustr-04-16
> trusted.afr.clustr-04-17
> trusted.afr.clustr-04-18
> trusted.afr.clustr-04-19
> trusted.afr.clustr-04-2
> trusted.afr.clustr-04-20
> trusted.afr.clustr-04-21
> trusted.afr.clustr-04-22
> trusted.afr.clustr-04-23
> trusted.afr.clustr-04-24
> trusted.afr.clustr-04-3
> trusted.afr.clustr-04-4
> trusted.afr.clustr-04-5
> trusted.afr.clustr-04-6
> trusted.afr.clustr-04-7
> trusted.afr.clustr-04-8
> trusted.afr.clustr-04-9
> trusted.afr.clustr-05-1
> trusted.afr.clustr-05-10
> trusted.afr.clustr-05-11
> trusted.afr.clustr-05-12
> trusted.afr.clustr-05-13
> trusted.afr.clustr-05-14
> trusted.afr.clustr-05-15
> trusted.afr.clustr-05-16
> trusted.afr.clustr-05-17
> trusted.afr.clustr-05-18
> trusted.afr.clustr-05-19
> trusted.afr.clustr-05-2
> trusted.afr.clustr-05-20
> trusted.afr.clustr-05-21
> trusted.afr.clustr-05-22
> trusted.afr.clustr-05-23
> trusted.afr.clustr-05-24
> trusted.afr.clustr-05-3
> trusted.afr.clustr-05-4
> trusted.afr.clustr-05-5
> trusted.afr.clustr-05-6
> trusted.afr.clustr-05-7
> trusted.afr.clustr-05-8
> trusted.afr.clustr-05-9
> trusted.glusterfs.dht
> trusted.posix4.gen
>
> ---
> Another server
>
> [01:02:05] [r...@clustr-01 /]#  getfattr -m "" /mnt/data09
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/data09
> trusted.afr.clustr-01-10
> trusted.afr.clustr-01-9
> trusted.glusterfs.dht
> trusted.glusterfs.test
> trusted.posix9.gen
>
>
> [00:43:14] [r...@clustr-01 /]#  getfattr -m "" /mnt/data04
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/data04
> trusted.afr.clustr-01-3
> trusted.afr.clustr-01-4
> trusted.glusterfs.dht
> trusted.glusterfs.test
> trusted.posix4.gen
>



-- 
http://philcryer.com
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Transport endpoint is not connected - getfattr

2010-06-17 Thread phil cryer
I'm having problems removing directories; if I do a mv or an rm I get
an error like this:

[00:57:57] [r...@clustr-01 /]# rm -rf /mnt/glusterfs/bhl/
rm: cannot remove directory `/mnt/glusterfs/bhl': Transport endpoint
is not connected

EdWyse on IRC suggested I run getfattr -m "" on a few bricks; when I
did, I got various results (see below).  Is this a case where I can run
something like backend-cleanup.sh or backend-xattr-sanitize.sh to fix
it, or is there a manual command?  We're around 45TB, so I don't have
anywhere to copy the files off.  Thanks!

[16:30:25] [r...@clustr-04 /root/bin]# getfattr -m "" /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-04-1
trusted.afr.clustr-04-10
trusted.afr.clustr-04-11
trusted.afr.clustr-04-12
trusted.afr.clustr-04-13
trusted.afr.clustr-04-14
trusted.afr.clustr-04-15
trusted.afr.clustr-04-16
trusted.afr.clustr-04-17
trusted.afr.clustr-04-18
trusted.afr.clustr-04-19
trusted.afr.clustr-04-2
trusted.afr.clustr-04-20
trusted.afr.clustr-04-21
trusted.afr.clustr-04-22
trusted.afr.clustr-04-23
trusted.afr.clustr-04-24
trusted.afr.clustr-04-3
trusted.afr.clustr-04-4
trusted.afr.clustr-04-5
trusted.afr.clustr-04-6
trusted.afr.clustr-04-7
trusted.afr.clustr-04-8
trusted.afr.clustr-04-9
trusted.afr.clustr-05-1
trusted.afr.clustr-05-10
trusted.afr.clustr-05-11
trusted.afr.clustr-05-12
trusted.afr.clustr-05-13
trusted.afr.clustr-05-14
trusted.afr.clustr-05-15
trusted.afr.clustr-05-16
trusted.afr.clustr-05-17
trusted.afr.clustr-05-18
trusted.afr.clustr-05-19
trusted.afr.clustr-05-2
trusted.afr.clustr-05-20
trusted.afr.clustr-05-21
trusted.afr.clustr-05-22
trusted.afr.clustr-05-23
trusted.afr.clustr-05-24
trusted.afr.clustr-05-3
trusted.afr.clustr-05-4
trusted.afr.clustr-05-5
trusted.afr.clustr-05-6
trusted.afr.clustr-05-7
trusted.afr.clustr-05-8
trusted.afr.clustr-05-9
trusted.glusterfs.dht
trusted.posix4.gen

---
Another server

[01:02:05] [r...@clustr-01 /]#  getfattr -m "" /mnt/data09
getfattr: Removing leading '/' from absolute path names
# file: mnt/data09
trusted.afr.clustr-01-10
trusted.afr.clustr-01-9
trusted.glusterfs.dht
trusted.glusterfs.test
trusted.posix9.gen


[00:43:14] [r...@clustr-01 /]#  getfattr -m "" /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-01-3
trusted.afr.clustr-01-4
trusted.glusterfs.dht
trusted.glusterfs.test
trusted.posix4.gen
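
If it does come down to cleaning attributes by hand rather than via
backend-xattr-sanitize.sh, I assume the manual equivalent would be something
like the following on each brick (illustration only, not a recommendation;
the attribute name is just one taken from the output above):

# inspect one pending-changelog xattr in full (value, not just the name)
getfattr -n trusted.afr.clustr-04-1 -e hex /mnt/data04

# remove a stale attribute from the brick directory
setfattr -x trusted.afr.clustr-04-1 /mnt/data04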
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users