[Gluster-users] Brick Disconnect from Gluster Volume

2019-08-29 Thread Ashayam Gupta
Hi All,
We are facing a brick-disconnect issue on our setup; please find more info
about the issue below:






*Setup:*
- Gluster 5.3
- 6-node distributed cluster
- No replication
- 2 bricks per node
- Ubuntu 18.04

*Issue:*
One of the bricks on one node keeps disconnecting (roughly every n days).
Not very frequent, but observed more than 3-4 times.

Nothing much in the glusterd or brick logs.
*GlusterD Logs:*
[2019-08-23 10:39:29.811950] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-08-23 10:39:29.815723] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
[2019-08-23 10:39:38.88] E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 77 times between [2019-08-23 10:39:38.88] and
[2019-08-23 10:39:44.990156]

^^ These messages keep pouring into the logs. There is nothing special about
them, but this is all that appears in the logs from the time we saw the loss
of the brick.
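As a first triage step, the repeat-counter lines can be turned into a message rate, which helps correlate the flood with the moment the brick dropped. A small sketch (the sample line is the 77-repeat line from the glusterd excerpt above; the regex and timestamp format are assumptions about the log layout):

```python
import re
from datetime import datetime

# Sample taken from the glusterd log excerpt above.
LINE = ('The message "E [MSGID: 101191] '
        '[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: '
        'Failed to dispatch handler" repeated 77 times '
        'between [2019-08-23 10:39:38.88] and [2019-08-23 10:39:44.990156]')

PAT = re.compile(r'repeated (\d+) times between \[([^\]]+)\] and \[([^\]]+)\]')

def repeat_rate(line):
    """Return (count, span_seconds, msgs_per_second) for a
    'repeated N times between [a] and [b]' log line, or None."""
    m = PAT.search(line)
    if m is None:
        return None
    count = int(m.group(1))
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    t0 = datetime.strptime(m.group(2), fmt)
    t1 = datetime.strptime(m.group(3), fmt)
    span = (t1 - t0).total_seconds()
    return count, span, (count / span if span > 0 else float("inf"))

count, span, rate = repeat_rate(LINE)
print(f"{count} messages in {span:.2f}s -> {rate:.1f} msg/s")
```

Running this over the whole log file and plotting the rate against the disconnect times is one way to see whether the epoll-dispatch errors actually spike around the brick loss or are just steady background noise.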

*DataBrick Logs:*
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 4788 times between [2019-08-23 10:39:44.993667]

^^ Same here, we only see this type of log.


It would be helpful if we could get some pointers about the above issue.

Thanks
Ashayam
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Issues with Geo-replication (GlusterFS 6.3 on Ubuntu 18.04)

2019-08-29 Thread Alexander Iliev

Hello dear GlusterFS users list,

I have been trying to set up geo-replication between two clusters for 
some time now. The desired state is (Cluster #1) being replicated to 
(Cluster #2).


Here are some details about the setup:

Cluster #1: three nodes connected via a local network (172.31.35.0/24), 
one replicated (3 replica) volume.


Cluster #2: three nodes connected via a local network (172.31.36.0/24), 
one replicated (3 replica) volume.


The two clusters are connected to the Internet via separate network 
adapters.


Only SSH (port 22) is open on cluster #2 nodes' adapters connected to 
the Internet.


All nodes are running Ubuntu 18.04 and GlusterFS 6.3 installed from [1].

The first time I followed the guide[2], everything went fine up until I 
reached the "Create the session" step. That was about a month ago; I then 
had to temporarily stop working on this, and now I am coming back to it.


Currently, if I try to see the mountbroker status I get the following:


# gluster-mountbroker status
Traceback (most recent call last):
  File "/usr/sbin/gluster-mountbroker", line 396, in 
    runcli()
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli
    cls.run(args)
  File "/usr/sbin/gluster-mountbroker", line 275, in run
    out = execute_in_peers("node-status")
  File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers
    raise GlusterCmdException((rc, out, err, " ".join(cmd)))
gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to
end. Error : Success\n', 'gluster system:: execute mountbroker.py
node-status')


And in /var/log/gluster/glusterd.log I have:

[2019-08-10 15:24:21.418834] E [MSGID: 106336] 
[glusterd-geo-rep.c:5413:glusterd_op_sys_exec] 0-management: Unable to 
end. Error : Success
[2019-08-10 15:24:21.418908] E [MSGID: 106122] 
[glusterd-syncop.c:1445:gd_commit_op_phase] 0-management: Commit of 
operation 'Volume Execute system commands' failed on localhost : Unable 
to end. Error : Success
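The "Unable to end. Error : Success" text is itself a small hint: "Success" is the strerror() string for errno 0, so the failing code path most likely formatted an error message while errno was never set, meaning the real failure (probably on one of the peers) is not reflected in this message. A quick check of that mapping:

```python
import os

# errno 0 maps to the string "Success" on Linux; seeing "Error : Success"
# in a log usually means the error formatter ran while errno was still 0.
print(os.strerror(0))
```

So the glusterd.log lines above say that the commit phase failed, but not why; the per-peer glusterd logs are the next place to look.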


So, I have two questions right now:

1) Is there anything wrong with my setup (networking, open ports, etc.)? 
Is it expected to work with this setup or should I redo it in a 
different way?
2) How can I troubleshoot the current status of my setup? Can I find out 
what's missing/wrong and continue from there or should I just start from 
scratch?


Links:
[1] http://ppa.launchpad.net/gluster/glusterfs-6/ubuntu
[2] 
https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/


Thank you!

Best regards,
--
alexander iliev


Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-29 Thread Carl Sirotic

Yes,

this makes a lot of sense.

It's the behavior that I was experiencing that makes no sense.

When one node was shut down, the whole VM cluster locked up.

However, I managed to find that the culprit was the quorum settings.

I have now set the quorum to 2 bricks, and I am not experiencing 
the problem anymore.


All my vm boot disks and data disks are now sharded.

We are on 10 Gbit networks; when a node comes back, we do not really see 
any latency.



Carl


On 2019-08-29 3:58 p.m., Darrell Budic wrote:
You may be misunderstanding the way the gluster system works in 
detail here, but you've got the right idea overall. Since gluster is 
maintaining 3 copies of your data, you can lose a drive or a whole 
system and things will keep going without interruption (well, mostly: 
if a host node was using the system that just died, it may pause 
briefly before re-connecting to one that is still running via a 
backup-server setting or your DNS configs). While the system is still 
going with one node down, that node is falling behind on new disk 
writes, and the remaining ones are keeping track of what's changing. 
Once you repair/recover/reboot the down node, it will rejoin the 
cluster. Now the recovered system has to catch up, and it does this by 
having the other two nodes send it the changes. In the meantime, 
gluster is serving any reads for that data from one of the up-to-date 
nodes, even if you ask the one you just restarted. In order to do this 
healing, it has to lock the files to ensure no changes are made while 
it copies a chunk of them over to the recovered node. When it locks them, 
your hypervisor notices they have gone read-only and, especially if it 
has a pending write for that file, may pause the VM because this looks 
like a storage issue to it. Once the file gets unlocked, it can be 
written again, and your hypervisor notices and will generally 
reactivate your VM. You may see delays too, especially if you only 
have 1G networking between your host nodes while everything is getting 
copied around. And your files could be being locked, updated, 
unlocked, locked again a few seconds or minutes later, etc.


That's where sharding comes into play: once you have a file broken up 
into shards, gluster can get away with locking only the particular 
shard it needs to heal, leaving the whole disk image unlocked. You 
may still catch a brief pause if you try to write the specific 
segment of the file gluster is healing at the moment, but it's also 
going to be much faster because it's a small chunk of the file and 
copies quickly.


Also, check out 
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/, 
you probably want to set cluster.server-quorum-ratio to 50 for a 
replica-3 setup to avoid the possibility of split-brains. Your cluster 
will go write only if it loses two nodes though, but you can always 
make a change to the server-quorum-ratio later if you need to keep it 
running temporarily.


Hope that makes sense of what’s going on for you,

  -Darrell

On Aug 23, 2019, at 5:06 PM, Carl Sirotic wrote:


Okay,

so it means, at least I am not getting the expected behavior and 
there is hope.


I put the quorum settings that I was told a couple of emails ago.

After applying virt group, they are

cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
cluster.quorum-reads no

Also,

I just put the ping timeout to 5 seconds now.


Carl

On 2019-08-23 5:45 p.m., Ingo Fischer wrote:

Hi Carl,

In my understanding and experience (I have a replica 3 System 
running too) this should not happen. Can you tell your client and 
server quorum settings?


Ingo

On 23.08.2019 at 15:53, Carl Sirotic <csiro...@evoqarchitecture.com> wrote:



However,

I must have misunderstood the whole concept of gluster.

In a replica 3, for me, it's completely unacceptable, regardless of 
the options, that all my VMs go down when I reboot one node.


The whole point of having 3 full copies of my data on the fly is 
supposed to be exactly this.


I am in the process of sharding every file.

But even if the healing time were longer, I would still expect 
a non-sharded replica 3 brick with VM boot disks not to go down when 
I reboot one of its copies.



I am not very impressed by gluster so far.

Carl

On 2019-08-19 4:15 p.m., Darrell Budic wrote:
/var/lib/glusterd/groups/virt is a good start for ideas, notably 
some thread settings and choose-local=off to improve read 
performance. If you don’t have at least 10 cores on your servers, 
you may want to lower the recommended shd-max-threads=8 to no more 
than half your CPU cores to keep healing from swamping out regular 
work.


It’s also starting to depend on what your backing store and 
networking setup are, so you’re going to want to test changes and 
find what works best for your setup.


In addition to the virt group 

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-29 Thread Darrell Budic
You may be misunderstanding the way the gluster system works in detail here, 
but you've got the right idea overall. Since gluster is maintaining 3 copies of 
your data, you can lose a drive or a whole system and things will keep going 
without interruption (well, mostly: if a host node was using the system that 
just died, it may pause briefly before re-connecting to one that is still 
running via a backup-server setting or your DNS configs). While the system is 
still going with one node down, that node is falling behind on new disk 
writes, and the remaining ones are keeping track of what's changing. Once you 
repair/recover/reboot the down node, it will rejoin the cluster. Now the 
recovered system has to catch up, and it does this by having the other two 
nodes send it the changes. In the meantime, gluster is serving any reads for 
that data from one of the up-to-date nodes, even if you ask the one you just 
restarted. In order to do this healing, it has to lock the files to ensure no 
changes are made while it copies a chunk of them over to the recovered node. 
When it locks them, your hypervisor notices they have gone read-only and, 
especially if it has a pending write for that file, may pause the VM because 
this looks like a storage issue to it. Once the file gets unlocked, it can be 
written again, and your hypervisor notices and will generally reactivate your 
VM. You may see delays too, especially if you only have 1G networking between 
your host nodes while everything is getting copied around. And your files 
could be being locked, updated, unlocked, locked again a few seconds or 
minutes later, etc.

That's where sharding comes into play: once you have a file broken up into 
shards, gluster can get away with locking only the particular shard it needs 
to heal, leaving the whole disk image unlocked. You may still catch a brief 
pause if you try to write the specific segment of the file gluster is healing 
at the moment, but it's also going to be much faster because it's a small 
chunk of the file and copies quickly.

Also, check out 
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/
 
,
 you probably want to set cluster.server-quorum-ratio to 50 for a replica-3 
setup to avoid the possibility of split-brains. Your cluster will go write only 
if it loses two nodes though, but you can always make a change to the 
server-quorum-ratio later if you need to keep it running temporarily.
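The arithmetic behind that recommendation can be sketched as follows. This is a hypothetical helper, not gluster's actual implementation; it assumes server quorum holds while the share of live servers strictly exceeds the configured ratio, which matches the behavior described above (a replica-3 cluster at ratio 50 survives one node down but not two):

```python
def server_quorum_ok(live, total, ratio_percent):
    """Assumed semantics: quorum holds while the percentage of live
    servers strictly exceeds ratio_percent (hypothetical model of
    cluster.server-quorum-ratio, not gluster's source)."""
    return 100.0 * live / total > ratio_percent

# replica-3 cluster with cluster.server-quorum-ratio set to 50:
for live in (3, 2, 1):
    print(live, "live ->", server_quorum_ok(live, 3, 50))
```

Under this model, a 2-node cluster at ratio 50 loses quorum when either node dies (50% is not strictly greater than 50), which is exactly the split-brain protection the ratio is meant to provide.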

Hope that makes sense of what’s going on for you,

  -Darrell

> On Aug 23, 2019, at 5:06 PM, Carl Sirotic wrote:
> 
> Okay,
> 
> so it means, at least I am not getting the expected behavior and there is 
> hope.
> 
> I put the quorum settings that I was told a couple of emails ago.
> 
> After applying virt group, they are
> 
> cluster.quorum-type auto  
>   
> cluster.quorum-count(null)
>   
> cluster.server-quorum-type  server
>   
> cluster.server-quorum-ratio 0 
>   
> cluster.quorum-readsno
>   
> 
> 
> Also,
> 
> I just put the ping timeout to 5 seconds now.
> 
> 
> Carl
> 
> On 2019-08-23 5:45 p.m., Ingo Fischer wrote:
>> Hi Carl,
>> 
>> In my understanding and experience (I have a replica 3 System running too) 
>> this should not happen. Can you tell your client and server quorum settings?
>> 
>> Ingo
>> 
>> On 23.08.2019 at 15:53, Carl Sirotic wrote:
>> 
>>> However,
>>> 
>>> I must have misunderstood the whole concept of gluster.
>>> 
>>> In a replica 3, for me, it's completely unacceptable, regardless of the 
>>> options, that all my VMs go down when I reboot one node.
>>> 
>>> The whole point of having 3 full copies of my data on the fly is supposed 
>>> to be exactly this.
>>> 
>>> I am in the process of sharding every file.
>>> 
>>> But even if the healing time were longer, I would still expect a 
>>> non-sharded replica 3 brick with VM boot disks not to go down when I 
>>> reboot one of its copies.
>>> 
>>> 
>>> 
>>> I am not very impressed by gluster so far.
>>> 
>>> Carl
>>> 
>>> On 2019-08-19 4:15 p.m., Darrell Budic wrote:
 /var/lib/glusterd/groups/virt is a good start for ideas, notably some 
 thread settings and choose-local=off to improve read performance. If you 
 don’t have at least 10 cores on your servers, you may want to lower the 
 recommended shd-max-threads=8 to no more than half your CPU cores to keep 
 healing from swamping out regular work.
 
 It’s also starting to depend on what your backing store and networking 
 setup are, so you’re going to want to test changes and find what works 
 best for your setup.
 
 In addition to 

Re: [Gluster-users] Question about Healing estimated time ...

2019-08-29 Thread Darrell Budic
Depends on your disks, your network, some CPU since you’re using a dispersed 
volume, and the amount of data you’ve got on them. Watch this heal and see how 
long it takes to baseline your system. If you’ve got 10G and SSDs, it’s 
probably not going to take too long. If you’ve got 1G, HDDs, and your test case 
is a TB, it’ll be an hour or three...
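A back-of-envelope version of that estimate, treating the network link as the bottleneck (all numbers are hypothetical; `efficiency` stands in for HDD seeks, erasure-coding CPU cost, and competing traffic):

```python
def heal_hours(data_tb, link_gbit, efficiency=0.5):
    """Rough healing-time floor: bits to re-replicate divided by the
    effective link rate. Real heals also pay disk and CPU costs."""
    bits = data_tb * 8e12                 # TB -> bits (decimal units)
    rate = link_gbit * 1e9 * efficiency   # usable bits per second
    return bits / rate / 3600.0

# 1 TB to heal over 1 Gbit/s vs. 10 Gbit/s, at 50% link efficiency:
print(round(heal_hours(1, 1), 1), "h on 1G;",
      round(heal_hours(1, 10), 2), "h on 10G")
```

At 100% efficiency, 1 TB over 1 Gbit/s is about 2.2 hours, which is why the honest answer is a range ("an hour or three") until you baseline a real heal on your own hardware.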

> On Aug 29, 2019, at 5:46 AM, Anand Malagi  wrote:
> 
> Can someone please respond ??
>  
> Thanks and Regards,
> --Anand
> Extn : 6974
> Mobile : 9552527199
>  
> From: Anand Malagi 
> Sent: Wednesday, August 28, 2019 5:13 PM
> To: gluster-users@gluster.org; Gluster Devel <gluster-de...@gluster.org>
> Subject: Question about Healing estimated time ...
>  
> Hi Gluster Team,
>  
> I have a Distributed-Disperse gluster volume which uses erasure coding. I 
> basically brought down two of the bricks within a subvolume (4+2 config), 
> then generated some data, which obviously was not written to the two bricks 
> I brought down.
>  
> However, before bringing them back up to get them healed, is there a way to 
> know how much time it will take to heal the files, or a way to measure the 
> healing time?
>  
>  
> Thanks and Regards,
> --Anand
> Extn : 6974
> Mobile : 9552527199
>  

Re: [Gluster-users] Question about Healing estimated time ...

2019-08-29 Thread Anand Malagi
Can someone please respond ??

Thanks and Regards,
--Anand
Extn : 6974
Mobile : 9552527199

From: Anand Malagi
Sent: Wednesday, August 28, 2019 5:13 PM
To: gluster-users@gluster.org; Gluster Devel 
Subject: Question about Healing estimated time ...

Hi Gluster Team,

I have a Distributed-Disperse gluster volume which uses erasure coding. I 
basically brought down two of the bricks within a subvolume (4+2 config), 
then generated some data, which obviously was not written to the two bricks 
I brought down.

However, before bringing them back up to get them healed, is there a way to 
know how much time it will take to heal the files, or a way to measure the 
healing time?


Thanks and Regards,
--Anand
Extn : 6974
Mobile : 9552527199


[Gluster-users] Several issues when using Gluster with SSL and CRL

2019-08-29 Thread Miha Verlic
Hello,

I've set up a GlusterFS 6.3 cluster with 2 nodes + arbiter (and some
additional clients), with SSL and CRL:

server.ssl: on
client.ssl: on
ssl.crl-path: /etc/ssl/crl

After a month (when the CRL's Next Update date came), the cluster collapsed
with an "error:14094415:SSL routines:ssl3_read_bytes:sslv3 alert certificate
expired" error. I had to restart all processes on all nodes.

fetch-crl is installed on all nodes and properly syncs CRLs, but it seems
gluster caches CRLs indefinitely and never re-reads them. When the initial
CRL reaches its "Next Update" date, Gluster starts to reject all
connections, even though the CRL was updated during this time. Even sending
HUP to all gluster processes does not help.

This can easily be reproduced by setting the CRL option default_crl_days to
two days and refreshing the CRL every day. The cluster will crash when the
initial CRL expires, even if it was updated in between.
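Until the caching behavior is fixed, a periodic check that alerts well before the Next Update date (and triggers the process restarts) can limit the damage. A sketch that parses the output of `openssl crl -in <file> -noout -nextupdate`; the `nextUpdate=...` date format is the usual OpenSSL one and is assumed here:

```python
from datetime import datetime

def crl_days_left(next_update_line, now=None):
    """Parse a line like 'nextUpdate=Sep 12 10:00:00 2019 GMT' (as
    printed by 'openssl crl -noout -nextupdate') and return how many
    days remain before the CRL goes stale."""
    stamp = next_update_line.strip().split("=", 1)[1]
    expiry = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
    now = now or datetime.utcnow()
    return (expiry - now).total_seconds() / 86400.0

# e.g. a cron job could warn when fewer than 2 days remain:
line = "nextUpdate=Sep 12 10:00:00 2019 GMT"
print(crl_days_left(line, now=datetime(2019, 9, 10, 10, 0)))
```

Wiring this into monitoring at least turns the silent mid-month collapse into a planned restart window.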

Another problem happened when one of the clients did not have an up-to-date
CRL. While the client was trying to connect, the cluster was apparently
constantly busy with it and did not come online. After the client was
killed, the cluster came online instantly. Even debug logs were not
especially helpful, as the client's IP is not logged with the error messages.

Cheers
-- 
Miha


[Gluster-users] Disappearing files on gluster mount

2019-08-29 Thread Pat Riehecky
I moved my wife's photo archive to a mirrored gluster volume. Lately, 
she's noticed that a number of files are missing. I'm pretty sure they 
are still in the .glusterfs dir, as no one deleted them, but they simply 
don't display.


Any ideas how to get the files to reappear?

Glusterfs 3.14

--
Pat Riehecky

Fermi National Accelerator Laboratory
www.fnal.gov
www.scientificlinux.org


[Gluster-users] Switching from Debian Packages to Gluster Community Packages

2019-08-29 Thread Michael Böhm
Hey folks,

The question is essentially in the subject. Right now I'm running Debian
oldstable/stretch with Gluster 3.8.8 from the normal repo, and I'm thinking
about switching to the community repo from gluster.

I couldn't find much about which problems to expect. I checked the
packages; the main difference is that the Debian packages still ship init
scripts, while the ones from gluster are systemd-only, providing just the
unit file.

In the end I would like to get from gluster 3 -> 6 in as few steps as
possible. If I stay with the Debian packages, that would mean:
3.8.8 (stretch) -> 4.1 (stretch-backports) -> 5.5 (buster) -> 6.4
(buster-backports)

If I can use the gluster repos, that would be only:
3.8.8 (stretch) -> 3.12 (gluster-repo) -> 6.5 (gluster-repo)

Will this be possible? And of course I want to do it online; I have only
replicate and distributed-replicate volumes.

Regards

Mika

Re: [Gluster-users] glusterfs mount crashes with "Transport endpoint is not connected"

2019-08-29 Thread João Baúto
You're most likely hitting this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1671556
Upgrading to gluster 5.5 should fix it.

JB

Shreyansh Shah  wrote on Thursday, 29/08/2019 at 10:24:

>
>
> On Thu, Aug 29, 2019 at 2:50 PM Shreyansh Shah <
> shreyansh.s...@alpha-grep.com> wrote:
>
>> Hi,
>> Running on cloud centos7.5 VM, same machine has gluster volume mounted at
>> 2 endpoints (read/write), say A and B. Gluster version server is 5.3 and on
>> client is 3.12.2.
>> B is used very rarely and only for light reads. Mount A failed when our
>> processes were running, but B was still present and could access data
>> through B but not through A.
>>
>> Here is the trace from /var/log/glusterfs:
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>> handler" repeated 968 times between [2019-08-28 20:40:59.654898] and
>> [2019-08-28 20:41:36.417335]
>> pending frames:
>> frame : type(1) op(FSTAT)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(1) op(READ)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 11
>> time of crash:
>> 2019-08-28 20:41:37
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /lib64/libglusterfs.so.0(+0x26610)[0x7fea89c32610]
>> /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fea89c3cb84]
>> /lib64/libc.so.6(+0x36340)[0x7fea88295340]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fea88a97c30]
>> /lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7fea89c5dc3c]
>> /lib64/libglusterfs.so.0(rbthash_remove+0xd5)[0x7fea89c69d35]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcace)[0x7fea7771dace]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcdd7)[0x7fea7771ddd7]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcfc5)[0x7fea7771dfc5]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xf0ca)[0x7fea777200ca]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xa6a1)[0x7fea77b426a1]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xaa6f)[0x7fea77b42a6f]
>>
>> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xb0ce)[0x7fea77b430ce]
>> /lib64/libglusterfs.so.0(default_readv_cbk+0x180)[0x7fea89cbb8e0]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x81c1a)[0x7fea77dc9c1a]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x6d636)[0x7fea7c307636]
>> /lib64/libgfrpc.so.0(+0xec70)[0x7fea899fec70]
>> /lib64/libgfrpc.so.0(+0xf043)[0x7fea899ff043]
>> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fea899faf23]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7fea7e5e637b]
>> /lib64/libglusterfs.so.0(+0x8aa49)[0x7fea89c96a49]
>> /lib64/libpthread.so.0(+0x7dd5)[0x7fea88a95dd5]
>> /lib64/libc.so.6(clone+0x6d)[0x7fea8835d02d]
>>
>>
>> --
>> Regards,
>> Shreyansh Shah
>>
>
>
> --
> Regards,
> Shreyansh Shah

Re: [Gluster-users] glusterfs mount crashes with "Transport endpoint is not connected"

2019-08-29 Thread Shreyansh Shah
On Thu, Aug 29, 2019 at 2:50 PM Shreyansh Shah <
shreyansh.s...@alpha-grep.com> wrote:

> Hi,
> Running on a cloud CentOS 7.5 VM; the same machine has a gluster volume
> mounted at 2 endpoints (read/write), say A and B. The Gluster version on
> the server is 5.3 and on the client it is 3.12.2.
> B is used very rarely and only for light reads. Mount A failed while our
> processes were running, but B was still present and we could access data
> through B but not through A.
>
> Here is the trace from /var/log/glusterfs:
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 968 times between [2019-08-28 20:40:59.654898] and
> [2019-08-28 20:41:36.417335]
> pending frames:
> frame : type(1) op(FSTAT)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(1) op(READ)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2019-08-28 20:41:37
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /lib64/libglusterfs.so.0(+0x26610)[0x7fea89c32610]
> /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fea89c3cb84]
> /lib64/libc.so.6(+0x36340)[0x7fea88295340]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fea88a97c30]
> /lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7fea89c5dc3c]
> /lib64/libglusterfs.so.0(rbthash_remove+0xd5)[0x7fea89c69d35]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcace)[0x7fea7771dace]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcdd7)[0x7fea7771ddd7]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcfc5)[0x7fea7771dfc5]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xf0ca)[0x7fea777200ca]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xa6a1)[0x7fea77b426a1]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xaa6f)[0x7fea77b42a6f]
>
> /usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xb0ce)[0x7fea77b430ce]
> /lib64/libglusterfs.so.0(default_readv_cbk+0x180)[0x7fea89cbb8e0]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x81c1a)[0x7fea77dc9c1a]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x6d636)[0x7fea7c307636]
> /lib64/libgfrpc.so.0(+0xec70)[0x7fea899fec70]
> /lib64/libgfrpc.so.0(+0xf043)[0x7fea899ff043]
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fea899faf23]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7fea7e5e637b]
> /lib64/libglusterfs.so.0(+0x8aa49)[0x7fea89c96a49]
> /lib64/libpthread.so.0(+0x7dd5)[0x7fea88a95dd5]
> /lib64/libc.so.6(clone+0x6d)[0x7fea8835d02d]
>
>
> --
> Regards,
> Shreyansh Shah
>


-- 
Regards,
Shreyansh Shah

[Gluster-users] glusterfs mount crashes with "Transport endpoint is not connected"

2019-08-29 Thread Shreyansh Shah
Hi,
Running on a cloud CentOS 7.5 VM; the same machine has a gluster volume
mounted at 2 endpoints (read/write), say A and B. The Gluster version on
the server is 5.3 and on the client it is 3.12.2.
B is used very rarely and only for light reads. Mount A failed while our
processes were running, but B was still present and we could access data
through B but not through A.

Here is the trace from /var/log/glusterfs:
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 968 times between [2019-08-28 20:40:59.654898] and
[2019-08-28 20:41:36.417335]
pending frames:
frame : type(1) op(FSTAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-08-28 20:41:37
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/lib64/libglusterfs.so.0(+0x26610)[0x7fea89c32610]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fea89c3cb84]
/lib64/libc.so.6(+0x36340)[0x7fea88295340]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7fea88a97c30]
/lib64/libglusterfs.so.0(__gf_free+0x12c)[0x7fea89c5dc3c]
/lib64/libglusterfs.so.0(rbthash_remove+0xd5)[0x7fea89c69d35]
/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcace)[0x7fea7771dace]
/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcdd7)[0x7fea7771ddd7]
/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xcfc5)[0x7fea7771dfc5]
/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xf0ca)[0x7fea777200ca]
/usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xa6a1)[0x7fea77b426a1]
/usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xaa6f)[0x7fea77b42a6f]
/usr/lib64/glusterfs/5.3/xlator/performance/read-ahead.so(+0xb0ce)[0x7fea77b430ce]
/lib64/libglusterfs.so.0(default_readv_cbk+0x180)[0x7fea89cbb8e0]
/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x81c1a)[0x7fea77dc9c1a]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x6d636)[0x7fea7c307636]
/lib64/libgfrpc.so.0(+0xec70)[0x7fea899fec70]
/lib64/libgfrpc.so.0(+0xf043)[0x7fea899ff043]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fea899faf23]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa37b)[0x7fea7e5e637b]
/lib64/libglusterfs.so.0(+0x8aa49)[0x7fea89c96a49]
/lib64/libpthread.so.0(+0x7dd5)[0x7fea88a95dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fea8835d02d]


-- 
Regards,
Shreyansh Shah