Hi,

The first (and so far only) crash happened at 2am the day after we
upgraded, on only one of four servers, and only on one of its two mounts.

I have no idea what caused it, but yeah, we do have a pretty busy site (
apkmirror.com), and it disrupted all uploads and downloads from that
server until I woke up and fixed the mount.

I wish I could be more helpful but all I have is that stack trace.

I'm glad it's been marked as a blocker and will hopefully be resolved soon.

On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Hi Artem,
>
> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as a
> clone of other bugs where recent discussions happened), and marked it as a
> blocker for glusterfs-5.4 release.
>
> We already have a fix for the log flooding (https://review.gluster.org/22128)
> and are in the process of identifying and fixing the issue seen with the
> crash.
>
> Can you please tell us whether the crashes happened as soon as you
> upgraded, or was there any particular pattern you observed before the
> crash?
>
> -Amar
>
>
> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <archon...@gmail.com>
> wrote:
>
>> Within 24 hours of updating from rock-solid 4.1 to 5.3, I already got
>> a crash which others have mentioned in
>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
>> kill gluster, and remount:
>>
>>
>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>> selecting local read_child SITE_data1-client-3" repeated 5 times between
>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
>> [2019-01-31 09:38:04.696993]
>> pending frames:
>> frame : type(1) op(READ)
>> frame : type(1) op(OPEN)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-01-31 09:38:04
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>> ---------
>>
>> Do the pending patches fix the crash or only the repeated warnings? I'm
>> running glusterfs on openSUSE 15.0, installed via
>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>> and I'm not sure how to make it produce a core dump.
>>
>> If it's not fixed by the patches above, has anyone already opened a
>> ticket for the crashes that I can join and monitor? This is going to create
>> a massive problem for us since production systems are crashing.
>>
>> Thanks.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> <http://www.apkmirror.com/>, Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> <http://twitter.com/ArtemR>
>>
>>
>> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <rgowd...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <archon...@gmail.com>
>>> wrote:
>>>
>>>> Also, not sure if related or not, but I got a ton of these "Failed to
>>>> dispatch handler" errors in my logs as well. Many people have commented
>>>> on this issue here:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>>>>
>>>
>>> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>>>
>>>
>>>> ==> mnt-SITE_data1.log <==
>>>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>> [0x7fd966fcd329]
>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>> ==> mnt-SITE_data3.log <==
>>>>> The message "E [MSGID: 101191]
>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to
>>>>> dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090]
>>>>> and [2019-01-30 20:38:20.015593]
>>>>> The message "I [MSGID: 108031]
>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>> selecting local read_child SITE_data3-client-0" repeated 42 times between
>>>>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>>>>> ==> mnt-SITE_data1.log <==
>>>>> The message "I [MSGID: 108031]
>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>> selecting local read_child SITE_data1-client-0" repeated 50 times between
>>>>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>>>>> The message "E [MSGID: 101191]
>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to
>>>>> dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327]
>>>>> and [2019-01-30 20:38:20.546355]
>>>>> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>>>>> selecting local read_child SITE_data1-client-0
>>>>> ==> mnt-SITE_data3.log <==
>>>>> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
>>>>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>>>>> selecting local read_child SITE_data3-client-0
>>>>> ==> mnt-SITE_data1.log <==
>>>>> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
>>>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to
>>>>> dispatch handler
>>>>
>>>>
>>>> I'm hoping raising the issue here on the mailing list may bring some
>>>> additional eyeballs and get them both fixed.
>>>>
>>>> Thanks.
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>>
>>>>
>>>> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <
>>>> archon...@gmail.com> wrote:
>>>>
>>>>> I found a similar issue here:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a
>>>>> comment from 3 days ago from someone else with 5.3 who started seeing the
>>>>> spam.
>>>>>
>>>>> Here's the message that repeats over and over:
>>>>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
>>>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>>>> [0x7fd966fcd329]
>>>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>>>>
>>>>
>>> +Milind Changire <mchan...@redhat.com> Can you check why this message
>>> is logged and send a fix?
>>>
>>>
>>>>> Is there any fix for this issue?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Sincerely,
>>>>> Artem
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>
>
>
> --
> Amar Tumballi (amarts)
>
