Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-10-10 Thread lejeczek
On 30/01/2019 20:26, Artem Russakovskii wrote:
> I found a similar issue
> here: https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a
> comment from 3 days ago from someone else with 5.3 who started seeing
> the spam.
>
> Here's the command that repeats over and over:
> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fd966fcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>
> Is there any fix for this issue?
>
> Thanks.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net  | +ArtemRussakovskii
>  | @ArtemR
> 
>

I get no crashes, but with 6.5 I see these:

...

[2019-10-10 15:52:08.528208] I [io-stats.c:4027:fini] 0-USER-HOME:
io-stats translator unloaded
[2019-10-10 15:52:10.441283] E [MSGID: 101046]
[dht-common.c:11247:dht_pt_fgetxattr_cbk] 0-USER-HOME-dht: dict is null
[2019-10-10 15:52:10.441387] E [MSGID: 101046]
[dht-common.c:11248:dht_pt_fgetxattr_cbk] 0-USER-HOME-dht: dict is null
[2019-10-10 15:52:10.555957] E [MSGID: 108006]
[afr-common.c:5318:__afr_handle_child_down_event]
0-USER-HOME-replicate-0: All subvolumes are down. Going offline until at
least one of them comes back up.
[2019-10-10 15:52:10.557136] I [io-stats.c:4027:fini] 0-USER-HOME:
io-stats translator unloaded
The message "E [MSGID: 101046] [dht-common.c:11220:dht_pt_getxattr_cbk]
0-USER-HOME-dht: dict is null" repeated 8 times between [2019-10-10
15:52:07.263547] and [2019-10-10 15:52:07.649220]
The message "E [MSGID: 101046] [dht-common.c:11221:dht_pt_getxattr_cbk]
0-USER-HOME-dht: dict is null" repeated 8 times between [2019-10-10
15:52:07.263620] and [2019-10-10 15:52:07.649223]
[2019-10-10 15:56:11.291652] E [MSGID: 101046]
[dht-common.c:11247:dht_pt_fgetxattr_cbk] 0-USER-HOME-dht: dict is null
[2019-10-10 15:56:11.291742] E [MSGID: 101046]
[dht-common.c:11248:dht_pt_fgetxattr_cbk] 0-USER-HOME-dht: dict is null
[2019-10-10 15:56:11.974495] E [MSGID: 101046]
[dht-common.c:11220:dht_pt_getxattr_cbk] 0-USER-HOME-dht: dict is null
[2019-10-10 15:56:11.974568] E [MSGID: 101046]
[dht-common.c:11221:dht_pt_getxattr_cbk] 0-USER-HOME-dht: dict is null
The message "E [MSGID: 101046] [dht-common.c:11220:dht_pt_getxattr_cbk]
0-USER-HOME-dht: dict is null" repeated 8 times between [2019-10-10
15:56:11.974495] and [2019-10-10 15:56:23.911313]
The message "E [MSGID: 101046] [dht-common.c:11221:dht_pt_getxattr_cbk]
0-USER-HOME-dht: dict is null" repeated 8 times between [2019-10-10
15:56:11.974568] and [2019-10-10 15:56:23.911316]

...

 
And in case it might have something to do with the above log errors:
interestingly, if quotas are in use on paths in the volume, then Windows
shares (Samba) refuse to accept newly copied-in data, claiming that 0
bytes are free (which is false).
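
For reference, one way to cross-check what the quota layer itself reports
against what clients see (a rough sketch; VOLNAME and the mount path are
placeholders, not the actual volume in use here):

   # Configured quota limits and current usage as GlusterFS accounts them
   gluster volume quota VOLNAME list
   # Free space as reported to applications (such as Samba) via the FUSE mount
   df -h /mnt/VOLNAME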








Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-19 Thread Artem Russakovskii
Hi Nithya,

Unfortunately, I just had another crash on the same server, with
performance.write-behind still set to off. I'll email the core file
privately.
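
For completeness, a backtrace with symbols can usually be pulled from such a
core with gdb, roughly as follows (a sketch; the core path is a placeholder
and matching glusterfs debuginfo packages are assumed to be installed):

   # Load the core against the binary that produced it (path is an example)
   gdb /usr/sbin/glusterfs /path/to/core
   # Inside gdb, dump a full backtrace of every thread
   (gdb) thread apply all bt full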


[2019-02-19 19:50:39.511743] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7f9598991329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7f9598ba2af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7f95a137d218] ) 2-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
handler" repeated 95 times between [2019-02-19 19:49:07.655620] and
[2019-02-19 19:50:39.499284]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
2-_data3-replicate-0: selecting local read_child
_data3-client-3" repeated 56 times between [2019-02-19
19:49:07.602370] and [2019-02-19 19:50:42.912766]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-19 19:50:43
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f95a138864c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f95a1392cb6]
/lib64/libc.so.6(+0x36160)[0x7f95a054f160]
/lib64/libc.so.6(gsignal+0x110)[0x7f95a054f0e0]
/lib64/libc.so.6(abort+0x151)[0x7f95a05506c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f95a05476fa]
/lib64/libc.so.6(+0x2e772)[0x7f95a0547772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f95a08dd0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f95994f0c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f9599503ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f9599788f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f95a1153820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f95a1153b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f95a1150063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f959aea00b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f95a13e64c3]
/lib64/libpthread.so.0(+0x7559)[0x7f95a08da559]
/lib64/libc.so.6(clone+0x3f)[0x7f95a061181f]
-
[2019-02-19 19:51:34.425106] I [MSGID: 100030] [glusterfsd.c:2715:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
(args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
--volfile-server=localhost --volfile-id=/_data3 /mnt/_data3)
[2019-02-19 19:51:34.435206] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2019-02-19 19:51:34.450272] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2019-02-19 19:51:34.450394] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 4
[2019-02-19 19:51:34.450488] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 3

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR



On Tue, Feb 12, 2019 at 12:38 AM Nithya Balachandran 
wrote:

>
> Not yet but we are discussing an interim release. It is going to take a
> couple of days to review the fixes so not before then. We will update on
> the list with dates once we decide.
>
>
> On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii 
> wrote:
>
>> Awesome. But is there a release schedule and an ETA for when these will
>> be out in the repos?
>>
>> On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii 
>>> wrote:
>>>
 Great job identifying the issue!

 Any ETA on the next release with the logging and crash fixes in it?

>>>
>>> I've marked write-behind corruption as a blocker for release-6. Logging
>>> fixes are already in codebase.
>>>
>>>
 On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
 wrote:

>
>
> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
> joao.ba...@neuro.fchampalimaud.org> wrote:
>
>> Although I don't have these error messages, I'm having fuse crashes
>> as frequent as you. I have disabled write-behind and the mount has been
>> running over the weekend with heavy usage and no issues.
>>
>
> The issue you are facing will likely be fixed by patch [1]. Me, Xavi
> and Nithya were able to identify the corruption in write-behind.
>
> [1] https://review.gluster.org/22189
>
>
>> I can provide coredumps before disabling write-behind if needed. I
>> opened a BZ report
>>  with the
>> crashes that I was having.

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-12 Thread Nithya Balachandran
Not yet but we are discussing an interim release. It is going to take a
couple of days to review the fixes so not before then. We will update on
the list with dates once we decide.


On Tue, 12 Feb 2019 at 11:46, Artem Russakovskii 
wrote:

> Awesome. But is there a release schedule and an ETA for when these will be
> out in the repos?
>
> On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii 
>> wrote:
>>
>>> Great job identifying the issue!
>>>
>>> Any ETA on the next release with the logging and crash fixes in it?
>>>
>>
>> I've marked write-behind corruption as a blocker for release-6. Logging
>> fixes are already in codebase.
>>
>>
>>> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
>>> wrote:
>>>


 On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
 joao.ba...@neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as
> frequent as you. I have disabled write-behind and the mount has been
> running over the weekend with heavy usage and no issues.
>

 The issue you are facing will likely be fixed by patch [1]. Me, Xavi
 and Nithya were able to identify the corruption in write-behind.

 [1] https://review.gluster.org/22189


> I can provide coredumps before disabling write-behind if needed. I
> opened a BZ report
>  with the
> crashes that I was having.
>
> *João Baúto*
> ---
>
> *Scientific Computing and Software Platform*
> Champalimaud Research
> Champalimaud Center for the Unknown
> Av. Brasília, Doca de Pedrouços
> 1400-038 Lisbon, Portugal
> fchampalimaud.org 
>
>
> Artem Russakovskii  escreveu no dia sábado,
> 9/02/2019 à(s) 22:18:
>
>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting
>> for the next crash to see if it dumps a core for you guys to remotely 
>> debug.
>>
>> Then I can consider setting performance.write-behind to off and
>> monitoring for further crashes.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <
>>> archon...@gmail.com> wrote:
>>>
 Hi Nithya,

 I can try to disable write-behind as long as it doesn't heavily
 impact performance for us. Which option is it exactly? I don't see it 
 set
 in my list of changed volume variables that I sent you guys earlier.

>>>
>>> The option is performance.write-behind
>>>
>>>
 Sincerely,
 Artem

 --
 Founder, Android Police , APK Mirror
 , Illogical Robot LLC
 beerpla.net | +ArtemRussakovskii
  | @ArtemR
 


 On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
 nbala...@redhat.com> wrote:

> Hi Artem,
>
> We have found the cause of one crash. Unfortunately we have not
> managed to reproduce the one you reported so we don't know if it is 
> the
> same cause.
>
> Can you disable write-behind on the volume and let us know if it
> solves the problem? If yes, it is likely to be the same issue.
>
>
> regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <
> archon...@gmail.com> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so
>> lru-limit=0 didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by
>> monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>> valid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 
>> 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times 
>> between

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread Artem Russakovskii
Awesome. But is there a release schedule and an ETA for when these will be
out in the repos?

On Mon, Feb 11, 2019, 9:34 PM Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii 
> wrote:
>
>> Great job identifying the issue!
>>
>> Any ETA on the next release with the logging and crash fixes in it?
>>
>
> I've marked write-behind corruption as a blocker for release-6. Logging
> fixes are already in codebase.
>
>
>> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
>>> joao.ba...@neuro.fchampalimaud.org> wrote:
>>>
 Although I don't have these error messages, I'm having fuse crashes as
 frequent as you. I have disabled write-behind and the mount has been
 running over the weekend with heavy usage and no issues.

>>>
>>> The issue you are facing will likely be fixed by patch [1]. Me, Xavi and
>>> Nithya were able to identify the corruption in write-behind.
>>>
>>> [1] https://review.gluster.org/22189
>>>
>>>
 I can provide coredumps before disabling write-behind if needed. I
 opened a BZ report
  with the crashes
 that I was having.

 *João Baúto*
 ---

 *Scientific Computing and Software Platform*
 Champalimaud Research
 Champalimaud Center for the Unknown
 Av. Brasília, Doca de Pedrouços
 1400-038 Lisbon, Portugal
 fchampalimaud.org 


 Artem Russakovskii  escreveu no dia sábado,
 9/02/2019 à(s) 22:18:

> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for
> the next crash to see if it dumps a core for you guys to remotely debug.
>
> Then I can consider setting performance.write-behind to off and
> monitoring for further crashes.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <
> rgowd...@redhat.com> wrote:
>
>>
>>
>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <
>> archon...@gmail.com> wrote:
>>
>>> Hi Nithya,
>>>
>>> I can try to disable write-behind as long as it doesn't heavily
>>> impact performance for us. Which option is it exactly? I don't see it 
>>> set
>>> in my list of changed volume variables that I sent you guys earlier.
>>>
>>
>> The option is performance.write-behind
>>
>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police , APK Mirror
>>> , Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>>  | @ArtemR
>>> 
>>>
>>>
>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
>>> nbala...@redhat.com> wrote:
>>>
 Hi Artem,

 We have found the cause of one crash. Unfortunately we have not
 managed to reproduce the one you reported so we don't know if it is the
 same cause.

 Can you disable write-behind on the volume and let us know if it
 solves the problem? If yes, it is likely to be the same issue.


 regards,
 Nithya

 On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <
 archon...@gmail.com> wrote:

> Sorry to disappoint, but the crash just happened again, so
> lru-limit=0 didn't help.
>
> Here's the snippet of the crash and the subsequent remount by
> monit.
>
>
> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7f4402b99329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
> valid argument]
> The message "I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 
> 0-_data1-replicate-0:
> selecting local read_child _data1-client-3" repeated 39 times 
> between
> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
> dispatch
> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
> [2019-02-08 01:13:09.311554]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii 
wrote:

> Great job identifying the issue!
>
> Any ETA on the next release with the logging and crash fixes in it?
>

I've marked write-behind corruption as a blocker for release-6. Logging
fixes are already in the codebase.


> On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
>> joao.ba...@neuro.fchampalimaud.org> wrote:
>>
>>> Although I don't have these error messages, I'm having fuse crashes as
>>> frequent as you. I have disabled write-behind and the mount has been
>>> running over the weekend with heavy usage and no issues.
>>>
>>
>> The issue you are facing will likely be fixed by patch [1]. Me, Xavi and
>> Nithya were able to identify the corruption in write-behind.
>>
>> [1] https://review.gluster.org/22189
>>
>>
>>> I can provide coredumps before disabling write-behind if needed. I
>>> opened a BZ report  
>>> with
>>> the crashes that I was having.
>>>
>>> *João Baúto*
>>> ---
>>>
>>> *Scientific Computing and Software Platform*
>>> Champalimaud Research
>>> Champalimaud Center for the Unknown
>>> Av. Brasília, Doca de Pedrouços
>>> 1400-038 Lisbon, Portugal
>>> fchampalimaud.org 
>>>
>>>
>>> Artem Russakovskii  escreveu no dia sábado,
>>> 9/02/2019 à(s) 22:18:
>>>
 Alright. I've enabled core-dumping (hopefully), so now I'm waiting for
 the next crash to see if it dumps a core for you guys to remotely debug.

 Then I can consider setting performance.write-behind to off and
 monitoring for further crashes.

 Sincerely,
 Artem

 --
 Founder, Android Police , APK Mirror
 , Illogical Robot LLC
 beerpla.net | +ArtemRussakovskii
  | @ArtemR
 


 On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <
 rgowd...@redhat.com> wrote:

>
>
> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <
> archon...@gmail.com> wrote:
>
>> Hi Nithya,
>>
>> I can try to disable write-behind as long as it doesn't heavily
>> impact performance for us. Which option is it exactly? I don't see it set
>> in my list of changed volume variables that I sent you guys earlier.
>>
>
> The option is performance.write-behind
>
>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
>> nbala...@redhat.com> wrote:
>>
>>> Hi Artem,
>>>
>>> We have found the cause of one crash. Unfortunately we have not
>>> managed to reproduce the one you reported so we don't know if it is the
>>> same cause.
>>>
>>> Can you disable write-behind on the volume and let us know if it
>>> solves the problem? If yes, it is likely to be the same issue.
>>>
>>>
>>> regards,
>>> Nithya
>>>
>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>>> wrote:
>>>
 Sorry to disappoint, but the crash just happened again, so
 lru-limit=0 didn't help.

 Here's the snippet of the crash and the subsequent remount by monit.


 [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
 (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
 [0x7f4402b99329]
 -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
 [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
 [0x7f440b6b5218] ) 0-dict: dict is NULL [In
 valid argument]
 The message "I [MSGID: 108031]
 [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
 selecting local read_child _data1-client-3" repeated 39 times 
 between
 [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
 The message "E [MSGID: 101191]
 [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
 dispatch
 handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
 [2019-02-08 01:13:09.311554]
 pending frames:
 frame : type(1) op(LOOKUP)
 frame : type(0) op(0)
 patchset: git://git.gluster.org/glusterfs.git
 signal received: 6
 time of crash:
 2019-02-08 01:13:09
 configuration details:
 argp 1
 backtrace 1
 dlfcn 1
 libpthread 1
 llistxattr 1
 setfsid 1
 spinlock 1
 epoll.h 1
 xattr.h 1

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread Artem Russakovskii
Great job identifying the issue!

Any ETA on the next release with the logging and crash fixes in it?

On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
> joao.ba...@neuro.fchampalimaud.org> wrote:
>
>> Although I don't have these error messages, I'm having fuse crashes as
>> frequent as you. I have disabled write-behind and the mount has been
>> running over the weekend with heavy usage and no issues.
>>
>
> The issue you are facing will likely be fixed by patch [1]. Me, Xavi and
> Nithya were able to identify the corruption in write-behind.
>
> [1] https://review.gluster.org/22189
>
>
>> I can provide coredumps before disabling write-behind if needed. I opened
>> a BZ report  with
>> the crashes that I was having.
>>
>> *João Baúto*
>> ---
>>
>> *Scientific Computing and Software Platform*
>> Champalimaud Research
>> Champalimaud Center for the Unknown
>> Av. Brasília, Doca de Pedrouços
>> 1400-038 Lisbon, Portugal
>> fchampalimaud.org 
>>
>>
>> Artem Russakovskii  escreveu no dia sábado,
>> 9/02/2019 à(s) 22:18:
>>
>>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for
>>> the next crash to see if it dumps a core for you guys to remotely debug.
>>>
>>> Then I can consider setting performance.write-behind to off and
>>> monitoring for further crashes.
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police , APK Mirror
>>> , Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>>  | @ArtemR
>>> 
>>>
>>>
>>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa 
>>> wrote:
>>>


 On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
 wrote:

> Hi Nithya,
>
> I can try to disable write-behind as long as it doesn't heavily impact
> performance for us. Which option is it exactly? I don't see it set in my
> list of changed volume variables that I sent you guys earlier.
>

 The option is performance.write-behind


> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <
> nbala...@redhat.com> wrote:
>
>> Hi Artem,
>>
>> We have found the cause of one crash. Unfortunately we have not
>> managed to reproduce the one you reported so we don't know if it is the
>> same cause.
>>
>> Can you disable write-behind on the volume and let us know if it
>> solves the problem? If yes, it is likely to be the same issue.
>>
>>
>> regards,
>> Nithya
>>
>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>> wrote:
>>
>>> Sorry to disappoint, but the crash just happened again, so
>>> lru-limit=0 didn't help.
>>>
>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>
>>>
>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7f4402b99329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>>> valid argument]
>>> The message "I [MSGID: 108031]
>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>>> selecting local read_child _data1-client-3" repeated 39 times 
>>> between
>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
>>> dispatch
>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>>> [2019-02-08 01:13:09.311554]
>>> pending frames:
>>> frame : type(1) op(LOOKUP)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 6
>>> time of crash:
>>> 2019-02-08 01:13:09
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.3
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread Raghavendra Gowdappa
On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
joao.ba...@neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as
> frequent as you. I have disabled write-behind and the mount has been
> running over the weekend with heavy usage and no issues.
>
> I can provide coredumps before disabling write-behind if needed. I opened
> a BZ report  with
> the crashes that I was having.
>

I've created a bug and marked it as a blocker for release-6. I've marked
bz 1671014 as a duplicate of this bug report on master. If you disagree
about the bug you filed being a duplicate, please reopen it.


> *João Baúto*
> ---
>
> *Scientific Computing and Software Platform*
> Champalimaud Research
> Champalimaud Center for the Unknown
> Av. Brasília, Doca de Pedrouços
> 1400-038 Lisbon, Portugal
> fchampalimaud.org 
>
>
> Artem Russakovskii  escreveu no dia sábado,
> 9/02/2019 à(s) 22:18:
>
>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for
>> the next crash to see if it dumps a core for you guys to remotely debug.
>>
>> Then I can consider setting performance.write-behind to off and
>> monitoring for further crashes.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
>>> wrote:
>>>
 Hi Nithya,

 I can try to disable write-behind as long as it doesn't heavily impact
 performance for us. Which option is it exactly? I don't see it set in my
 list of changed volume variables that I sent you guys earlier.

>>>
>>> The option is performance.write-behind
>>>
>>>
 Sincerely,
 Artem

 --
 Founder, Android Police , APK Mirror
 , Illogical Robot LLC
 beerpla.net | +ArtemRussakovskii
  | @ArtemR
 


 On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
 wrote:

> Hi Artem,
>
> We have found the cause of one crash. Unfortunately we have not
> managed to reproduce the one you reported so we don't know if it is the
> same cause.
>
> Can you disable write-behind on the volume and let us know if it
> solves the problem? If yes, it is likely to be the same issue.
>
>
> regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so
>> lru-limit=0 didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>> valid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times 
>> between
>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
>> dispatch
>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>> [2019-02-08 01:13:09.311554]
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-02-08 01:13:09
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>
>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread Raghavendra Gowdappa
On Mon, Feb 11, 2019 at 3:49 PM João Baúto <
joao.ba...@neuro.fchampalimaud.org> wrote:

> Although I don't have these error messages, I'm having fuse crashes as
> frequent as you. I have disabled write-behind and the mount has been
> running over the weekend with heavy usage and no issues.
>

The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya,
and I were able to identify the corruption in write-behind.

[1] https://review.gluster.org/22189


> I can provide coredumps before disabling write-behind if needed. I opened
> a BZ report  with
> the crashes that I was having.
>
> *João Baúto*
> ---
>
> *Scientific Computing and Software Platform*
> Champalimaud Research
> Champalimaud Center for the Unknown
> Av. Brasília, Doca de Pedrouços
> 1400-038 Lisbon, Portugal
> fchampalimaud.org 
>
>
> Artem Russakovskii  escreveu no dia sábado,
> 9/02/2019 à(s) 22:18:
>
>> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for
>> the next crash to see if it dumps a core for you guys to remotely debug.
>>
>> Then I can consider setting performance.write-behind to off and
>> monitoring for further crashes.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
>>> wrote:
>>>
 Hi Nithya,

 I can try to disable write-behind as long as it doesn't heavily impact
 performance for us. Which option is it exactly? I don't see it set in my
 list of changed volume variables that I sent you guys earlier.

>>>
>>> The option is performance.write-behind
>>>
>>>
 Sincerely,
 Artem

 --
 Founder, Android Police , APK Mirror
 , Illogical Robot LLC
 beerpla.net | +ArtemRussakovskii
  | @ArtemR
 


 On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
 wrote:

> Hi Artem,
>
> We have found the cause of one crash. Unfortunately we have not
> managed to reproduce the one you reported so we don't know if it is the
> same cause.
>
> Can you disable write-behind on the volume and let us know if it
> solves the problem? If yes, it is likely to be the same issue.
>
>
> regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so
>> lru-limit=0 didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>> valid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times 
>> between
>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
>> dispatch
>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>> [2019-02-08 01:13:09.311554]
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-02-08 01:13:09
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>
>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-11 Thread João Baúto
Although I don't have these error messages, I'm having fuse crashes as
frequently as you. I have disabled write-behind and the mount has been
running over the weekend with heavy usage and no issues.

I can provide coredumps from before disabling write-behind if needed. I
opened a BZ report with the crashes that I was having.

*João Baúto*
---

*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org 


Artem Russakovskii  escreveu no dia sábado, 9/02/2019
à(s) 22:18:

> Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the
> next crash to see if it dumps a core for you guys to remotely debug.
>
> Then I can consider setting performance.write-behind to off and monitoring
> for further crashes.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
>> wrote:
>>
>>> Hi Nithya,
>>>
>>> I can try to disable write-behind as long as it doesn't heavily impact
>>> performance for us. Which option is it exactly? I don't see it set in my
>>> list of changed volume variables that I sent you guys earlier.
>>>
>>
>> The option is performance.write-behind
>>
>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police , APK Mirror
>>> , Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>>  | @ArtemR
>>> 
>>>
>>>
>>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
>>> wrote:
>>>
 Hi Artem,

 We have found the cause of one crash. Unfortunately we have not managed
 to reproduce the one you reported so we don't know if it is the same cause.

 Can you disable write-behind on the volume and let us know if it solves
 the problem? If yes, it is likely to be the same issue.


 regards,
 Nithya

 On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
 wrote:

> Sorry to disappoint, but the crash just happened again, so lru-limit=0
> didn't help.
>
> Here's the snippet of the crash and the subsequent remount by monit.
>
>
> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7f4402b99329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
> valid argument]
> The message "I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
> selecting local read_child _data1-client-3" repeated 39 times 
> between
> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
> dispatch
> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
> [2019-02-08 01:13:09.311554]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-08 01:13:09
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-09 Thread Artem Russakovskii
Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the
next crash to see if it dumps a core for you guys to remotely debug.

Then I can consider setting performance.write-behind to off and monitoring
for further crashes.
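
For reference, the setup this usually involves looks roughly like the
following (a sketch, not necessarily the exact steps used here; the core
location and volume name are assumptions):

   # Allow unlimited core size before (re)mounting the volume
   ulimit -c unlimited
   # Send cores to a predictable path instead of the distro default
   sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p
   # Remount so the glusterfs fuse client inherits the new limit
   umount /mnt/VOLNAME && mount -t glusterfs localhost:/VOLNAME /mnt/VOLNAME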

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR



On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa 
wrote:

>
>
> On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
> wrote:
>
>> Hi Nithya,
>>
>> I can try to disable write-behind as long as it doesn't heavily impact
>> performance for us. Which option is it exactly? I don't see it set in my
>> list of changed volume variables that I sent you guys earlier.
>>
>
> The option is performance.write-behind
>
>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
>> wrote:
>>
>>> Hi Artem,
>>>
>>> We have found the cause of one crash. Unfortunately we have not managed
>>> to reproduce the one you reported so we don't know if it is the same cause.
>>>
>>> Can you disable write-behind on the volume and let us know if it solves
>>> the problem? If yes, it is likely to be the same issue.
>>>
>>>
>>> regards,
>>> Nithya
>>>
>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>>> wrote:
>>>
 Sorry to disappoint, but the crash just happened again, so lru-limit=0
 didn't help.

 Here's the snippet of the crash and the subsequent remount by monit.


 [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
 (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
 [0x7f4402b99329]
 -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
 [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
 [0x7f440b6b5218] ) 0-dict: dict is NULL [In
 valid argument]
 The message "I [MSGID: 108031]
 [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
 selecting local read_child _data1-client-3" repeated 39 times between
 [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
 The message "E [MSGID: 101191]
 [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
 handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
 [2019-02-08 01:13:09.311554]
 pending frames:
 frame : type(1) op(LOOKUP)
 frame : type(0) op(0)
 patchset: git://git.gluster.org/glusterfs.git
 signal received: 6
 time of crash:
 2019-02-08 01:13:09
 configuration details:
 argp 1
 backtrace 1
 dlfcn 1
 libpthread 1
 llistxattr 1
 setfsid 1
 spinlock 1
 epoll.h 1
 xattr.h 1
 st_atim.tv_nsec 1
 package-string: glusterfs 5.3
 /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
 /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
 /lib64/libc.so.6(+0x36160)[0x7f440a887160]
 /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
 /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
 /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
 /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
 /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]

 /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]

 /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]

 /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
 /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
 /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]

 /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
 /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
 /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
 /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
 -
 [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
 --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
 [2019-02-08 01:13:35.637830] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 1
 [2019-02-08 01:13:35.651405] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 2
 [2019-02-08 01:13:35.651628] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-08 Thread Raghavendra Gowdappa
On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii 
wrote:

> Hi Nithya,
>
> I can try to disable write-behind as long as it doesn't heavily impact
> performance for us. Which option is it exactly? I don't see it set in my
> list of changed volume variables that I sent you guys earlier.
>

The option is performance.write-behind
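
For anyone following along, toggling it would look roughly like this (a
sketch; VOLNAME is a placeholder for the affected volume):

   # Disable the write-behind performance translator on the volume
   gluster volume set VOLNAME performance.write-behind off
   # Re-enable it once a release containing the fix is installed
   gluster volume set VOLNAME performance.write-behind on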


> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
> wrote:
>
>> Hi Artem,
>>
>> We have found the cause of one crash. Unfortunately we have not managed
>> to reproduce the one you reported so we don't know if it is the same cause.
>>
>> Can you disable write-behind on the volume and let us know if it solves
>> the problem? If yes, it is likely to be the same issue.
>>
>>
>> regards,
>> Nithya
>>
>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>> wrote:
>>
>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0
>>> didn't help.
>>>
>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>
>>>
>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7f4402b99329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>>> valid argument]
>>> The message "I [MSGID: 108031]
>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>>> selecting local read_child _data1-client-3" repeated 39 times between
>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>>> [2019-02-08 01:13:09.311554]
>>> pending frames:
>>> frame : type(1) op(LOOKUP)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 6
>>> time of crash:
>>> 2019-02-08 01:13:09
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.3
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>>> -
>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
>>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>>> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 2
>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 3
>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 4
>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-0: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-1: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-2: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.655497] I 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-08 Thread Artem Russakovskii
Hi Nithya,

I can try to disable write-behind as long as it doesn't heavily impact
performance for us. Which option is it exactly? I don't see it set in my
list of changed volume variables that I sent you guys earlier.
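
In case it's useful, the value currently in effect (default or explicitly
set) should be visible via the volume CLI; a rough sketch, with VOLNAME
standing in for the actual volume:

   # Show the effective value of the option for this volume
   gluster volume get VOLNAME performance.write-behind
   # Or list only the explicitly reconfigured options
   gluster volume info VOLNAME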

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR



On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran 
wrote:

> Hi Artem,
>
> We have found the cause of one crash. Unfortunately we have not managed to
> reproduce the one you reported so we don't know if it is the same cause.
>
> Can you disable write-behind on the volume and let us know if it solves
> the problem? If yes, it is likely to be the same issue.
>
>
> regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so lru-limit=0
>> didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>> valid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times between
>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>> [2019-02-08 01:13:09.311554]
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-02-08 01:13:09
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>> -
>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
>> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 2
>> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 3
>> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 4
>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-0: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-1: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-2: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-3: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
>> 0-_data1-client-0: changing port to 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-08 Thread Nithya Balachandran
Hi Artem,

We have found the cause of one crash. Unfortunately, we have not managed to
reproduce the one you reported, so we don't know whether it is the same cause.

Can you disable write-behind on the volume and let us know if it solves the
problem? If yes, it is likely to be the same issue.


regards,
Nithya

On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii  wrote:

> Sorry to disappoint, but the crash just happened again, so lru-limit=0
> didn't help.
>
> Here's the snippet of the crash and the subsequent remount by monit.
>
>
> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7f4402b99329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
> valid argument]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-_data1-replicate-0: selecting local read_child
> _data1-client-3" repeated 39 times between [2019-02-08
> 01:11:18.043286] and [2019-02-08 01:13:07.915604]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
> [2019-02-08 01:13:09.311554]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-08 01:13:09
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
> -
> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 3
> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 4
> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-0: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-1: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-2: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-3: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
> 0-_data1-client-0: changing port to 49153 (from 0)
> Final graph:
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii 
> wrote:
>
>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken
>> effect correctly:
>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>> --volfile-server=localhost --volfile-id=/  /mnt/"
>>
>> Let's see if it stops crashing or not.
>>
>> Sincerely,
>> Artem
>>

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-08 Thread Raghavendra Gowdappa
On Fri, Feb 8, 2019 at 8:50 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Feb 8, 2019 at 8:48 AM Raghavendra Gowdappa 
> wrote:
>
>> One possible reason could be
>> https://review.gluster.org/r/18b6d7ce7d490e807815270918a17a4b392a829d
>>
>
>  https://review.gluster.org/#/c/glusterfs/+/19997/
>

This patch is not in the release-5.0 branch.


> as that changed some code in epoll handler. Though the change is largely
>> on server side, the epoll and socket changes are relevant for client too.
>> I'll try to see whether there is anything wrong with that.
>>
>> On Fri, Feb 8, 2019 at 8:36 AM Nithya Balachandran 
>> wrote:
>>
>>> Thanks Artem. Can you send us the coredump or the bt with symbols from
>>> the crash?
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>>> wrote:
>>>
 Sorry to disappoint, but the crash just happened again, so lru-limit=0
 didn't help.

 Here's the snippet of the crash and the subsequent remount by monit.


 [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
 (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
 [0x7f4402b99329]
 -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
 [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
 [0x7f440b6b5218] ) 0-dict: dict is NULL [In
 valid argument]
 The message "I [MSGID: 108031]
 [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
 selecting local read_child _data1-client-3" repeated 39 times between
 [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
 The message "E [MSGID: 101191]
 [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
 handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
 [2019-02-08 01:13:09.311554]
 pending frames:
 frame : type(1) op(LOOKUP)
 frame : type(0) op(0)
 patchset: git://git.gluster.org/glusterfs.git
 signal received: 6
 time of crash:
 2019-02-08 01:13:09
 configuration details:
 argp 1
 backtrace 1
 dlfcn 1
 libpthread 1
 llistxattr 1
 setfsid 1
 spinlock 1
 epoll.h 1
 xattr.h 1
 st_atim.tv_nsec 1
 package-string: glusterfs 5.3
 /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
 /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
 /lib64/libc.so.6(+0x36160)[0x7f440a887160]
 /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
 /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
 /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
 /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
 /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]

 /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]

 /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]

 /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
 /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
 /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]

 /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
 /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
 /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
 /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
 -
 [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
 --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
 [2019-02-08 01:13:35.637830] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 1
 [2019-02-08 01:13:35.651405] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 2
 [2019-02-08 01:13:35.651628] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 3
 [2019-02-08 01:13:35.651747] I [MSGID: 101190]
 [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
 with index 4
 [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
 0-_data1-client-0: parent translators are ready, attempting connect
 on transport
 [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
 0-_data1-client-1: parent translators are ready, attempting connect
 on transport
 [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
 0-_data1-client-2: parent translators are ready, attempting connect
 on transport
 [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
 0-_data1-client-3: parent translators are ready, attempting connect
 on transport
 [2019-02-08 01:13:35.655527] I 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-07 Thread Raghavendra Gowdappa
On Fri, Feb 8, 2019 at 8:48 AM Raghavendra Gowdappa 
wrote:

> One possible reason could be
> https://review.gluster.org/r/18b6d7ce7d490e807815270918a17a4b392a829d
>

 https://review.gluster.org/#/c/glusterfs/+/19997/

as that changed some code in epoll handler. Though the change is largely on
> server side, the epoll and socket changes are relevant for client too. I'll
> try to see whether there is anything wrong with that.
>
> On Fri, Feb 8, 2019 at 8:36 AM Nithya Balachandran 
> wrote:
>
>> Thanks Artem. Can you send us the coredump or the bt with symbols from
>> the crash?
>>
>> Regards,
>> Nithya
>>
>> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
>> wrote:
>>
>>> Sorry to disappoint, but the crash just happened again, so lru-limit=0
>>> didn't help.
>>>
>>> Here's the snippet of the crash and the subsequent remount by monit.
>>>
>>>
>>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7f4402b99329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>>> valid argument]
>>> The message "I [MSGID: 108031]
>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>>> selecting local read_child _data1-client-3" repeated 39 times between
>>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>>> [2019-02-08 01:13:09.311554]
>>> pending frames:
>>> frame : type(1) op(LOOKUP)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 6
>>> time of crash:
>>> 2019-02-08 01:13:09
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.3
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>>
>>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>>> -
>>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
>>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>>> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
>>> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 1
>>> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 2
>>> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 3
>>> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
>>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>>> with index 4
>>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-0: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-1: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-2: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
>>> 0-_data1-client-3: parent translators are ready, attempting connect
>>> on transport
>>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
>>> 0-_data1-client-0: changing port to 49153 (from 0)
>>> Final graph:
>>>
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police , APK Mirror
>>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-07 Thread Raghavendra Gowdappa
One possible reason could be
https://review.gluster.org/r/18b6d7ce7d490e807815270918a17a4b392a829d, as
that changed some code in the epoll handler. Though the change is largely on
the server side, the epoll and socket changes are relevant for the client too.
I'll try to see whether there is anything wrong with that.

On Fri, Feb 8, 2019 at 8:36 AM Nithya Balachandran 
wrote:

> Thanks Artem. Can you send us the coredump or the bt with symbols from the
> crash?
>
> Regards,
> Nithya
>
> On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii 
> wrote:
>
>> Sorry to disappoint, but the crash just happened again, so lru-limit=0
>> didn't help.
>>
>> Here's the snippet of the crash and the subsequent remount by monit.
>>
>>
>> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7f4402b99329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
>> valid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 0-_data1-replicate-0:
>> selecting local read_child _data1-client-3" repeated 39 times between
>> [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
>> [2019-02-08 01:13:09.311554]
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-02-08 01:13:09
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
>> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
>> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
>> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
>> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
>> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
>> -
>> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
>> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
>> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 2
>> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 3
>> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
>> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 4
>> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-0: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-1: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-2: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
>> 0-_data1-client-3: parent translators are ready, attempting connect
>> on transport
>> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
>> 0-_data1-client-0: changing port to 49153 (from 0)
>> Final graph:
>>
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii 
>> wrote:
>>

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-07 Thread Nithya Balachandran
Thanks Artem. Can you send us the coredump or the bt with symbols from the
crash?
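
In case it helps, a rough sketch of one way to pull a backtrace with symbols,
assuming the core file ended up as /var/tmp/core.12345 and that the matching
glusterfs debuginfo packages are installed (paths and package names will
differ per distribution):

# dump a full backtrace of every thread from the core into a text file
gdb /usr/sbin/glusterfs /var/tmp/core.12345 \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full' \
    -ex 'quit' > gluster-bt.txt 2>&1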

Regards,
Nithya

On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii  wrote:

> Sorry to disappoint, but the crash just happened again, so lru-limit=0
> didn't help.
>
> Here's the snippet of the crash and the subsequent remount by monit.
>
>
> [2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7f4402b99329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7f440b6b5218] ) 0-dict: dict is NULL [In
> valid argument]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-_data1-replicate-0: selecting local read_child
> _data1-client-3" repeated 39 times between [2019-02-08
> 01:11:18.043286] and [2019-02-08 01:13:07.915604]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
> [2019-02-08 01:13:09.311554]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-08 01:13:09
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
> /lib64/libc.so.6(+0x36160)[0x7f440a887160]
> /lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
> /lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
> /lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
> /lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
> /lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
> -
> [2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
> (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
> --volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
> [2019-02-08 01:13:35.637830] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2019-02-08 01:13:35.651405] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 2
> [2019-02-08 01:13:35.651628] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 3
> [2019-02-08 01:13:35.651747] I [MSGID: 101190]
> [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 4
> [2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-0: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-1: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-2: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
> 0-_data1-client-3: parent translators are ready, attempting connect
> on transport
> [2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
> 0-_data1-client-0: changing port to 49153 (from 0)
> Final graph:
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii 
> wrote:
>
>> I've added the lru-limit=0 parameter to the mounts, and I see it's taken
>> effect correctly:
>> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
>> --volfile-server=localhost --volfile-id=/  /mnt/"
>>
>> Let's see if it stops crashing or not.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-07 Thread Artem Russakovskii
Sorry to disappoint, but the crash just happened again, so lru-limit=0
didn't help.

Here's the snippet of the crash and the subsequent remount by monit.


[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7f4402b99329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7f440b6b5218] ) 0-dict: dict is NULL [In
valid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
0-_data1-replicate-0: selecting local read_child
_data1-client-3" repeated 39 times between [2019-02-08
01:11:18.043286] and [2019-02-08 01:13:07.915604]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 515 times between [2019-02-08 01:11:17.932515] and
[2019-02-08 01:13:09.311554]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-08 01:13:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
/lib64/libc.so.6(+0x36160)[0x7f440a887160]
/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
-
[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3
(args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse
--volfile-server=localhost --volfile-id=/_data1 /mnt/_data1)
[2019-02-08 01:13:35.637830] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2019-02-08 01:13:35.651405] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2019-02-08 01:13:35.651628] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 3
[2019-02-08 01:13:35.651747] I [MSGID: 101190]
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 4
[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify]
0-_data1-client-0: parent translators are ready, attempting connect
on transport
[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify]
0-_data1-client-1: parent translators are ready, attempting connect
on transport
[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify]
0-_data1-client-2: parent translators are ready, attempting connect
on transport
[2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify]
0-_data1-client-3: parent translators are ready, attempting connect
on transport
[2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig]
0-_data1-client-0: changing port to 49153 (from 0)
Final graph:


Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii 
wrote:

> I've added the lru-limit=0 parameter to the mounts, and I see it's taken
> effect correctly:
> "/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
> --volfile-server=localhost --volfile-id=/  /mnt/"
>
> Let's see if it stops crashing or not.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii 
> wrote:
>
>> Hi Nithya,
>>
>> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing
>> crashes, and no further releases have been made yet.
>>
>> volume info:
>> Type: Replicate
>> Volume ID: SNIP
>> Status: Started
>> Snapshot Count: 0
>> Number of 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-07 Thread Artem Russakovskii
I've added the lru-limit=0 parameter to the mounts, and I see it's taken
effect correctly:
"/usr/sbin/glusterfs --lru-limit=0 --process-name fuse
--volfile-server=localhost --volfile-id=/  /mnt/"

Let's see if it stops crashing or not.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii 
wrote:

> Hi Nithya,
>
> Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing
> crashes, and no further releases have been made yet.
>
> volume info:
> Type: Replicate
> Volume ID: SNIP
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: SNIP
> Brick2: SNIP
> Brick3: SNIP
> Brick4: SNIP
> Options Reconfigured:
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> network.ping-timeout: 5
> network.remote-dio: enable
> performance.rda-cache-limit: 256MB
> performance.readdir-ahead: on
> performance.parallel-readdir: on
> network.inode-lru-limit: 50
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> cluster.readdir-optimize: on
> performance.io-thread-count: 32
> server.event-threads: 4
> client.event-threads: 4
> performance.read-ahead: off
> cluster.lookup-optimize: on
> performance.cache-size: 1GB
> cluster.self-heal-daemon: enable
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> cluster.granular-entry-heal: enable
> cluster.data-self-heal-algorithm: full
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran 
> wrote:
>
>> Hi Artem,
>>
>> Do you still see the crashes with 5.3? If yes, please try mounting the
>> volume using the mount option lru-limit=0 and see if that helps. We are
>> looking into the crashes and will update when we have a fix.
>>
>> Also, please provide the gluster volume info for the volume in question.
>>
>>
>> regards,
>> Nithya
>>
>> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii 
>> wrote:
>>
>>> The fuse crash happened two more times, but this time monit helped
>>> recover within 1 minute, so it's a great workaround for now.
>>>
>>> What's odd is that the crashes are only happening on one of 4 servers,
>>> and I don't know why.
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police , APK Mirror
>>> , Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>>  | @ArtemR
>>> 
>>>
>>>
>>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii 
>>> wrote:
>>>
 The fuse crash happened again yesterday, to another volume. Are there
 any mount options that could help mitigate this?

 In the meantime, I set up a monit (https://mmonit.com/monit/) task to
 watch and restart the mount, which works and recovers the mount point
 within a minute. Not ideal, but a temporary workaround.

 By the way, the way to reproduce this "Transport endpoint is not
 connected" condition for testing purposes is to kill -9 the right
 "glusterfs --process-name fuse" process.


 monit check:
 check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
   start program  = "/bin/mount  /mnt/glusterfs_data1"
   stop program  = "/bin/umount /mnt/glusterfs_data1"
   if space usage > 90% for 5 times within 15 cycles
 then alert else if succeeded for 10 cycles then alert


 stack trace:
 [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
 (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
 [0x7fa0249e4329]
 -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
 [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
 [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
 [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
 (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
 [0x7fa0249e4329]
 -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
 [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
 [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
 The message "E [MSGID: 101191]
 [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
 handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
 [2019-02-01 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-06 Thread Artem Russakovskii
Hi Nithya,

Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing
crashes, and no further releases have been made yet.

volume info:
Type: Replicate
Volume ID: SNIP
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: SNIP
Brick2: SNIP
Brick3: SNIP
Brick4: SNIP
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 5
network.remote-dio: enable
performance.rda-cache-limit: 256MB
performance.readdir-ahead: on
performance.parallel-readdir: on
network.inode-lru-limit: 50
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.io-thread-count: 32
server.event-threads: 4
client.event-threads: 4
performance.read-ahead: off
cluster.lookup-optimize: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran 
wrote:

> Hi Artem,
>
> Do you still see the crashes with 5.3? If yes, please try mounting the volume
> using the mount option lru-limit=0 and see if that helps. We are looking
> into the crashes and will update when we have a fix.
>
> Also, please provide the gluster volume info for the volume in question.
>
>
> regards,
> Nithya
>
> On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii 
> wrote:
>
>> The fuse crash happened two more times, but this time monit helped
>> recover within 1 minute, so it's a great workaround for now.
>>
>> What's odd is that the crashes are only happening on one of 4 servers,
>> and I don't know why.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii 
>> wrote:
>>
>>> The fuse crash happened again yesterday, to another volume. Are there
>>> any mount options that could help mitigate this?
>>>
>>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to
>>> watch and restart the mount, which works and recovers the mount point
>>> within a minute. Not ideal, but a temporary workaround.
>>>
>>> By the way, the way to reproduce this "Transport endpoint is not
>>> connected" condition for testing purposes is to kill -9 the right
>>> "glusterfs --process-name fuse" process.
>>>
>>>
>>> monit check:
>>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>>>   start program  = "/bin/mount  /mnt/glusterfs_data1"
>>>   stop program  = "/bin/umount /mnt/glusterfs_data1"
>>>   if space usage > 90% for 5 times within 15 cycles
>>> then alert else if succeeded for 10 cycles then alert
>>>
>>>
>>> stack trace:
>>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fa0249e4329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fa0249e4329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
>>> [2019-02-01 23:21:56.164427]
>>> The message "I [MSGID: 108031]
>>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0:
>>> selecting local read_child SITE_data3-client-3" repeated 27 times between
>>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
>>> pending frames:
>>> frame : type(1) op(LOOKUP)
>>> frame : type(0) op(0)
>>> patchset: git://git.gluster.org/glusterfs.git
>>> signal received: 6
>>> time of crash:
>>> 2019-02-01 23:22:03
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 5.3
>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
>>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-06 Thread Nithya Balachandran
Hi Artem,

Do you still see the crashes with 5.3? If yes, please try mounting the volume
using the mount option lru-limit=0 and see if that helps. We are looking
into the crashes and will update when we have a fix.
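
For example, a minimal sketch assuming a volume named SITE_data1 mounted at
/mnt/SITE_data1 (substitute your own volume and mount point):

# remount the fuse client with the inode lru limit disabled
umount /mnt/SITE_data1
mount -t glusterfs -o lru-limit=0 localhost:/SITE_data1 /mnt/SITE_data1

# or, for an fstab-managed mount, add lru-limit=0 to the options column, e.g.:
# localhost:/SITE_data1  /mnt/SITE_data1  glusterfs  defaults,_netdev,lru-limit=0  0 0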

Also, please provide the gluster volume info for the volume in question.


regards,
Nithya

On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii  wrote:

> The fuse crash happened two more times, but this time monit helped recover
> within 1 minute, so it's a great workaround for now.
>
> What's odd is that the crashes are only happening on one of 4 servers, and
> I don't know why.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii 
> wrote:
>
>> The fuse crash happened again yesterday, to another volume. Are there any
>> mount options that could help mitigate this?
>>
>> In the meantime, I set up a monit (https://mmonit.com/monit/) task to
>> watch and restart the mount, which works and recovers the mount point
>> within a minute. Not ideal, but a temporary workaround.
>>
>> By the way, the way to reproduce this "Transport endpoint is not
>> connected" condition for testing purposes is to kill -9 the right
>> "glusterfs --process-name fuse" process.
>>
>>
>> monit check:
>> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>>   start program  = "/bin/mount  /mnt/glusterfs_data1"
>>   stop program  = "/bin/umount /mnt/glusterfs_data1"
>>   if space usage > 90% for 5 times within 15 cycles
>> then alert else if succeeded for 10 cycles then alert
>>
>>
>> stack trace:
>> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fa0249e4329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fa0249e4329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
>> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
>> [2019-02-01 23:21:56.164427]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0:
>> selecting local read_child SITE_data3-client-3" repeated 27 times between
>> [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-02-01 23:22:03
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
>> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
>> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
>> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
>> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
>> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii 
>> wrote:
>>
>>> Hi,
>>>
>>> The first (and so far only) crash happened at 2am the next day after we
>>> upgraded, on only one of four servers and only to one of two mounts.
>>>
>>> I have no idea what caused it, but 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-04 Thread Artem Russakovskii
The fuse crash happened two more times, but this time monit helped recover
within 1 minute, so it's a great workaround for now.

What's odd is that the crashes are only happening on one of 4 servers, and
I don't know why.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii 
wrote:

> The fuse crash happened again yesterday, to another volume. Are there any
> mount options that could help mitigate this?
>
> In the meantime, I set up a monit (https://mmonit.com/monit/) task to
> watch and restart the mount, which works and recovers the mount point
> within a minute. Not ideal, but a temporary workaround.
>
> By the way, the way to reproduce this "Transport endpoint is not
> connected" condition for testing purposes is to kill -9 the right
> "glusterfs --process-name fuse" process.
>
>
> monit check:
> check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
>   start program  = "/bin/mount  /mnt/glusterfs_data1"
>   stop program  = "/bin/umount /mnt/glusterfs_data1"
>   if space usage > 90% for 5 times within 15 cycles
> then alert else if succeeded for 10 cycles then alert
>
>
> stack trace:
> [2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fa0249e4329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
> [2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fa0249e4329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
> [2019-02-01 23:21:56.164427]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3"
> repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01
> 23:22:03.474036]
> pending frames:
> frame : type(1) op(LOOKUP)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-02-01 23:22:03
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
> /lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
> /lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
> /lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
> /lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
> /lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
> /lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii 
> wrote:
>
>> Hi,
>>
>> The first (and so far only) crash happened at 2am the next day after we
>> upgraded, on only one of four servers and only to one of two mounts.
>>
>> I have no idea what caused it, but yeah, we do have a pretty busy site (
>> apkmirror.com), and it caused a disruption for any uploads or downloads
>> from that server until I woke up and fixed the mount.
>>
>> I wish I could be more helpful but all I have is that stack trace.
>>
>> I'm glad it's a blocker and will hopefully be resolved soon.
>>
>> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
>> atumb...@redhat.com> wrote:
>>
>>> Hi Artem,
>>>
>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-02 Thread Artem Russakovskii
The fuse crash happened again yesterday, to another volume. Are there any
mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch
and restart the mount, which works and recovers the mount point within a
minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected"
condition for testing purposes is to kill -9 the right "glusterfs
--process-name fuse" process.
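
Concretely, something like this (a sketch - the pgrep pattern and the mount
name glusterfs_data1 are just examples to adapt to your own mounts):

# list the fuse client processes and their full command lines
pgrep -af 'glusterfs.*--process-name fuse'
# then kill -9 the PID whose command line ends with the mount you want to break,
# e.g. /mnt/glusterfs_data1, and that mount starts returning
# "Transport endpoint is not connected"
kill -9 <PID>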


monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program  = "/bin/mount  /mnt/glusterfs_data1"
  stop program  = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
then alert else if succeeded for 10 cycles then alert


stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fa0249e4329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fa0249e4329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
handler" repeated 26 times between [2019-02-01 23:21:20.857333] and
[2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3"
repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01
23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii 
wrote:

> Hi,
>
> The first (and so far only) crash happened at 2am the next day after we
> upgraded, on only one of four servers and only to one of two mounts.
>
> I have no idea what caused it, but yeah, we do have a pretty busy site (
> apkmirror.com), and it caused a disruption for any uploads or downloads
> from that server until I woke up and fixed the mount.
>
> I wish I could be more helpful but all I have is that stack trace.
>
> I'm glad it's a blocker and will hopefully be resolved soon.
>
> On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
>
>> Hi Artem,
>>
>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a
>> clone of other bugs where recent discussions happened), and marked it as a
>> blocker for the glusterfs-5.4 release.
>>
>> We already have fixes for the log flooding - https://review.gluster.org/22128 -
>> and are in the process of identifying and fixing the issue seen with the crash.
>>
>> Can you please tell us whether the crashes happened as soon as you upgraded,
>> or was there any particular pattern you observed before the crash?
>>
>> -Amar
>>
>>
>> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii 
>> wrote:
>>
>>> Within 24 hours after updating from rock solid 4.1 to 5.3, I already got
>>> a crash which others have mentioned in
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
>>> kill gluster, and remount:
>>>

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-02-01 Thread Artem Russakovskii
Hi,

The first (and so far only) crash happened at 2am the next day after we
upgraded, on only one of four servers and only to one of two mounts.

I have no idea what caused it, but yeah, we do have a pretty busy site (
apkmirror.com), and it caused a disruption for any uploads or downloads
from that server until I woke up and fixed the mount.

I wish I could be more helpful but all I have is that stack trace.

I'm glad it's a blocker and will hopefully be resolved soon.

On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Hi Artem,
>
> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a
> clone of other bugs where recent discussions happened), and marked it as a
> blocker for the glusterfs-5.4 release.
>
> We already have fixes for the log flooding - https://review.gluster.org/22128 -
> and are in the process of identifying and fixing the issue seen with the crash.
>
> Can you please tell us whether the crashes happened as soon as you upgraded,
> or was there any particular pattern you observed before the crash?
>
> -Amar
>
>
> On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii 
> wrote:
>
>> Within 24 hours after updating from rock solid 4.1 to 5.3, I already got
>> a crash which others have mentioned in
>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
>> kill gluster, and remount:
>>
>>
>> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fcccafcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>> selecting local read_child SITE_data1-client-3" repeated 5 times between
>> [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
>> [2019-01-31 09:38:04.696993]
>> pending frames:
>> frame : type(1) op(READ)
>> frame : type(1) op(OPEN)
>> frame : type(0) op(0)
>> patchset: git://git.gluster.org/glusterfs.git
>> signal received: 6
>> time of crash:
>> 2019-01-31 09:38:04
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 5.3
>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
>> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
>> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
>> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
>> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
>> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
>> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>>
>> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>>
>> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
>> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
>> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
>> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
>> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
>> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
>> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
>> -
>>
>> Do the pending patches fix the crash or only the repeated warnings? I'm
>> running glusterfs on OpenSUSE 15.0 installed via
>> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
>> not too sure how to make it core dump.
>>
>> If it's not fixed by the patches above, has anyone already 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-31 Thread Amar Tumballi Suryanarayan
Hi Artem,

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a clone
of the other bugs where the recent discussions happened), and marked it as a
blocker for the glusterfs-5.4 release.

We already have fixes for the log flooding - https://review.gluster.org/22128 -
and are in the process of identifying and fixing the issue seen with the crash.
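
Until that release is out, one possible stopgap (only a sketch, assuming a FUSE
client; VOLNAME is a placeholder for your volume name) is to raise the client
log level so at least the dict_ref warnings stop flooding the log:

  # per volume, applied to all clients of VOLNAME
  gluster volume set VOLNAME diagnostics.client-log-level ERROR

  # or per mount, when (re)mounting the client
  mount -t glusterfs -o log-level=ERROR server:/VOLNAME /mnt/VOLNAME

Note this only hides the WARNING-level noise - the "Failed to dispatch handler"
messages are logged at ERROR level and will still appear - and it can hide other
useful warnings too, so it is a workaround rather than a fix.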

Can you please tell us whether the crashes started as soon as you upgraded, or
was there any particular pattern you observed before the crash?

-Amar


On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii 
wrote:

> Within 24 hours after updating from rock solid 4.1 to 5.3, I already got a
> crash which others have mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
> kill gluster, and remount:
>
>
> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3"
> repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31
> 09:38:03.958061]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
> [2019-01-31 09:38:04.696993]
> pending frames:
> frame : type(1) op(READ)
> frame : type(1) op(OPEN)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-01-31 09:38:04
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
> -
>
> Do the pending patches fix the crash or only the repeated warnings? I'm
> running glusterfs on OpenSUSE 15.0 installed via
> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
> not too sure how to make it core dump.
>
> If it's not fixed by the patches above, has anyone already opened a ticket
> for the crashes that I can join and monitor? This is going to create a
> massive problem for us since production systems are crashing.
>
> Thanks.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii 
>> wrote:
>>
>>> Also, not sure if related or not, but I got a ton of these "Failed to
>>> dispatch handler" in 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-31 Thread Artem Russakovskii
Within 24 hours after updating from rock solid 4.1 to 5.3, I already got a
crash which others have mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3"
repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31
09:38:03.958061]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
[2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
-
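
(For anyone hitting the same hang: the "unmount, kill gluster, and remount"
above boils down to roughly the following - a sketch only, where the mount
point and server name are assumptions based on the log names:

  umount -l /mnt/SITE_data1                 # lazy unmount of the stuck FUSE mount
  pkill -f 'glusterfs.*SITE_data1'          # kill the client process for that mount
  mount -t glusterfs server:/SITE_data1 /mnt/SITE_data1

Anything that had files open on the mount should be checked afterwards.)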

Do the pending patches fix the crash or only the repeated warnings? I'm
running glusterfs on OpenSUSE 15.0 installed via
http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
not too sure how to make it core dump.
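
On the core dump side, a rough sketch of what can be tried on a systemd-based
openSUSE box (package names and whether systemd-coredump is enabled are
assumptions - adjust for the actual setup):

  # the mount helper inherits the core size limit from the shell that mounts,
  # so raise it before (re)mounting
  ulimit -c unlimited

  # see where the kernel sends cores (systemd-coredump pipes them to coredumpctl)
  cat /proc/sys/kernel/core_pattern

  # after a crash, with systemd-coredump in place:
  coredumpctl list glusterfs
  coredumpctl gdb glusterfs

Installing the matching glusterfs debuginfo/debugsource packages first makes the
backtrace from the core actually readable.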

If it's not fixed by the patches above, has anyone already opened a ticket
for the crashes that I can join and monitor? This is going to create a
massive problem for us since production systems are crashing.

Thanks.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa 
wrote:

>
>
> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii 
> wrote:
>
>> Also, not sure if related or not, but I got a ton of these "Failed to
>> dispatch handler" in my logs as well. Many people have been commenting
>> about this issue here https://bugzilla.redhat.com/show_bug.cgi?id=1651246
>> .
>>
>
> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>
>
>> ==> mnt-SITE_data1.log <==
>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fd966fcd329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>> ==> mnt-SITE_data3.log <==
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>> 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-30 Thread Raghavendra Gowdappa
On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii 
wrote:

> Also, not sure if related or not, but I got a ton of these "Failed to
> dispatch handler" in my logs as well. Many people have been commenting
> about this issue here https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
>

https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
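
(To check whether an installed build already carries a particular fix, the
package changelog is usually the quickest place to look - a sketch, assuming an
RPM-based install; the exact changelog wording varies per packager:

  rpm -q glusterfs
  rpm -q --changelog glusterfs | head -n 30

and compare against the release notes of the 5.x update you are running.)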


> ==> mnt-SITE_data1.log <==
>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fd966fcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>> ==> mnt-SITE_data3.log <==
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and
>> [2019-01-30 20:38:20.015593]
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>> selecting local read_child SITE_data3-client-0" repeated 42 times between
>> [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
>> ==> mnt-SITE_data1.log <==
>> The message "I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>> selecting local read_child SITE_data1-client-0" repeated 50 times between
>> [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
>> The message "E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and
>> [2019-01-30 20:38:20.546355]
>> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
>> selecting local read_child SITE_data1-client-0
>> ==> mnt-SITE_data3.log <==
>> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
>> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
>> selecting local read_child SITE_data3-client-0
>> ==> mnt-SITE_data1.log <==
>> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>> handler
>
>
> I'm hoping raising the issue here on the mailing list may bring some
> additional eyeballs and get them both fixed.
>
> Thanks.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii 
> wrote:
>
>> I found a similar issue here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a comment
>> from 3 days ago from someone else with 5.3 who started seeing the spam.
>>
>> Here's the command that repeats over and over:
>> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>> [0x7fd966fcd329]
>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>
>
+Milind Changire  Can you check why this message is
logged and send a fix?


>> Is there any fix for this issue?
>>
>> Thanks.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-30 Thread Artem Russakovskii
Also, not sure if related or not, but I got a ton of these "Failed to
dispatch handler" in my logs as well. Many people have been commenting
about this issue here https://bugzilla.redhat.com/show_bug.cgi?id=1651246.
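
To get a feel for how bad the flood is per mount, counting occurrences in the
client logs works well enough - a sketch, assuming the default client log
location of /var/log/glusterfs/<mount-path>.log:

  grep -c 'Failed to dispatch handler' /var/log/glusterfs/mnt-SITE_data1.log
  grep -c 'dict is NULL' /var/log/glusterfs/mnt-SITE_data1.log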

==> mnt-SITE_data1.log <==
> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fd966fcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
> ==> mnt-SITE_data3.log <==
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler" repeated 413 times between [2019-01-30 20:36:23.881090] and
> [2019-01-30 20:38:20.015593]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0"
> repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30
> 20:38:20.280306]
> ==> mnt-SITE_data1.log <==
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0"
> repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30
> 20:38:19.459789]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and
> [2019-01-30 20:38:20.546355]
> [2019-01-30 20:38:21.492319] I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0:
> selecting local read_child SITE_data1-client-0
> ==> mnt-SITE_data3.log <==
> [2019-01-30 20:38:22.349689] I [MSGID: 108031]
> [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0:
> selecting local read_child SITE_data3-client-0
> ==> mnt-SITE_data1.log <==
> [2019-01-30 20:38:22.762941] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler


I'm hoping raising the issue here on the mailing list may bring some
additional eyeballs and get them both fixed.

Thanks.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii 
wrote:

> I found a similar issue here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a comment
> from 3 days ago from someone else with 5.3 who started seeing the spam.
>
> Here's the command that repeats over and over:
> [2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fd966fcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>
> Is there any fix for this issue?
>
> Thanks.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users