Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-14 Thread Serkan Çoban
I have the 100% CPU usage issue when I restart a glusterd instance,
but I do not have the null client errors in the logs.
The issue was related to the number of bricks/servers, so I decreased the
brick count in the volume. That resolved the problem.
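
For anyone hitting the same thing, a rough way to check the total brick count
and to spot which glusterd thread is spinning (stock paths, adjust as needed):

  gluster volume info | grep -c '^Brick[0-9]'    # total bricks across all volumes
  top -H -p "$(pidof glusterd)"                  # per-thread CPU of glusterd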


On Thu, Sep 14, 2017 at 9:02 AM, Sam McLeod  wrote:
> Hi Serkan,
>
> I was wondering if you resolved your issue with the high CPU usage and hang 
> after starting gluster?
>
> I'm setting up a 3 server (replica 3, arbiter 1), 300 volume, Gluster 3.12 
> cluster on CentOS 7 and am having what looks to be exactly the same issue as 
> you.
>
> With no volumes created CPU usage / load is normal, but after creating all 
> the volumes even with no data CPU and RAM usage keeps creeping up and the 
> logs are filling up with:
>
> [2017-09-14 05:47:45.447772] E [client_t.c:324:gf_client_ref] 
> (-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
> -->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] 
> -->/lib64/libglusterfs.so.0(gf_client_ref+0x179) [0x7fe3f2cb2e59] ) 
> 0-client_t: null client [Invalid argument]
> [2017-09-14 05:47:45.486593] E [client_t.c:324:gf_client_ref] 
> (-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
> -->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] --
>
> etc...
>
> It's not an overly helpful error message as although it says a null client 
> gave an invalid argument, it doesn't state which client and what the argument 
> was.
>
> I've tried strace and valgrind on glusterd as well as starting glusterd with 
> --debug to no avail.
>
> --
> Sam McLeod
> @s_mcleod
> https://smcleod.net
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-13 Thread Sam McLeod
Hi Serkan,

I was wondering if you resolved your issue with the high CPU usage and hang 
after starting gluster?

I'm setting up a 3 server (replica 3, arbiter 1), 300 volume, Gluster 3.12 
cluster on CentOS 7 and am having what looks to be exactly the same issue as 
you.

With no volumes created, CPU usage / load is normal, but after creating all the 
volumes, even with no data, CPU and RAM usage keeps creeping up and the logs are 
filling up with:

[2017-09-14 05:47:45.447772] E [client_t.c:324:gf_client_ref] 
(-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
-->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] 
-->/lib64/libglusterfs.so.0(gf_client_ref+0x179) [0x7fe3f2cb2e59] ) 0-client_t: 
null client [Invalid argument]
[2017-09-14 05:47:45.486593] E [client_t.c:324:gf_client_ref] 
(-->/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf8) [0x7fe3f2a1b7e8] 
-->/lib64/libgfrpc.so.0(rpcsvc_request_init+0x7f) [0x7fe3f2a1893f] --

etc...

It's not an overly helpful error message as although it says a null client gave 
an invalid argument, it doesn't state which client and what the argument was.

I've tried strace and valgrind on glusterd as well as starting glusterd with 
--debug to no avail.
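
In case it helps anyone reproduce, the invocations I tried were roughly along
these lines (output paths are just examples):

  # run in the foreground with debug logging:
  systemctl stop glusterd && glusterd --debug
  # or attach to the already-running daemon:
  strace -f -tt -o /tmp/glusterd.strace -p "$(pidof glusterd)"
  # or run it under valgrind:
  valgrind --log-file=/tmp/glusterd.valgrind glusterd --debug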

--
Sam McLeod 
@s_mcleod
https://smcleod.net

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-05 Thread Serkan Çoban
OK, I am going with 2x40-server clusters then; thanks for the help.

On Tue, Sep 5, 2017 at 4:57 PM, Atin Mukherjee  wrote:
>
>
> On Tue, Sep 5, 2017 at 6:13 PM, Serkan Çoban  wrote:
>>
>> Some corrections about the previous mails. Problem does not happen
>> when no volumes created.
>> Problem happens volumes created but in stopped state. Problem also
>> happens when volumes started state.
>> Below is the 5 stack traces taken by 10 min intervals and volumes stopped
>> state.
>
>
> As I mentioned earlier, this is technically not a *hang* . Due to the costly
> handshaking operations for too many bricks from too many nodes, the glusterd
> takes a quite long amount of time to finish the handshake.
>
>>
>>
>> --1--
>> Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
>> #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
>> #1  0x7f4146312d57 in gf_timer_proc () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
>> #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
>> #1  0x0040643b in glusterfs_sigwaiter ()
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
>> #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
>> #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
>> #2  0x7f414632d8fb in pool_sweeper () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
>> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f414633fafc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x7f414634d9f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
>> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f414633fafc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x7f414634d9f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
>> #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f413befb99b in hooks_worker () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
>> #0  0x7f41462fd43d in dict_lookup_common () from
>> /usr/lib64/libglusterfs.so.0
>> #1  0x7f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
>> #2  0x7f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> #3  0x7f414630024c in dict_set_str () from
>> /usr/lib64/libglusterfs.so.0
>> #4  0x7f413be75f29 in glusterd_add_volume_to_dict () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #5  0x7f413be7647c in glusterd_add_volumes_to_export_dict () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #6  0x7f413be8cedf in glusterd_rpc_friend_add () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #7  0x7f413be4d8f7 in glusterd_ac_friend_add () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #8  0x7f413be4bbb9 in glusterd_friend_sm () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #9  0x7f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk ()
>> from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #10 0x7f413be8d3ee in glusterd_big_locked_cbk () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #11 0x7f41460cfad5 in rpc_clnt_handle_reply () from
>> /usr/lib64/libgfrpc.so.0
>> #12 0x7f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
>> #13 0x7f41460cbd68 in rpc_transport_notify () from
>> /usr/lib64/libgfrpc.so.0
>> #14 0x7f413ae8dccd in socket_event_poll_in () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #15 0x7f413ae8effe in socket_event_handler () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #16 0x7f4146362806 in event_dispatch_epoll_worker () from
>> /usr/lib64/libglusterfs.so.0
>> #17 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #18 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
>> #0 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-05 Thread Atin Mukherjee
On Tue, Sep 5, 2017 at 6:13 PM, Serkan Çoban  wrote:

> Some corrections about the previous mails. Problem does not happen
> when no volumes created.
> Problem happens volumes created but in stopped state. Problem also
> happens when volumes started state.
> Below is the 5 stack traces taken by 10 min intervals and volumes stopped
> state.
>

As I mentioned earlier, this is technically not a *hang*. Due to the
costly handshaking operations for so many bricks across so many nodes,
glusterd takes quite a long time to finish the handshake.
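
If you want to confirm that it completes rather than hangs, sampling glusterd's
CPU usage over time is usually enough; a minimal sketch (log path is arbitrary):

  while true; do
      echo "$(date '+%H:%M:%S') $(ps -o %cpu= -p "$(pidof glusterd)")" >> /var/tmp/glusterd-cpu.log
      sleep 60
  done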


>
> --1--
> Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
> #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x7f4146312d57 in gf_timer_proc () from
> /usr/lib64/libglusterfs.so.0
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
> #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
> #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
> #2  0x7f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x7f414634d9f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x7f414634d9f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
> #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f413befb99b in hooks_worker () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
> #0  0x7f41462fd43d in dict_lookup_common () from
> /usr/lib64/libglusterfs.so.0
> #1  0x7f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> #2  0x7f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #3  0x7f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #4  0x7f413be75f29 in glusterd_add_volume_to_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #5  0x7f413be7647c in glusterd_add_volumes_to_export_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f413be8cedf in glusterd_rpc_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f413be4d8f7 in glusterd_ac_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f413be4bbb9 in glusterd_friend_sm () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk ()
> from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f413be8d3ee in glusterd_big_locked_cbk () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f41460cfad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> #12 0x7f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #13 0x7f41460cbd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #14 0x7f413ae8dccd in socket_event_poll_in () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #15 0x7f413ae8effe in socket_event_handler () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #16 0x7f4146362806 in event_dispatch_epoll_worker () from
> /usr/lib64/libglusterfs.so.0
> #17 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #18 0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
> #0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x7f41463622d5 in event_dispatch_epoll () from
> /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()
>
> --2--
> Thread 8 (Thread 0x7f413f3a77

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-05 Thread Serkan Çoban
Some corrections to my previous mails: the problem does not happen
when no volumes are created.
It does happen when volumes are created but stopped, and also when
volumes are started.
Below are the 5 stack traces, taken at 10-minute intervals with the volumes in
the stopped state.


--1--
Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
#0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x7f4146312d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f413e9a6700 (LWP 104250)):
#0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1  0x0040643b in glusterfs_sigwaiter ()
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f413dfa5700 (LWP 104251)):
#0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
#1  0x003d998acac0 in sleep () from /lib64/libc.so.6
#2  0x7f414632d8fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f413d5a4700 (LWP 104252)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x7f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f413cba3700 (LWP 104253)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f414633fafc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x7f414634d9f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f413aa48700 (LWP 104255)):
#0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f413befb99b in hooks_worker () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f413a047700 (LWP 104256)):
#0  0x7f41462fd43d in dict_lookup_common () from
/usr/lib64/libglusterfs.so.0
#1  0x7f41462ff33d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#2  0x7f41462ff5f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#3  0x7f414630024c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#4  0x7f413be75f29 in glusterd_add_volume_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#5  0x7f413be7647c in glusterd_add_volumes_to_export_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f413be8cedf in glusterd_rpc_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f413be4d8f7 in glusterd_ac_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f413be4bbb9 in glusterd_friend_sm () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f413bea789a in __glusterd_mgmt_hndsk_version_ack_cbk ()
from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f413be8d3ee in glusterd_big_locked_cbk () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f41460cfad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#12 0x7f41460d0c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#13 0x7f41460cbd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#14 0x7f413ae8dccd in socket_event_poll_in () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#15 0x7f413ae8effe in socket_event_handler () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#16 0x7f4146362806 in event_dispatch_epoll_worker () from
/usr/lib64/libglusterfs.so.0
#17 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#18 0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f4145e9e740 (LWP 104248)):
#0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x7f41463622d5 in event_dispatch_epoll () from
/usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()

--2--
Thread 8 (Thread 0x7f413f3a7700 (LWP 104249)):
#0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f41463405cb in __synclock_lock () from /usr/lib64/libglusterfs.so.0
#2  0x7f41463407ae in synclock_lock () from /usr/lib64/libglusterfs.so.0
#3  0x7f413be8d3df in glusterd_big_locked_cbk () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#4  0x7f41460d04c4 in call_bail () from /usr/lib64/libgfrpc.so.0
#5  0x

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Atin Mukherjee
On Mon, 4 Sep 2017 at 20:04, Serkan Çoban  wrote:

> I have been using a 60 server 1560 brick 3.7.11 cluster without
> problems for 1 years. I did not see this problem with it.
> Note that this problem does not happen when I install packages & start
> glusterd & peer probe and create the volumes. But after glusterd
> restart.
>
> Also note that this still happens without any volumes. So it is not
> related with brick count I think...


The backtrace you shared earlier involves a code path where all brick details
are synced up, so I'd be really interested to see the backtrace of this
when there are no volumes associated.


>
> On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee 
> wrote:
> >
> >
> > On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban 
> wrote:
> >>
> >> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
> >> Tried both, result is same, but the logs/stacks are from stopping and
> >> starting glusterd only on one server while others are running.
> >>
> >> >2. Are you sure that pstack output was always constantly pointing on
> >> > strcmp being stuck?
> >> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
> >> is from first 5-10 minutes. I will capture stack traces with 10
> >> minutes waits and send them to you tomorrow. Also with 40 servers It
> >> stays that way for 5 minutes and then returns to normal.
> >>
> >> >3. Are you absolutely sure even after few hours glusterd is stuck at
> the
> >> > same point?
> >> It goes to normal state after 70-80 minutes and I can run cluster
> >> commands after that. I will check this again to be sure..
> >
> >
> > So this is scalability issue you're hitting with current glusterd's
> design.
> > As I mentioned earlier, peer handshaking can be a really costly
> operations
> > based on you scale the cluster and hence you might experience a huge
> delay
> > in the node bringing up all the services and be operational.
> >
> >>
> >> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee 
> >> wrote:
> >> >
> >> >
> >> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire 
> >> > wrote:
> >> >>
> >> >> Serkan,
> >> >> I have gone through other mails in the mail thread as well but
> >> >> responding
> >> >> to this one specifically.
> >> >>
> >> >> Is this a source install or an RPM install ?
> >> >> If this is an RPM install, could you please install the
> >> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
> >> >>
> >> >> If this is a source install, then you'll need to configure the build
> >> >> with
> >> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
> >> >>
> >> >> Having the debuginfo package or a debug build helps to resolve the
> >> >> function names and/or line numbers.
> >> >> --
> >> >> Milind
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <
> cobanser...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
> >> >>> seconds between each trace.
> >> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> >>>
> >> >>> Content of the first stack trace is here:
> >> >>>
> >> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> >>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> >>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >> >>> #2  0x003aa5c07aa1 in start_thread () from
> /lib64/libpthread.so.0
> >> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >> >>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >> >>> #1  0x0040643b in glusterfs_sigwaiter ()
> >> >>> #2  0x003aa5c07aa1 in start_thread () from
> /lib64/libpthread.so.0
> >> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >> >>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >> >>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> >> >>> #2  0x00303f8528fb in pool_sweeper () from
> >> >>> /usr/lib64/libglusterfs.so.0
> >> >>> #3  0x003aa5c07aa1 in start_thread () from
> /lib64/libpthread.so.0
> >> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> from
> >> >>> /lib64/libpthread.so.0
> >> >>> #1  0x00303f864afc in syncenv_task () from
> >> >>> /usr/lib64/libglusterfs.so.0
> >> >>> #2  0x00303f8729f0 in syncenv_processor () from
> >> >>> /usr/lib64/libglusterfs.so.0
> >> >>> #3  0x003aa5c07aa1 in start_thread () from
> /lib64/libpthread.so.0
> >> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> from
> >> >>> /lib64/libpthread.so.0
> >> >>> #1  0x00303f864afc in syncenv_task () from
> >> >>

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Serkan Çoban
I have been using a 60-server, 1560-brick 3.7.11 cluster without
problems for 1 year, and I did not see this problem with it.
Note that this problem does not happen when I install packages, start
glusterd, peer probe and create the volumes; it only appears after a glusterd
restart.

Also note that this still happens without any volumes, so I don't think it is
related to the brick count...

On Mon, Sep 4, 2017 at 5:08 PM, Atin Mukherjee  wrote:
>
>
> On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban  wrote:
>>
>> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
>> Tried both, result is same, but the logs/stacks are from stopping and
>> starting glusterd only on one server while others are running.
>>
>> >2. Are you sure that pstack output was always constantly pointing on
>> > strcmp being stuck?
>> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
>> is from first 5-10 minutes. I will capture stack traces with 10
>> minutes waits and send them to you tomorrow. Also with 40 servers It
>> stays that way for 5 minutes and then returns to normal.
>>
>> >3. Are you absolutely sure even after few hours glusterd is stuck at the
>> > same point?
>> It goes to normal state after 70-80 minutes and I can run cluster
>> commands after that. I will check this again to be sure..
>
>
> So this is scalability issue you're hitting with current glusterd's design.
> As I mentioned earlier, peer handshaking can be a really costly operations
> based on you scale the cluster and hence you might experience a huge delay
> in the node bringing up all the services and be operational.
>
>>
>> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee 
>> wrote:
>> >
>> >
>> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire 
>> > wrote:
>> >>
>> >> Serkan,
>> >> I have gone through other mails in the mail thread as well but
>> >> responding
>> >> to this one specifically.
>> >>
>> >> Is this a source install or an RPM install ?
>> >> If this is an RPM install, could you please install the
>> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
>> >>
>> >> If this is a source install, then you'll need to configure the build
>> >> with
>> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
>> >>
>> >> Having the debuginfo package or a debug build helps to resolve the
>> >> function names and/or line numbers.
>> >> --
>> >> Milind
>> >>
>> >>
>> >>
>> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
>> >> wrote:
>> >>>
>> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
>> >>> seconds between each trace.
>> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >>>
>> >>> Content of the first stack trace is here:
>> >>>
>> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> >>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> >>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> >>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> >>> #1  0x0040643b in glusterfs_sigwaiter ()
>> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> >>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> >>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> >>> #2  0x00303f8528fb in pool_sweeper () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> >>> /lib64/libpthread.so.0
>> >>> #1  0x00303f864afc in syncenv_task () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #2  0x00303f8729f0 in syncenv_processor () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> >>> /lib64/libpthread.so.0
>> >>> #1  0x00303f864afc in syncenv_task () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #2  0x00303f8729f0 in syncenv_processor () from
>> >>> /usr/lib64/libglusterfs.so.0
>> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> >>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> >>> /lib64/libpthread.so.0
>> >>> #1  0x7f7a898a099b in ?? () from
>> >>> /usr/lib64/glusterfs/3.1

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Atin Mukherjee
On Mon, Sep 4, 2017 at 5:28 PM, Serkan Çoban  wrote:

> >1. On 80 nodes cluster, did you reboot only one node or multiple ones?
> Tried both, result is same, but the logs/stacks are from stopping and
> starting glusterd only on one server while others are running.
>
> >2. Are you sure that pstack output was always constantly pointing on
> strcmp being stuck?
> It stays 70-80 minutes in %100 cpu consuming state, the stacks I send
> is from first 5-10 minutes. I will capture stack traces with 10
> minutes waits and send them to you tomorrow. Also with 40 servers It
> stays that way for 5 minutes and then returns to normal.
>
> >3. Are you absolutely sure even after few hours glusterd is stuck at the
> same point?
> It goes to normal state after 70-80 minutes and I can run cluster
> commands after that. I will check this again to be sure..
>

So this is a scalability issue you're hitting with the current glusterd design.
As I mentioned earlier, peer handshaking can be a really costly operation
depending on how far you scale the cluster, and hence you might experience a
huge delay before the node brings up all the services and becomes operational.


> On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee 
> wrote:
> >
> >
> > On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire 
> wrote:
> >>
> >> Serkan,
> >> I have gone through other mails in the mail thread as well but
> responding
> >> to this one specifically.
> >>
> >> Is this a source install or an RPM install ?
> >> If this is an RPM install, could you please install the
> >> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
> >>
> >> If this is a source install, then you'll need to configure the build
> with
> >> --enable-debug and reinstall and retry capturing the gdb backtrace.
> >>
> >> Having the debuginfo package or a debug build helps to resolve the
> >> function names and/or line numbers.
> >> --
> >> Milind
> >>
> >>
> >>
> >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
> >> wrote:
> >>>
> >>> Here you can find 10 stack trace samples from glusterd. I wait 10
> >>> seconds between each trace.
> >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >>>
> >>> Content of the first stack trace is here:
> >>>
> >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >>> #1  0x0040643b in glusterfs_sigwaiter ()
> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> >>> #2  0x00303f8528fb in pool_sweeper () from
> >>> /usr/lib64/libglusterfs.so.0
> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >>> /lib64/libpthread.so.0
> >>> #1  0x00303f864afc in syncenv_task () from
> >>> /usr/lib64/libglusterfs.so.0
> >>> #2  0x00303f8729f0 in syncenv_processor () from
> >>> /usr/lib64/libglusterfs.so.0
> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >>> /lib64/libpthread.so.0
> >>> #1  0x00303f864afc in syncenv_task () from
> >>> /usr/lib64/libglusterfs.so.0
> >>> #2  0x00303f8729f0 in syncenv_processor () from
> >>> /usr/lib64/libglusterfs.so.0
> >>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> >>> /lib64/libpthread.so.0
> >>> #1  0x7f7a898a099b in ?? () from
> >>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> >>> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> >>> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> >>> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> >>> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> >>> #4  0x00303f82524c in dict_s

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Serkan Çoban
>1. On 80 nodes cluster, did you reboot only one node or multiple ones?
Tried both; the result is the same, but the logs/stacks are from stopping and
starting glusterd on only one server while the others were running.

>2. Are you sure that pstack output was always constantly pointing on strcmp 
>being stuck?
It stays in a 100% CPU consuming state for 70-80 minutes; the stacks I sent
are from the first 5-10 minutes. I will capture stack traces at 10-minute
intervals and send them to you tomorrow. Also, with 40 servers it
stays that way for 5 minutes and then returns to normal.

>3. Are you absolutely sure even after few hours glusterd is stuck at the same 
>point?
It goes back to a normal state after 70-80 minutes and I can run cluster
commands after that. I will check this again to be sure.
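
For the record, the sampling I plan to do is roughly this (assuming the
glusterfs-debuginfo RPM is installed so the symbols resolve):

  # e.g. on CentOS: debuginfo-install glusterfs
  for i in 1 2 3 4 5; do
      pstack "$(pidof glusterd)" > /var/tmp/glusterd_pstack_$i.txt
      sleep 600    # 10 minutes between samples
  done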

On Mon, Sep 4, 2017 at 1:43 PM, Atin Mukherjee  wrote:
>
>
> On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire  wrote:
>>
>> Serkan,
>> I have gone through other mails in the mail thread as well but responding
>> to this one specifically.
>>
>> Is this a source install or an RPM install ?
>> If this is an RPM install, could you please install the
>> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
>>
>> If this is a source install, then you'll need to configure the build with
>> --enable-debug and reinstall and retry capturing the gdb backtrace.
>>
>> Having the debuginfo package or a debug build helps to resolve the
>> function names and/or line numbers.
>> --
>> Milind
>>
>>
>>
>> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
>> wrote:
>>>
>>> Here you can find 10 stack trace samples from glusterd. I wait 10
>>> seconds between each trace.
>>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>>
>>> Content of the first stack trace is here:
>>>
>>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>>> #1  0x0040643b in glusterfs_sigwaiter ()
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>>> #2  0x00303f8528fb in pool_sweeper () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x00303f864afc in syncenv_task () from
>>> /usr/lib64/libglusterfs.so.0
>>> #2  0x00303f8729f0 in syncenv_processor () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x00303f864afc in syncenv_task () from
>>> /usr/lib64/libglusterfs.so.0
>>> #2  0x00303f8729f0 in syncenv_processor () from
>>> /usr/lib64/libglusterfs.so.0
>>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /lib64/libpthread.so.0
>>> #1  0x7f7a898a099b in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>>> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>>> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>>> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>>> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>>> #4  0x00303f82524c in dict_set_str () from
>>> /usr/lib64/libglusterfs.so.0
>>> #5  0x7f7a898da7fd in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #6  0x7f7a8981b0df in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #7  0x7f7a8981b47c in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #8  0x7f7a89831edf in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> #9  0x7f7a897f28f7 in ?? () from
>>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-04 Thread Atin Mukherjee
On Fri, Sep 1, 2017 at 8:47 AM, Milind Changire  wrote:

> Serkan,
> I have gone through other mails in the mail thread as well but responding
> to this one specifically.
>
> Is this a source install or an RPM install ?
> If this is an RPM install, could you please install the
> glusterfs-debuginfo RPM and retry to capture the gdb backtrace.
>
> If this is a source install, then you'll need to configure the build with
> --enable-debug and reinstall and retry capturing the gdb backtrace.
>
> Having the debuginfo package or a debug build helps to resolve the
> function names and/or line numbers.
> --
> Milind
>
>
>
> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
> wrote:
>
>> Here you can find 10 stack trace samples from glusterd. I wait 10
>> seconds between each trace.
>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>
>> Content of the first stack trace is here:
>>
>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> #1  0x0040643b in glusterfs_sigwaiter ()
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> #2  0x00303f8528fb in pool_sweeper () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x00303f864afc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x00303f8729f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x00303f864afc in syncenv_task () from
>> /usr/lib64/libglusterfs.so.0
>> #2  0x00303f8729f0 in syncenv_processor () from
>> /usr/lib64/libglusterfs.so.0
>> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x7f7a898a099b in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> #4  0x00303f82524c in dict_set_str () from
>> /usr/lib64/libglusterfs.so.0
>> #5  0x7f7a898da7fd in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #6  0x7f7a8981b0df in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #7  0x7f7a8981b47c in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #8  0x7f7a89831edf in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #9  0x7f7a897f28f7 in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #10 0x7f7a897f0bb9 in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #11 0x7f7a8984c89a in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #12 0x7f7a898323ee in ?? () from
>> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
>> /usr/lib64/libgfrpc.so.0
>> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
>> #15 0x00303f40bd68 in rpc_transport_notify () from
>> /usr/lib64/libgfrpc.so.0
>> #16 0x7f7a88a6fccd in ?? () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #17 0x7f7a88a70ffe in ?? () from
>> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
>> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
>> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Serkan Çoban" 
> To: "Ben Turner" 
> Cc: "Gluster Users" 
> Sent: Sunday, September 3, 2017 2:55:06 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> i usually change event threads to 4. But those logs are from a default
> installation.

Yep, me too. I did a lot of the qualification for multi-threaded epoll and that 
is what I found to best saturate my back end (12-disk RAID 6 spinners) without 
wasting threads.  Be careful tuning this up too high if you have a lot of bricks 
per server; you could run into some contention with all of those threads 
fighting for CPU time.
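
For reference, the tuning in question is just the event-threads volume options
(the volume name here is hypothetical):

  gluster volume get myvol client.event-threads    # check the current value (default is 2)
  gluster volume set myvol client.event-threads 4
  gluster volume set myvol server.event-threads 4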

On the hooks stuff on my system I have:

-rwxr-xr-x. 1 root root 1459 Jun  1 06:35 S29CTDB-teardown.sh
-rwxr-xr-x. 1 root root 1736 Jun  1 06:35 S30samba-stop.sh

Do you have SMB installed on these systems?  IIRC the scripts are only run if 
the service is chkconfigged on, so if you don't have SMB installed and chkconfigged 
on, I don't think these are the problem.
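
A quick way to check on an affected node (service names as on RHEL/CentOS):

  systemctl is-enabled smb     # systemd nodes
  chkconfig --list smb         # EL6-style nodes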

-b

> 
> On Sun, Sep 3, 2017 at 9:52 PM, Ben Turner  wrote:
> > - Original Message -
> >> From: "Ben Turner" 
> >> To: "Serkan Çoban" 
> >> Cc: "Gluster Users" 
> >> Sent: Sunday, September 3, 2017 2:30:31 PM
> >> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> >>
> >> - Original Message -
> >> > From: "Milind Changire" 
> >> > To: "Serkan Çoban" 
> >> > Cc: "Gluster Users" 
> >> > Sent: Saturday, September 2, 2017 11:44:40 PM
> >> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> >> >
> >> > No worries Serkan,
> >> > You can continue to use your 40 node clusters.
> >> >
> >> > The backtrace has resolved the function names and it should be
> >> > sufficient
> >> > to
> >> > debug the issue.
> >> > Thanks for letting us know.
> >> >
> >> > We'll post on this thread again to notify you about the findings.
> >>
> >> One of the things I find interesting is seeing:
> >>
> >>  #1  0x7f928450099b in hooks_worker () from
> >>
> >> The "hooks" scripts are usually shell scripts that get run when volumes
> >> are
> >> started / stopped / etc.  It may be worth looking into what hooks scripts
> >> are getting run at shutdown and think about how one of them could hang up
> >> the system.  This may be a red herring but I don't see much else going on
> >> in
> >> the stack trace that I looked at.  The thread with the deepest stack is
> >> the
> >> hooks worker one, all of the other look to be in some sort of wait / sleep
> >> /
> >> listen state.
> >
> > Sorry the hooks call doesn't have the deepest stack, I didn't see the other
> > thread below it.
> >
> > In the logs I see:
> >
> > [2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler]
> > 0-transport: EPOLLERR - disconnecting now
> >
> > You mentioned changing event threads?  Even threads controls the number of
> > epoll listener threads, what did you change it to?  IIRC 2 is the default
> > value.  This may be some sort of race condition?  Just my $0.02.
> >
> > -b
> >
> >>
> >> -b
> >>
> >> >
> >> >
> >> >
> >> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
> >> > wrote:
> >> >
> >> >
> >> > Hi Milind,
> >> >
> >> > Anything new about the issue? Can you able to find the problem,
> >> > anything else you need?
> >> > I will continue with two clusters each 40 servers, so I will not be
> >> > able to provide any further info for 80 servers.
> >> >
> >> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> >> > wrote:
> >> > > Hi,
> >> > > You can find pstack sampes here:
> >> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> >> > >
> >> > > Here is the first one:
> >> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> >> > > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> > > #1 0x00310fe37d57 in gf_timer_proc () from
> >> > > /usr/lib64/libglusterfs.so.0
> >> > > #2 0x003d99c07aa1 in st

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Serkan Çoban
I usually change event threads to 4, but those logs are from a default
installation.

On Sun, Sep 3, 2017 at 9:52 PM, Ben Turner  wrote:
> - Original Message -
>> From: "Ben Turner" 
>> To: "Serkan Çoban" 
>> Cc: "Gluster Users" 
>> Sent: Sunday, September 3, 2017 2:30:31 PM
>> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
>>
>> - Original Message -
>> > From: "Milind Changire" 
>> > To: "Serkan Çoban" 
>> > Cc: "Gluster Users" 
>> > Sent: Saturday, September 2, 2017 11:44:40 PM
>> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
>> >
>> > No worries Serkan,
>> > You can continue to use your 40 node clusters.
>> >
>> > The backtrace has resolved the function names and it should be sufficient
>> > to
>> > debug the issue.
>> > Thanks for letting us know.
>> >
>> > We'll post on this thread again to notify you about the findings.
>>
>> One of the things I find interesting is seeing:
>>
>>  #1  0x7f928450099b in hooks_worker () from
>>
>> The "hooks" scripts are usually shell scripts that get run when volumes are
>> started / stopped / etc.  It may be worth looking into what hooks scripts
>> are getting run at shutdown and think about how one of them could hang up
>> the system.  This may be a red herring but I don't see much else going on in
>> the stack trace that I looked at.  The thread with the deepest stack is the
>> hooks worker one, all of the other look to be in some sort of wait / sleep /
>> listen state.
>
> Sorry the hooks call doesn't have the deepest stack, I didn't see the other 
> thread below it.
>
> In the logs I see:
>
> [2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler] 
> 0-transport: EPOLLERR - disconnecting now
>
> You mentioned changing event threads?  Even threads controls the number of 
> epoll listener threads, what did you change it to?  IIRC 2 is the default 
> value.  This may be some sort of race condition?  Just my $0.02.
>
> -b
>
>>
>> -b
>>
>> >
>> >
>> >
>> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
>> > wrote:
>> >
>> >
>> > Hi Milind,
>> >
>> > Anything new about the issue? Can you able to find the problem,
>> > anything else you need?
>> > I will continue with two clusters each 40 servers, so I will not be
>> > able to provide any further info for 80 servers.
>> >
>> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
>> > wrote:
>> > > Hi,
>> > > You can find pstack sampes here:
>> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
>> > >
>> > > Here is the first one:
>> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
>> > > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
>> > > #1 0x00310fe37d57 in gf_timer_proc () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
>> > > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
>> > > #1 0x0040643b in glusterfs_sigwaiter ()
>> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
>> > > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
>> > > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
>> > > #2 0x00310fe528fb in pool_sweeper () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
>> > > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
>> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > > /lib64/libpthread.so.0
>> > > #1 0x00310fe64afc in syncenv_task () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #2 0x00310fe729f0 in syncenv_processor () from
>> > > /usr/lib64/libglusterfs.so.0
>> > > #3 0x003d99c07aa1 in start_

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Ben Turner" 
> To: "Serkan Çoban" 
> Cc: "Gluster Users" 
> Sent: Sunday, September 3, 2017 2:30:31 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> - Original Message -
> > From: "Milind Changire" 
> > To: "Serkan Çoban" 
> > Cc: "Gluster Users" 
> > Sent: Saturday, September 2, 2017 11:44:40 PM
> > Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> > 
> > No worries Serkan,
> > You can continue to use your 40 node clusters.
> > 
> > The backtrace has resolved the function names and it should be sufficient
> > to
> > debug the issue.
> > Thanks for letting us know.
> > 
> > We'll post on this thread again to notify you about the findings.
> 
> One of the things I find interesting is seeing:
> 
>  #1  0x7f928450099b in hooks_worker () from
> 
> The "hooks" scripts are usually shell scripts that get run when volumes are
> started / stopped / etc.  It may be worth looking into what hooks scripts
> are getting run at shutdown and think about how one of them could hang up
> the system.  This may be a red herring but I don't see much else going on in
> the stack trace that I looked at.  The thread with the deepest stack is the
> hooks worker one, all of the other look to be in some sort of wait / sleep /
> listen state.

Sorry, the hooks call doesn't have the deepest stack; I didn't see the other 
thread below it.

In the logs I see:

[2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now

You mentioned changing event threads?  Event threads controls the number of 
epoll listener threads; what did you change it to?  IIRC 2 is the default 
value.  This may be some sort of race condition?  Just my $0.02.
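
A quick count of those disconnects might tell us whether they correlate with the
spin (adjust to whichever glusterd log file your version writes):

  grep -c 'EPOLLERR - disconnecting now' /var/log/glusterfs/*glusterd*.log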

-b

> 
> -b
> 
> > 
> > 
> > 
> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com >
> > wrote:
> > 
> > 
> > Hi Milind,
> > 
> > Anything new about the issue? Can you able to find the problem,
> > anything else you need?
> > I will continue with two clusters each 40 servers, so I will not be
> > able to provide any further info for 80 servers.
> > 
> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> > wrote:
> > > Hi,
> > > You can find pstack sampes here:
> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> > > 
> > > Here is the first one:
> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> > > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> > > #1 0x00310fe37d57 in gf_timer_proc () from
> > > /usr/lib64/libglusterfs.so.0
> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> > > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> > > #1 0x0040643b in glusterfs_sigwaiter ()
> > > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> > > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> > > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
> > > #2 0x00310fe528fb in pool_sweeper () from
> > > /usr/lib64/libglusterfs.so.0
> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > > /lib64/libpthread.so.0
> > > #1 0x00310fe64afc in syncenv_task () from
> > > /usr/lib64/libglusterfs.so.0
> > > #2 0x00310fe729f0 in syncenv_processor () from
> > > /usr/lib64/libglusterfs.so.0
> > > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> > > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > > /lib64/libpthread.so.0
> > > #1 0x00310fe64afc in syncenv_task () from
> > > /usr/lib64/libglusterfs.so.0
> > > #2 0x00310fe729f0 in syncenv_processor () from
> > > /usr/lib64/libglusterfs.so.0
> > > #3 0x003

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-03 Thread Ben Turner
- Original Message -
> From: "Milind Changire" 
> To: "Serkan Çoban" 
> Cc: "Gluster Users" 
> Sent: Saturday, September 2, 2017 11:44:40 PM
> Subject: Re: [Gluster-users] Glusterd proccess hangs on reboot
> 
> No worries Serkan,
> You can continue to use your 40 node clusters.
> 
> The backtrace has resolved the function names and it should be sufficient to
> debug the issue.
> Thanks for letting us know.
> 
> We'll post on this thread again to notify you about the findings.

One of the things I find interesting is seeing:

 #1  0x7f928450099b in hooks_worker () from

The "hooks" scripts are usually shell scripts that get run when volumes are 
started / stopped / etc.  It may be worth looking into what hooks scripts are 
getting run at shutdown and thinking about how one of them could hang up the 
system.  This may be a red herring, but I don't see much else going on in the 
stack trace that I looked at.  The thread with the deepest stack is the hooks 
worker one; all of the others look to be in some sort of wait / sleep / listen 
state.
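
A sketch of how to see what would run at volume stop on an affected node
(standard hooks location assumed):

  ls -lR /var/lib/glusterd/hooks/1/stop/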

-b

> 
> 
> 
> On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban < cobanser...@gmail.com > wrote:
> 
> 
> Hi Milind,
> 
> Anything new about the issue? Can you able to find the problem,
> anything else you need?
> I will continue with two clusters each 40 servers, so I will not be
> able to provide any further info for 80 servers.
> 
> On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban < cobanser...@gmail.com >
> wrote:
> > Hi,
> > You can find pstack sampes here:
> > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> > 
> > Here is the first one:
> > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> > #0 0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> > #1 0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> > #0 0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> > #1 0x0040643b in glusterfs_sigwaiter ()
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> > #0 0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> > #1 0x003d998acac0 in sleep () from /lib64/libc.so.6
> > #2 0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe729f0 in syncenv_processor () from
> > /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> > #0 0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe729f0 in syncenv_processor () from
> > /usr/lib64/libglusterfs.so.0
> > #3 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> > #0 0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x7f928450099b in hooks_worker () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #2 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3 0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> > #0 0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> > #1 0x00310fe2244a in dict_lookup_common () from
> > /usr/lib64/libglusterfs.so.0
> > #2 0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> > #3 0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> > #4 0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> > #5 0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #6 0x7f92

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-02 Thread Milind Changire
No worries Serkan,
You can continue to use your 40 node clusters.

The backtrace has resolved the function names and it *should* be sufficient
to debug the issue.
Thanks for letting us know.

We'll post on this thread again to notify you about the findings.



On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban  wrote:

> Hi Milind,
>
> Anything new about the issue? Were you able to find the problem, or do
> you need anything else?
> I will continue with two clusters of 40 servers each, so I will not be
> able to provide any further info for 80 servers.
>
> On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban 
> wrote:
> > Hi,
> > You can find pstack samples here:
> > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> >
> > Here is the first one:
> > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> > #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> > #1  0x00310fe37d57 in gf_timer_proc () from
> /usr/lib64/libglusterfs.so.0
> > #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> > #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> > #1  0x0040643b in glusterfs_sigwaiter ()
> > #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> > #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> > #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
> > #2  0x00310fe528fb in pool_sweeper () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> > #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x00310fe64afc in syncenv_task () from
> /usr/lib64/libglusterfs.so.0
> > #2  0x00310fe729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> > #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x00310fe64afc in syncenv_task () from
> /usr/lib64/libglusterfs.so.0
> > #2  0x00310fe729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> > #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x7f928450099b in hooks_worker () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> > Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> > #0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> > #1  0x00310fe2244a in dict_lookup_common () from
> > /usr/lib64/libglusterfs.so.0
> > #2  0x00310fe2433d in dict_set_lk () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> > #4  0x00310fe2524c in dict_set_str () from
> /usr/lib64/libglusterfs.so.0
> > #5  0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #6  0x7f928447b0df in glusterd_add_volume_to_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #7  0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #8  0x7f9284491edf in glusterd_rpc_friend_add () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #9  0x7f92844528f7 in glusterd_ac_friend_add () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #10 0x7f9284450bb9 in glusterd_friend_sm () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #11 0x7f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk ()
> > from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #12 0x7f92844923ee in glusterd_big_locked_cbk () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #13 0x00311020fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> > #14 0x003110210c85 in rpc_clnt_notify () from
> /usr/lib64/libgfrpc.so.0
> > #15 0x00311020bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> > #16 0x7f9283492ccd in socket_event_poll_in () from
> > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> > #17 0x7f9283493ffe in socket_event_handler () from
> > /usr/lib64/glusterfs/3.10.5/rpc-transport/socke

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-02 Thread Serkan Çoban
Hi Milind,

Anything new about the issue? Were you able to find the problem, or do
you need anything else?
I will continue with two clusters of 40 servers each, so I will not be
able to provide any further info for 80 servers.

On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban  wrote:
> Hi,
> You can find pstack samples here:
> https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
>
> Here is the first one:
> Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> #0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> #0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> #0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003d998acac0 in sleep () from /lib64/libc.so.6
> #2  0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00310fe729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> #0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00310fe729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> #0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f928450099b in hooks_worker () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> #0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00310fe2244a in dict_lookup_common () from
> /usr/lib64/libglusterfs.so.0
> #2  0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> #3  0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f928447b0df in glusterd_add_volume_to_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f9284491edf in glusterd_rpc_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f92844528f7 in glusterd_ac_friend_add () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f9284450bb9 in glusterd_friend_sm () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk ()
> from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f92844923ee in glusterd_big_locked_cbk () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00311020fad5 in rpc_clnt_handle_reply () from 
> /usr/lib64/libgfrpc.so.0
> #14 0x003110210c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00311020bd68 in rpc_transport_notify () from 
> /usr/lib64/libgfrpc.so.0
> #16 0x7f9283492ccd in socket_event_poll_in () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f9283493ffe in socket_event_handler () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00310fe87806 in event_dispatch_epoll_worker () from
> /usr/lib64/libglusterfs.so.0
> #19 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003d998e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f928e4a4740 (LWP 78908)):
> #0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00310fe872d5 in event_dispatch_epoll () from
> /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()
>
> On Fri

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-01 Thread Serkan Çoban
Hi,
You can find pstack samples here:
https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0

Here is the first one:
Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
#0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
#0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1  0x0040643b in glusterfs_sigwaiter ()
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
#0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
#1  0x003d998acac0 in sleep () from /lib64/libc.so.6
#2  0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
#0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f928450099b in hooks_worker () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
#0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00310fe2244a in dict_lookup_common () from
/usr/lib64/libglusterfs.so.0
#2  0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#3  0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4  0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5  0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f928447b0df in glusterd_add_volume_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f9284491edf in glusterd_rpc_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f92844528f7 in glusterd_ac_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f9284450bb9 in glusterd_friend_sm () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk ()
from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x7f92844923ee in glusterd_big_locked_cbk () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x00311020fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x003110210c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00311020bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x7f9283492ccd in socket_event_poll_in () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x7f9283493ffe in socket_event_handler () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x00310fe87806 in event_dispatch_epoll_worker () from
/usr/lib64/libglusterfs.so.0
#19 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f928e4a4740 (LWP 78908)):
#0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x00310fe872d5 in event_dispatch_epoll () from
/usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()
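
(For reference, samples like the one above can be captured with a small loop
along these lines; a sketch only, not necessarily the exact commands used,
and it assumes pstack is available and a single glusterd process per node:)

    pid=$(pidof glusterd)
    for i in $(seq 1 10); do
        # one full thread dump every 10 seconds, 10 samples in total
        pstack "$pid" > "glusterd_pstack.$i.txt"
        sleep 10
    done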

On Fri, Sep 1, 2017 at 6:17 AM, Milind Changire  wrote:
> Serkan,
> I have gone through other mails in the mail thread as well but responding to
> this one specifically.
>
> Is this a source install or an RPM install ?
> If this is an RPM install, could you please install the glusterfs-debuginfo
> RPM and retry to capture the gdb backtrace.
>
> If this is a source install, then you'll need to configure the build with
> --enable-debug and reinstall an

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-31 Thread Milind Changire
Serkan,
I have gone through other mails in the mail thread as well but responding
to this one specifically.

Is this a source install or an RPM install?
If this is an RPM install, could you please install the glusterfs-debuginfo
RPM and retry capturing the gdb backtrace?

If this is a source install, then you'll need to configure the build with
--enable-debug and reinstall and retry capturing the gdb backtrace.

Having the debuginfo package or a debug build helps to resolve the function
names and/or line numbers.
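
(For an RPM install on CentOS the steps would look roughly like this; a
sketch only, and the debuginfo package has to match the installed glusterfs
version and may come from a separate repo:)

    # install matching debug symbols
    yum install -y glusterfs-debuginfo      # or: debuginfo-install glusterfs

    # dump a backtrace of every thread from the running glusterd
    gdb -p "$(pidof glusterd)" -batch -ex 'thread apply all bt full' > glusterd_bt.txt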
--
Milind



On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
wrote:

> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterf

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-31 Thread Atin Mukherjee
On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
wrote:

> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()
>

FWIW, we need to figure out the respective function handlers from the
addresses dumped in thread 2, which would help us figure out where the
glusterd process is stuck. I remember Milind has been working on a script
to convert these addresses to function names.

@Milind - can you please help here by getting the function names dumped
from these addresses? Sharing the script with Serkan and letting him run
it on the setup would probably be ideal.
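
(In the meantime, one manual way to resolve such addresses - a rough sketch,
not Milind's script - is to ask a live gdb directly, or to subtract the
library's load address taken from /proc/<pid>/maps and feed the offset to
addr2line. The address used below is frame #5 of thread 2 in the trace above:)

    # ask gdb which symbol a given address from the pstack output belongs to
    gdb -p "$(pidof glusterd)" -batch -ex 'info symbol 0x7f7a898da7fd'

    # offline alternative: compute the offset into glusterd.so and use addr2line
    pid=$(pidof glusterd)
    base=$(awk '/glusterd\.so/ {print $1; exit}' /proc/$pid/maps | cut -d- -f1)
    addr2line -f -e /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so \
        "$(printf '0x%x' $((0x7f7a898da7fd - 0x$base)))"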


> On Wed, Aug 23, 2017 at 8:46

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-31 Thread Serkan Çoban
Hi Gaurav,

Any improvement about the issue?

On Tue, Aug 29, 2017 at 1:57 PM, Serkan Çoban  wrote:
> glusterd returned to normal, here is the logs:
> https://www.dropbox.com/s/41jx2zn3uizvr53/80servers_glusterd_normal_status.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 1:47 PM, Serkan Çoban  wrote:
>> Here is the logs after stopping all three volumes and restarting
>> glusterd in all nodes. I waited 70 minutes after glusterd restart but
>> it is still consuming %100 CPU.
>> https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0
>>
>>
>> On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav  wrote:
>>>
>>> I believe logs you have shared logs which consist of create volume followed
>>> by starting the volume.
>>> However, you have mentioned that when a node from 80 server cluster gets
>>> rebooted, glusterd process hangs.
>>>
>>> Could you please provide the logs which led glusterd to hang for all the
>>> cases along with gusterd process utilization.
>>>
>>>
>>> Thanks
>>> Gaurav
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban  wrote:

 Here is the requested logs:

 https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0


 On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
 > Till now I haven't found anything significant.
 >
 > Can you send me gluster logs along with command-history-logs for these
 > scenarios:
 >  Scenario1 : 20 servers
 >  Scenario2 : 40 servers
 >  Scenario3:  80 Servers
 >
 >
 > Thanks
 > Gaurav
 >
 >
 >
 > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
 > wrote:
 >>
 >> Hi Gaurav,
 >> Any progress about the problem?
 >>
 >> On Thursday, August 24, 2017, Serkan Çoban 
 >> wrote:
 >>>
 >>> Thank you Gaurav,
 >>> Here is more findings:
 >>> Problem does not happen using only 20 servers each has 68 bricks.
 >>> (peer probe only 20 servers)
 >>> If we use 40 servers with single volume, glusterd cpu %100 state
 >>> continues for 5 minutes and it goes to normal state.
 >>> with 80 servers we have no working state yet...
 >>>
 >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav 
 >>> wrote:
 >>> >
 >>> > I am working on it and will share my findings as soon as possible.
 >>> >
 >>> >
 >>> > Thanks
 >>> > Gaurav
 >>> >
 >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
 >>> > 
 >>> > wrote:
 >>> >>
 >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
 >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
 >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
 >>> >> Only way to a healthy state is destroy gluster config/rpms,
 >>> >> reinstall
 >>> >> and recreate volumes.
 >>> >>
 >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
 >>> >> 
 >>> >> wrote:
 >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
 >>> >> > seconds between each trace.
 >>> >> >
 >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
 >>> >> >
 >>> >> > Content of the first stack trace is here:
 >>> >> >
 >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
 >>> >> > #0  0x003aa5c0f00d in nanosleep () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
 >>> >> > #2  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
 >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
 >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
 >>> >> > #2  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
 >>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
 >>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
 >>> >> > #2  0x00303f8528fb in pool_sweeper () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #3  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
 >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
 >>> >> > from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #1  0x00303f864afc in syncenv_task () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #3  0x003aa5c07aa1 in start_thread () from
 >>> >>

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
Glusterd returned to normal; here are the logs:
https://www.dropbox.com/s/41jx2zn3uizvr53/80servers_glusterd_normal_status.zip?dl=0


On Tue, Aug 29, 2017 at 1:47 PM, Serkan Çoban  wrote:
> Here is the logs after stopping all three volumes and restarting
> glusterd in all nodes. I waited 70 minutes after glusterd restart but
> it is still consuming %100 CPU.
> https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav  wrote:
>>
>> I believe logs you have shared logs which consist of create volume followed
>> by starting the volume.
>> However, you have mentioned that when a node from 80 server cluster gets
>> rebooted, glusterd process hangs.
>>
>> Could you please provide the logs which led glusterd to hang for all the
>> cases along with gusterd process utilization.
>>
>>
>> Thanks
>> Gaurav
>>
>>
>>
>>
>>
>>
>> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban  wrote:
>>>
>>> Here is the requested logs:
>>>
>>> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0
>>>
>>>
>>> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
>>> > Till now I haven't found anything significant.
>>> >
>>> > Can you send me gluster logs along with command-history-logs for these
>>> > scenarios:
>>> >  Scenario1 : 20 servers
>>> >  Scenario2 : 40 servers
>>> >  Scenario3:  80 Servers
>>> >
>>> >
>>> > Thanks
>>> > Gaurav
>>> >
>>> >
>>> >
>>> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
>>> > wrote:
>>> >>
>>> >> Hi Gaurav,
>>> >> Any progress about the problem?
>>> >>
>>> >> On Thursday, August 24, 2017, Serkan Çoban 
>>> >> wrote:
>>> >>>
>>> >>> Thank you Gaurav,
>>> >>> Here is more findings:
>>> >>> Problem does not happen using only 20 servers each has 68 bricks.
>>> >>> (peer probe only 20 servers)
>>> >>> If we use 40 servers with single volume, glusterd cpu %100 state
>>> >>> continues for 5 minutes and it goes to normal state.
>>> >>> with 80 servers we have no working state yet...
>>> >>>
>>> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav 
>>> >>> wrote:
>>> >>> >
>>> >>> > I am working on it and will share my findings as soon as possible.
>>> >>> >
>>> >>> >
>>> >>> > Thanks
>>> >>> > Gaurav
>>> >>> >
>>> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
>>> >>> > 
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>>> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>>> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>>> >>> >> Only way to a healthy state is destroy gluster config/rpms,
>>> >>> >> reinstall
>>> >>> >> and recreate volumes.
>>> >>> >>
>>> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
>>> >>> >> 
>>> >>> >> wrote:
>>> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>>> >>> >> > seconds between each trace.
>>> >>> >> >
>>> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>> >>> >> >
>>> >>> >> > Content of the first stack trace is here:
>>> >>> >> >
>>> >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> >>> >> > #0  0x003aa5c0f00d in nanosleep () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>>> >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>>> >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>>> >>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>>> >>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>>> >>> >> > #2  0x00303f8528fb in pool_sweeper () from
>>> >>> >> > /usr/lib64/libglusterfs.so.0
>>> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>>> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> >>> >> > from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #1  0x00303f864afc in syncenv_task () from
>>> >>> >> > /usr/lib64/libglusterfs.so.0
>>> >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>>> >>> >> > /usr/lib64/libglusterfs.so.0
>>> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >>> >> > /lib64/libpthread.so.0
>>> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>>> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
Here are the logs after stopping all three volumes and restarting
glusterd on all nodes. I waited 70 minutes after the glusterd restart but
it is still consuming 100% CPU.
https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0


On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav  wrote:
>
> I believe logs you have shared logs which consist of create volume followed
> by starting the volume.
> However, you have mentioned that when a node from 80 server cluster gets
> rebooted, glusterd process hangs.
>
> Could you please provide the logs which led glusterd to hang for all the
> cases along with gusterd process utilization.
>
>
> Thanks
> Gaurav
>
>
>
>
>
>
> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban  wrote:
>>
>> Here is the requested logs:
>>
>> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0
>>
>>
>> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
>> > Till now I haven't found anything significant.
>> >
>> > Can you send me gluster logs along with command-history-logs for these
>> > scenarios:
>> >  Scenario1 : 20 servers
>> >  Scenario2 : 40 servers
>> >  Scenario3:  80 Servers
>> >
>> >
>> > Thanks
>> > Gaurav
>> >
>> >
>> >
>> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
>> > wrote:
>> >>
>> >> Hi Gaurav,
>> >> Any progress about the problem?
>> >>
>> >> On Thursday, August 24, 2017, Serkan Çoban 
>> >> wrote:
>> >>>
>> >>> Thank you Gaurav,
>> >>> Here is more findings:
>> >>> Problem does not happen using only 20 servers each has 68 bricks.
>> >>> (peer probe only 20 servers)
>> >>> If we use 40 servers with single volume, glusterd cpu %100 state
>> >>> continues for 5 minutes and it goes to normal state.
>> >>> with 80 servers we have no working state yet...
>> >>>
>> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav 
>> >>> wrote:
>> >>> >
>> >>> > I am working on it and will share my findings as soon as possible.
>> >>> >
>> >>> >
>> >>> > Thanks
>> >>> > Gaurav
>> >>> >
>> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
>> >>> > 
>> >>> > wrote:
>> >>> >>
>> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>> >>> >> Only way to a healthy state is destroy gluster config/rpms,
>> >>> >> reinstall
>> >>> >> and recreate volumes.
>> >>> >>
>> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
>> >>> >> 
>> >>> >> wrote:
>> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>> >>> >> > seconds between each trace.
>> >>> >> >
>> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >>> >> >
>> >>> >> > Content of the first stack trace is here:
>> >>> >> >
>> >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> >>> >> > #0  0x003aa5c0f00d in nanosleep () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> >>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> >>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> >>> >> > #2  0x00303f8528fb in pool_sweeper () from
>> >>> >> > /usr/lib64/libglusterfs.so.0
>> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> >>> >> > from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #1  0x00303f864afc in syncenv_task () from
>> >>> >> > /usr/lib64/libglusterfs.so.0
>> >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>> >>> >> > /usr/lib64/libglusterfs.so.0
>> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> >>> >> > from
>> >>> >> > /lib64/libpthread.so.0
>> >>> >> > #1  0x00303f864afc in syncenv_task () from
>> >>> >> > /usr/lib64/libglusterfs.so.0
>> >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>> >>> >> > /usr/lib64/libglusterfs.so.0
>> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
>> >

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Gaurav Yadav
I believe the logs you have shared consist of volume creation followed
by starting the volume.
However, you have mentioned that when a node from the 80-server cluster gets
rebooted, the glusterd process hangs.

Could you please provide the logs from when glusterd hung, for all the
cases, along with the glusterd process utilization?
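
(For the utilization part, a simple periodic capture on each node would do,
e.g. something like the following; interval and sample count are arbitrary:)

    # sample glusterd CPU/memory usage every 10 seconds, 30 times, in batch mode
    top -b -d 10 -n 30 -p "$(pidof glusterd)" > glusterd_top.txt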


Thanks
Gaurav






On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban  wrote:

> Here is the requested logs:
> https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_
> 20_40_80_servers.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
> > Till now I haven't found anything significant.
> >
> > Can you send me gluster logs along with command-history-logs for these
> > scenarios:
> >  Scenario1 : 20 servers
> >  Scenario2 : 40 servers
> >  Scenario3:  80 Servers
> >
> >
> > Thanks
> > Gaurav
> >
> >
> >
> > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
> > wrote:
> >>
> >> Hi Gaurav,
> >> Any progress about the problem?
> >>
> >> On Thursday, August 24, 2017, Serkan Çoban 
> wrote:
> >>>
> >>> Thank you Gaurav,
> >>> Here is more findings:
> >>> Problem does not happen using only 20 servers each has 68 bricks.
> >>> (peer probe only 20 servers)
> >>> If we use 40 servers with single volume, glusterd cpu %100 state
> >>> continues for 5 minutes and it goes to normal state.
> >>> with 80 servers we have no working state yet...
> >>>
> >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav 
> wrote:
> >>> >
> >>> > I am working on it and will share my findings as soon as possible.
> >>> >
> >>> >
> >>> > Thanks
> >>> > Gaurav
> >>> >
> >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban  >
> >>> > wrote:
> >>> >>
> >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
> >>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
> >>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
> >>> >> Only way to a healthy state is destroy gluster config/rpms,
> reinstall
> >>> >> and recreate volumes.
> >>> >>
> >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban <
> cobanser...@gmail.com>
> >>> >> wrote:
> >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
> >>> >> > seconds between each trace.
> >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_
> pstack.zip?dl=0
> >>> >> >
> >>> >> > Content of the first stack trace is here:
> >>> >> >
> >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >>> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> >>> >> > #2  0x00303f8528fb in pool_sweeper () from
> >>> >> > /usr/lib64/libglusterfs.so.0
> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> >>> >> > from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #1  0x00303f864afc in syncenv_task () from
> >>> >> > /usr/lib64/libglusterfs.so.0
> >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >>> >> > /usr/lib64/libglusterfs.so.0
> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> >>> >> > from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #1  0x00303f864afc in syncenv_task () from
> >>> >> > /usr/lib64/libglusterfs.so.0
> >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >>> >> > /usr/lib64/libglusterfs.so.0
> >>> >> > #3  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >>> >> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >>> >> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> >>> >> > /lib64/libpthread.so.0
> >>> >> > #1  0x7f7a898a099b in ?? () from
> >>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >>> >> > #2  0x003aa5c07aa1 in start_thread () from
> >>> >> > /lib64/libpthread.so.0

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-29 Thread Serkan Çoban
Here is the requested logs:
https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0


On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
> Till now I haven't found anything significant.
>
> Can you send me gluster logs along with command-history-logs for these
> scenarios:
>  Scenario1 : 20 servers
>  Scenario2 : 40 servers
>  Scenario3:  80 Servers
>
>
> Thanks
> Gaurav
>
>
>
> On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
> wrote:
>>
>> Hi Gaurav,
>> Any progress about the problem?
>>
>> On Thursday, August 24, 2017, Serkan Çoban  wrote:
>>>
>>> Thank you Gaurav,
>>> Here is more findings:
>>> Problem does not happen using only 20 servers each has 68 bricks.
>>> (peer probe only 20 servers)
>>> If we use 40 servers with single volume, glusterd cpu %100 state
>>> continues for 5 minutes and it goes to normal state.
>>> with 80 servers we have no working state yet...
>>>
>>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav  wrote:
>>> >
>>> > I am working on it and will share my findings as soon as possible.
>>> >
>>> >
>>> > Thanks
>>> > Gaurav
>>> >
>>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban 
>>> > wrote:
>>> >>
>>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>>> >> Only way to a healthy state is destroy gluster config/rpms, reinstall
>>> >> and recreate volumes.
>>> >>
>>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban 
>>> >> wrote:
>>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>>> >> > seconds between each trace.
>>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>>> >> >
>>> >> > Content of the first stack trace is here:
>>> >> >
>>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>>> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>>> >> > #2  0x00303f8528fb in pool_sweeper () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> >> > from
>>> >> > /lib64/libpthread.so.0
>>> >> > #1  0x00303f864afc in syncenv_task () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>>> >> > from
>>> >> > /lib64/libpthread.so.0
>>> >> > #1  0x00303f864afc in syncenv_task () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>>> >> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #1  0x7f7a898a099b in ?? () from
>>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> >> > #2  0x003aa5c07aa1 in start_thread () from
>>> >> > /lib64/libpthread.so.0
>>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>>> >> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>>> >> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>>> >> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>>> >> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>>> >> > #3  0x00303f8245f5 in dict_set () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #4  0x00303f82524c in dict_set_str () from
>>> >> > /usr/lib64/libglusterfs.so.0
>>> >> > #5  0x7f7a898da7fd in ?? () from
>>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>>> >> > #6  0x7f7a

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-28 Thread Gaurav Yadav
Till now I haven't found anything significant.

Can you send me gluster logs along with command-history-logs for these
scenarios:
 Scenario1 : 20 servers
 Scenario2 : 40 servers
 Scenario3:  80 Servers
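
(On a default CentOS/RPM install the requested files live under
/var/log/glusterfs on each node; cmd_history.log there is the command
history log. Something like this bundles them up - a sketch, adjust the
paths if your install differs:)

    ls /var/log/glusterfs/            # glusterd/brick logs plus cmd_history.log
    tar czf "gluster-logs-$(hostname -s).tar.gz" /var/log/glusterfs/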


Thanks
Gaurav



On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
wrote:

> Hi Gaurav,
> Any progress about the problem?
>
> On Thursday, August 24, 2017, Serkan Çoban  wrote:
>
>> Thank you Gaurav,
>> Here is more findings:
>> Problem does not happen using only 20 servers each has 68 bricks.
>> (peer probe only 20 servers)
>> If we use 40 servers with single volume, glusterd cpu %100 state
>> continues for 5 minutes and it goes to normal state.
>> with 80 servers we have no working state yet...
>>
>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav  wrote:
>> >
>> > I am working on it and will share my findings as soon as possible.
>> >
>> >
>> > Thanks
>> > Gaurav
>> >
>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban 
>> wrote:
>> >>
>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>> >> Only way to a healthy state is destroy gluster config/rpms, reinstall
>> >> and recreate volumes.
>> >>
>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban 
>> >> wrote:
>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
>> >> > seconds between each trace.
>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >> >
>> >> > Content of the first stack trace is here:
>> >> >
>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
>> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> >> > #2  0x00303f8528fb in pool_sweeper () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> from
>> >> > /lib64/libpthread.so.0
>> >> > #1  0x00303f864afc in syncenv_task () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>> from
>> >> > /lib64/libpthread.so.0
>> >> > #1  0x00303f864afc in syncenv_task () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #2  0x00303f8729f0 in syncenv_processor () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> >> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> >> > /lib64/libpthread.so.0
>> >> > #1  0x7f7a898a099b in ?? () from
>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> >> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>> >> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>> >> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>> >> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>> >> > #3  0x00303f8245f5 in dict_set () from
>> /usr/lib64/libglusterfs.so.0
>> >> > #4  0x00303f82524c in dict_set_str () from
>> >> > /usr/lib64/libglusterfs.so.0
>> >> > #5  0x7f7a898da7fd in ?? () from
>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> >> > #6  0x7f7a8981b0df in ?? () from
>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> >> > #7  0x7f7a8981b47c in ?? () from
>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> >> > #8  0x7f7a89831edf in ?? () from
>> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> >> > #9  0x7f7a897f28f7 in ?? () from
>> >> > /usr/lib64/g

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-27 Thread Serkan Çoban
Hi Gaurav,
Any progress on the problem?

On Thursday, August 24, 2017, Serkan Çoban  wrote:

> Thank you Gaurav,
> Here is more findings:
> Problem does not happen using only 20 servers each has 68 bricks.
> (peer probe only 20 servers)
> If we use 40 servers with single volume, glusterd cpu %100 state
> continues for 5 minutes and it goes to normal state.
> with 80 servers we have no working state yet...
>
> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav  > wrote:
> >
> > I am working on it and will share my findings as soon as possible.
> >
> >
> > Thanks
> > Gaurav
> >
> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban  > wrote:
> >>
> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
> >> 3.10.5. 3.8.15, 3.7.20 all same behavior.
> >> My OS is centos 6.9, I tried with centos 6.8 problem remains...
> >> Only way to a healthy state is destroy gluster config/rpms, reinstall
> >> and recreate volumes.
> >>
> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban  >
> >> wrote:
> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
> >> > seconds between each trace.
> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> >
> >> > Content of the first stack trace is here:
> >> >
> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >> > #1  0x0040643b in glusterfs_sigwaiter ()
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> >> > #2  0x00303f8528fb in pool_sweeper () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x00303f864afc in syncenv_task () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x00303f864afc in syncenv_task () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f8729f0 in syncenv_processor () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> >> > /lib64/libpthread.so.0
> >> > #1  0x7f7a898a099b in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> >> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> >> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> >> > #3  0x00303f8245f5 in dict_set () from
> /usr/lib64/libglusterfs.so.0
> >> > #4  0x00303f82524c in dict_set_str () from
> >> > /usr/lib64/libglusterfs.so.0
> >> > #5  0x7f7a898da7fd in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #6  0x7f7a8981b0df in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #7  0x7f7a8981b47c in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #8  0x7f7a89831edf in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #9  0x7f7a897f28f7 in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #10 0x7f7a897f0bb9 in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #11 0x7f7a8984c89a in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #12 0x7f7a898323ee in ?? () from
> >> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > #13 0x00303f40fad5 i

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-24 Thread Serkan Çoban
Thank you Gaurav,
Here are more findings:
The problem does not happen when using only 20 servers, each with 68 bricks
(peer probing only 20 servers).
If we use 40 servers with a single volume, glusterd stays at 100% CPU
for about 5 minutes and then returns to a normal state.
With 80 servers we have not reached a working state yet...
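For context, the rough total brick counts at each scale (assuming 68 bricks
per server as above; this is illustrative arithmetic only):

  echo "20 servers: $((20*68)) bricks"   # 1360
  echo "40 servers: $((40*68)) bricks"   # 2720
  echo "80 servers: $((80*68)) bricks"   # 5440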

On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav  wrote:
>
> I am working on it and will share my findings as soon as possible.
>
>
> Thanks
> Gaurav
>
> On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban  wrote:
>>
>> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
>> 3.10.5. 3.8.15, 3.7.20 all same behavior.
>> My OS is centos 6.9, I tried with centos 6.8 problem remains...
>> Only way to a healthy state is destroy gluster config/rpms, reinstall
>> and recreate volumes.
>>
>> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban 
>> wrote:
>> > Here you can find 10 stack trace samples from glusterd. I wait 10
>> > seconds between each trace.
>> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>> >
>> > Content of the first stack trace is here:
>> >
>> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
>> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
>> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
>> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
>> > #1  0x0040643b in glusterfs_sigwaiter ()
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
>> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
>> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
>> > #2  0x00303f8528fb in pool_sweeper () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
>> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x00303f864afc in syncenv_task () from
>> > /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f8729f0 in syncenv_processor () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
>> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x00303f864afc in syncenv_task () from
>> > /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f8729f0 in syncenv_processor () from
>> > /usr/lib64/libglusterfs.so.0
>> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
>> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
>> > /lib64/libpthread.so.0
>> > #1  0x7f7a898a099b in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
>> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
>> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
>> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
>> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
>> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
>> > #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
>> > #4  0x00303f82524c in dict_set_str () from
>> > /usr/lib64/libglusterfs.so.0
>> > #5  0x7f7a898da7fd in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #6  0x7f7a8981b0df in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #7  0x7f7a8981b47c in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #8  0x7f7a89831edf in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #9  0x7f7a897f28f7 in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #10 0x7f7a897f0bb9 in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #11 0x7f7a8984c89a in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #12 0x7f7a898323ee in ?? () from
>> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
>> > #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
>> > /usr/lib64/libgfrpc.so.0
>> > #14 0x00303f410c85 in rpc_clnt_notify () from
>> > /usr/lib64/libgfrpc.so.0
>> > #15 0x00303f40bd68 in rpc_transport_notify () from
>> > /usr/lib64/libgfrpc.so.0
>> > #16 0x7f7a88a6fccd in ?? () from
>> > /usr/lib64/glus

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-24 Thread Gaurav Yadav
I am working on it and will share my findings as soon as possible.


Thanks
Gaurav

On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban  wrote:

> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
> 3.10.5. 3.8.15, 3.7.20 all same behavior.
> My OS is centos 6.9, I tried with centos 6.8 problem remains...
> Only way to a healthy state is destroy gluster config/rpms, reinstall
> and recreate volumes.
>
> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban 
> wrote:
> > Here you can find 10 stack trace samples from glusterd. I wait 10
> > seconds between each trace.
> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >
> > Content of the first stack trace is here:
> >
> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> > #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> > #1  0x0040643b in glusterfs_sigwaiter ()
> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> > #2  0x00303f8528fb in pool_sweeper () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x00303f864afc in syncenv_task () from
> /usr/lib64/libglusterfs.so.0
> > #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x00303f864afc in syncenv_task () from
> /usr/lib64/libglusterfs.so.0
> > #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> > #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> > #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1  0x7f7a898a099b in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> > #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> > #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> > #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> > #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> > #4  0x00303f82524c in dict_set_str () from
> /usr/lib64/libglusterfs.so.0
> > #5  0x7f7a898da7fd in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #6  0x7f7a8981b0df in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #7  0x7f7a8981b47c in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #8  0x7f7a89831edf in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #9  0x7f7a897f28f7 in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #10 0x7f7a897f0bb9 in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #11 0x7f7a8984c89a in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #12 0x7f7a898323ee in ?? () from
> > /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> > #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> > #14 0x00303f410c85 in rpc_clnt_notify () from
> /usr/lib64/libgfrpc.so.0
> > #15 0x00303f40bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> > #16 0x7f7a88a6fccd in ?? () from
> > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> > #17 0x7f7a88a70ffe in ?? () from
> > /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> > #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> > #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> > #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> > Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> > #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> > #1  0x

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-24 Thread Serkan Çoban
Restarting glusterd causes the same thing. I tried 3.12.rc0,
3.10.5, 3.8.15, and 3.7.20; all show the same behavior.
My OS is CentOS 6.9; I also tried CentOS 6.8 and the problem remains...
The only way back to a healthy state is to destroy the gluster config/rpms,
reinstall, and recreate the volumes (a rough sketch of the steps is below).
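Roughly the steps I use to get back to a healthy state (just a sketch for
CentOS 6; package names and paths may differ on your setup, and it wipes
all volume configuration):

  service glusterd stop
  yum remove -y glusterfs-server glusterfs
  rm -rf /var/lib/glusterd          # glusterd working directory / config
  yum install -y glusterfs-server
  service glusterd start
  # then peer probe the servers again and recreate the volumes from scratch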

On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban  wrote:
> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from 
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from 
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from 
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()
>
> On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee  wrote:
>> Could you be able to provide the pstack dump of the glusterd process?
>>
>> On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee  wrote:
>>>
>>> Not yet. Gaurav will be 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-23 Thread Serkan Çoban
Here you can find 10 stack trace samples from glusterd. I waited 10
seconds between each trace.
https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
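(For reference, roughly how I collected them; this assumes pstack from the
gdb package is installed:)

  pid=$(pidof glusterd)
  for i in $(seq 1 10); do
      pstack "$pid" > "glusterd_pstack_$i.out"   # one sample per file
      sleep 10
  done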

Content of the first stack trace is here:

Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
#0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
#0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
#1  0x0040643b in glusterfs_sigwaiter ()
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
#0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
#1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
#2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
#0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
#0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
#0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f7a898a099b in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
#0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
#3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5  0x7f7a898da7fd in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f7a8981b0df in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f7a8981b47c in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f7a89831edf in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f7a897f28f7 in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f7a897f0bb9 in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f7a8984c89a in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x7f7a898323ee in ?? () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x00303f40fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00303f40bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x7f7a88a6fccd in ?? () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x7f7a88a70ffe in ?? () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
#19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
#0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()

On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee  wrote:
> Could you be able to provide the pstack dump of the glusterd process?
>
> On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee  wrote:
>>
>> Not yet. Gaurav will be taking a look at it tomorrow.
>>
>> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban  wrote:
>>>
>>> Hi Atin,
>>>
>>> Do you have time to check the logs?
>>>
>>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban 
>>> wrote:
>>> > Same thing happens with 3.12.rc0. This time perf top shows hanging in
>>> > libglusterfs.so and below is the glusterd logs, which are different
>>> > from 3.10.
>>> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
>>> > brick processes come online and 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-23 Thread Atin Mukherjee
Could you provide a pstack dump of the glusterd process?
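Something along these lines should do, assuming pstack (from the gdb
package) is available on the node:

  pstack $(pidof glusterd) > /tmp/glusterd_pstack.txt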

On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee  wrote:

> Not yet. Gaurav will be taking a look at it tomorrow.
>
> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban  wrote:
>
>> Hi Atin,
>>
>> Do you have time to check the logs?
>>
>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban 
>> wrote:
>> > Same thing happens with 3.12.rc0. This time perf top shows hanging in
>> > libglusterfs.so and below is the glusterd logs, which are different
>> > from 3.10.
>> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
>> > brick processes come online and system starts to answer commands like
>> > "gluster peer status"..
>> >
>> > [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
>> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
>> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
>> > [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref]
>> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
>> 

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-23 Thread Atin Mukherjee
Not yet. Gaurav will be taking a look at it tomorrow.

On Wed, 23 Aug 2017 at 20:14, Serkan Çoban  wrote:

> Hi Atin,
>
> Do you have time to check the logs?
>
> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban 
> wrote:
> > Same thing happens with 3.12.rc0. This time perf top shows hanging in
> > libglusterfs.so and below is the glusterd logs, which are different
> > from 3.10.
> > With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
> > brick processes come online and system starts to answer commands like
> > "gluster peer status"..
> >
> > [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> > [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref]
> > (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> > [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> > [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> > [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argu

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-23 Thread Serkan Çoban
Hi Atin,

Do you have time to check the logs?

On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban  wrote:
> Same thing happens with 3.12.rc0. This time perf top shows hanging in
> libglusterfs.so and below is the glusterd logs, which are different
> from 3.10.
> With 3.10.5, after 60-70 minutes CPU usage becomes normal and we see
> brick processes come online and system starts to answer commands like
> "gluster peer status"..
>
> [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
> [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> [2017-08-23 06:46:02.154494] E [client_t.c:324:gf_client_ref]
> (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
> [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
> [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_clien

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-23 Thread Serkan Çoban
The same thing happens with 3.12.rc0. This time perf top shows the hang in
libglusterfs.so, and below are the glusterd logs, which are different
from those of 3.10.
With 3.10.5, after 60-70 minutes CPU usage returns to normal, the brick
processes come online, and the system starts to answer commands like
"gluster peer status".

[2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154494] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c)
[0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9)
[0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
[2017-08-23 06:46:02.154575] E [client_t.c:324:gf_client_ref]
(-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1)
[0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-22 Thread Serkan Çoban
I rebooted multiple times, and I also destroyed the gluster configuration
and recreated it multiple times. The behavior is the same.

On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee  wrote:
> My guess is there is a corruption in vol list or peer list which has lead
> glusterd to get into a infinite loop of traversing a peer/volume list and
> CPU to hog up. Again this is a guess and I've not got a chance to take a
> detail look at the logs and the strace output.
>
> I believe if you get to reboot the node again the problem will disappear.
>
> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban  wrote:
>>
>> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
>> glusterd %100 cpu usage
>> Hope this helps...
>>
>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban 
>> wrote:
>> > Hi there,
>> >
>> > I have a strange problem.
>> > Gluster version in 3.10.5, I am testing new servers. Gluster
>> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
>> > I can successfully create the cluster and volumes without any
>> > problems. I write data to cluster from 100 clients for 12 hours again
>> > no problem. But when I try to reboot a node, glusterd process hangs on
>> > %100 CPU usage and seems to do nothing, no brick processes come
>> > online. You can find strace of glusterd process for 1 minutes here:
>> >
>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>> >
>> > Here is the glusterd logs:
>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>> >
>> >
>> > By the way, reboot of one server completes without problem if I reboot
>> > the servers before creating any volumes.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> - Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-22 Thread Atin Mukherjee
My guess is that there is a corruption in the volume list or peer list which
has led glusterd into an infinite loop traversing a peer/volume list, hogging
the CPU. Again, this is only a guess; I've not yet had a chance to take a
detailed look at the logs and the strace output.

I believe that if you reboot the node again the problem will disappear.
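If it gets stuck again, a quick way to see where it is spinning (a sketch
only; needs gdb, and the glusterfs debuginfo packages help resolve symbols):

  # attach, dump backtraces of all threads, then detach
  gdb -p "$(pidof glusterd)" -batch -ex "thread apply all bt"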

On Tue, 22 Aug 2017 at 20:07, Serkan Çoban  wrote:

> As an addition perf top shows %80 libc-2.12.so __strcmp_sse42 during
> glusterd %100 cpu usage
> Hope this helps...
>
> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban 
> wrote:
> > Hi there,
> >
> > I have a strange problem.
> > Gluster version in 3.10.5, I am testing new servers. Gluster
> > configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
> > I can successfully create the cluster and volumes without any
> > problems. I write data to cluster from 100 clients for 12 hours again
> > no problem. But when I try to reboot a node, glusterd process hangs on
> > %100 CPU usage and seems to do nothing, no brick processes come
> > online. You can find strace of glusterd process for 1 minutes here:
> >
> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
> >
> > Here is the glusterd logs:
> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
> >
> >
> > By the way, reboot of one server completes without problem if I reboot
> > the servers before creating any volumes.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
- Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-08-22 Thread Serkan Çoban
In addition, perf top shows 80% of the samples in libc-2.12.so __strcmp_sse42
while glusterd is at 100% CPU usage.
Hope this helps...
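(In case anyone wants to reproduce the sampling, roughly what I ran; the
flags are the usual ones, adjust as needed:)

  # live profile of the spinning glusterd process
  perf top -p "$(pidof glusterd)"
  # or record ~30 seconds for offline analysis
  perf record -g -p "$(pidof glusterd)" -- sleep 30 && perf report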

On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban  wrote:
> Hi there,
>
> I have a strange problem.
> Gluster version in 3.10.5, I am testing new servers. Gluster
> configuration is 16+4 EC, I have three volumes, each have 1600 bricks.
> I can successfully create the cluster and volumes without any
> problems. I write data to cluster from 100 clients for 12 hours again
> no problem. But when I try to reboot a node, glusterd process hangs on
> %100 CPU usage and seems to do nothing, no brick processes come
> online. You can find strace of glusterd process for 1 minutes here:
>
> https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
>
> Here is the glusterd logs:
> https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
>
>
> By the way, reboot of one server completes without problem if I reboot
> the servers before creating any volumes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Glusterd proccess hangs on reboot

2017-08-22 Thread Serkan Çoban
Hi there,

I have a strange problem.
The Gluster version is 3.10.5 and I am testing new servers. The Gluster
configuration is 16+4 EC, with three volumes, each of 1600 bricks.
I can successfully create the cluster and volumes without any
problems, and I can write data to the cluster from 100 clients for 12 hours,
again with no problem. But when I try to reboot a node, the glusterd process
hangs at 100% CPU usage and seems to do nothing; no brick processes come
online. You can find a 1-minute strace of the glusterd process here:

https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0

Here are the glusterd logs:
https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
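(The 1-minute strace linked above was captured roughly like this; the exact
flags may vary:)

  # attach to glusterd and trace syscalls for ~60 seconds
  timeout 60 strace -f -tt -T -p "$(pidof glusterd)" -o gluster_strace.out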


By the way, rebooting one server completes without problems if I reboot
the servers before creating any volumes.
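In case it helps to reproduce, a minimal sketch of how one of these 16+4
disperse volumes can be created (hostnames and brick paths below are
placeholders, not my actual layout; the real volumes repeat the brick list
until they reach 1600 bricks):

  gluster volume create testvol disperse-data 16 redundancy 4 \
      server{1..20}:/bricks/brick1/testvol
  gluster volume start testvol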
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users