Hi,

I encountered a weird problem recently.

*Environment*

Arch: x86_64
CPU Vendor: hygon (AMD Zen1 OEM) (www.hygon.cn) (Hygon C86 7285)
OS: CentOS 7.6
Kernel: 3.10.0-957
Go version: 1.15.6

*What happened*

Our pure Go program, named controllerd, stopped doing any work and just 
spun on one CPU core (top showed it consuming one full core). Before it 
stopped working, it had run from mid-April until June 12. 

After profiling the process with perf, the data shows the program 
looping on calls to __vdso_clock_gettime:

64.20% controllerd [vdso] [.] __vdso_clock_gettime 
8.82% controllerd controllerd [.] runtime.procyield
8.82% controllerd controllerd [.] runtime.suspendG 
8.01% controllerd controllerd [.] runtime.nanotime1
1.47% controllerd [kernel.kallsyms] [k] __enqueue_entity 
0.79% controllerd [kernel.kallsyms] [k] system_call_after_swapgs 
0.79% controllerd [kernel.kallsyms] [k] set_next_entity 
0.68% controllerd [kernel.kallsyms] [k] _raw_qspin_lock
0.68% controllerd [kernel.kallsyms] [k] change_pte_range
0.65% controllerd [kernel.kallsyms] [k] auditsys
0.57% controllerd [kernel.kallsyms] [k] update_curr 
0.45% controllerd [kernel.kallsyms] [k] native_sched_clock 
0.45% controllerd [kernel.kallsyms] [k] cpuacct_charge 
0.45% controllerd [kernel.kallsyms] [k] __x86_indirect_thunk_rax 
0.45% controllerd [kernel.kallsyms] [k] __audit_syscall_exit
0.34% controllerd [kernel.kallsyms] [k] pick_next_task_fair
0.23% controllerd controllerd [.] runtime.osyield 
0.23% controllerd [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
0.23% controllerd [kernel.kallsyms] [k] dput 
0.23% controllerd [kernel.kallsyms] [k] __schedule 
0.23% controllerd [kernel.kallsyms] [k] put_prev_task_fair 
0.23% controllerd [kernel.kallsyms] [k] yield_task_fair
0.22% controllerd [kernel.kallsyms] [k] sys_sched_yield 
0.11% controllerd [kernel.kallsyms] [k] clear_buddies 
0.11% controllerd [kernel.kallsyms] [k] update_rq_clock.part.78 
0.11% controllerd [kernel.kallsyms] [k] update_min_vruntime 
0.11% controllerd [kernel.kallsyms] [k] tick_do_update_jiffies64 
0.11% controllerd [kernel.kallsyms] [k] system_call 
0.11% controllerd [kernel.kallsyms] [k] rb_next 
0.11% controllerd [kernel.kallsyms] [k] rb_insert_color 
0.00% controllerd controllerd [.] runtime.notesleep 
0.00% controllerd [kernel.kallsyms] [k] load_balance 
0.00% controllerd controllerd [.] runtime.runqgrab 
0.00% controllerd controllerd [.] runtime.findrunnable 
0.00% controllerd [kernel.kallsyms] [k] update_numa_stats 
0.00% controllerd [kernel.kallsyms] [k] __lock_task_sighand
0.00% controllerd [kernel.kallsyms] [k] idle_cpu
0.00% controllerd [kernel.kallsyms] [k] kmem_cache_free_bulk 
0.00% controllerd [kernel.kallsyms] [k] task_numa_find_cpu
0.00% controllerd [kernel.kallsyms] [k] __queue_work
0.00% controllerd [kernel.kallsyms] [k] __perf_event_task_sched_in
0.00% controllerd [kernel.kallsyms] [k] finish_task_switch
0.00% controllerd [kernel.kallsyms] [k] perf_ctx_unlock
0.00% controllerd [kernel.kallsyms] [k] native_write_msr_safe
0.00% controllerd [kernel.kallsyms] [k] __perf_event_enable

BTW, the perf command was:

perf record -o controllerd_perf.data -F 99 -p PID sleep 10

I'm not familiar with the details of Go's runtime, but after getting the 
perf data and reading the code of the runtime's suspendG function, my 
guess is that the program was looping in suspendG and failed to find any 
other goroutine to execute.

*What I did*


   - First, I suspected the clock source was not set correctly on the 
   system, but it is set to tsc, which is correct.
   - Then, I tried sending SIGUSR1 to the program to see if it could be 
   revived (controllerd flushes its logs when it receives SIGUSR1), but 
   nothing changed.
   - Finally, we had to restart the program to recover.
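
For context on the SIGUSR1 attempt, this is roughly the kind of handler 
I mean. It is a minimal sketch and not controllerd's actual code: 
goroutineDump and watchSIGUSR1 are hypothetical names, and it prints 
goroutine stacks instead of flushing logs. A handler like this runs on 
an ordinary goroutine, so I assume it cannot react if the runtime is not 
scheduling goroutines at that moment, which would be consistent with 
nothing changing.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// goroutineDump returns the stack traces of all goroutines as text.
// It grows the buffer until the whole dump fits. Note that
// runtime.Stack with all=true briefly stops the world.
func goroutineDump() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true = include every goroutine
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// watchSIGUSR1 prints a goroutine dump to stderr on every SIGUSR1.
// Hypothetical helper, not taken from controllerd.
func watchSIGUSR1() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			fmt.Fprintln(os.Stderr, goroutineDump())
		}
	}()
}

func main() {
	watchSIGUSR1()
	// Demo only: signal ourselves and give the handler a moment to run.
	// A real daemon would keep doing its normal work here instead.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		fmt.Fprintln(os.Stderr, "kill:", err)
	}
	time.Sleep(100 * time.Millisecond)
}
```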
   

*Ask for help*


   1. Does anybody know the cause of this problem?
   2. What can I do to get more information if this problem happens again?
   
Thanks.
