Hi!

We have a Go program (an API server) that has been running stably for a long
time on a virtual machine with 8 cores.
Recently, however, it ran into a strange problem: a single CPU was at 100%
usage while the others were nearly idle. At the same time, network traffic
dropped to zero, and there were many TCP connections in the CLOSE_WAIT state
on the server side.
It looks to me as if the program was busy spinning on some event and could
not run our code.

We sent it a QUIT signal and got the goroutine stacks. There were more than
3000 goroutines: only two were running, 370 were runnable, and the rest were
blocked on channel operations. Unfortunately, the stacks of the two running
goroutines were not available; the dump only says "goroutine running on
other thread".

We did not change runtime.GOMAXPROCS, so the number of Ps should be the
default, i.e. the number of processors, which is 8. In my view, more
goroutines should have been running, and the run queues look rather long:
even if all 8 Ms were running user goroutines, the average runq size would
be about 46 (370 / 8), if we only consider the global runq.

I don't know what the other Ms were doing at that time. I know the garbage
collector implementation has a mark assist mechanism.
But could it use a lot of Ms and get the scheduler into trouble?

Go version we use: go/1.12.13.
OS we use: CentOS/3.10.0.
