Could you clarify something? In your initial message you say: "We have about half a million agents running on each of our machines". I thought maybe it was a language thing, and you meant 500,000 goroutines. But then you said: "There are 7000 goroutines total".
So, you have about 500,000 *processes* running this agent on each machine, and each process has around 7,000 goroutines? Is that correct?

On Sunday, June 20, 2021 at 10:48:20 PM UTC-4 zjy19...@gmail.com wrote:

> On Thu, Jun 17, 2021 at 9:19 AM Peter Z wrote:
>> >
>> > The original post is on Stack Overflow
>> > https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
>> >
>> > Golang ENV:
>> > go1.14.3 linux/amd64
>> >
>> > Description:
>> > We have about half a million agents running on each of our machines. The
>> > agent is written in Go. Recently we found that the agent may get stuck, no
>> > response for the sent requests. The metrics exported from the agent show
>> > that a channel in the agent (caching the request) is full. Deep into the
>> > goroutine stacks, we found that the goroutines consuming messages from the
>> > channel are all waiting for a lock. The goroutines' stack details are shown
>> > below.
>>
>> That is peculiar. What is happening under the lock is that the pool
>> is allocating a slice that is GOMAXPROCS in length. This shouldn't
>> take long, obviously. And it only needs to happen when the pool is
>> first created, or when GOMAXPROCS changes. So: how often do you
>> create this pool? Is it the case that you create the pool and then
>> have a large number of goroutines try to Get a value simultaneously?
>> Or, how often do you change GOMAXPROCS? (And, if you do change
>> GOMAXPROCS, why?)
>>
> 1) GOMAXPROCS is not manually changed; it is initialized to the default.
> 2) sync.Pool is not used explicitly here. We use the fmt package and a log
> package (zap from go.uber.org/zap), which use sync.Pool internally.
> 3) There are 7000 goroutines total, and about 500 goroutines waiting for
> the lock. That is not many, but 'create a pool and have a large number of
> goroutines try to Get a value simultaneously' may really happen at program
> startup.
> 4) Does the 'taskset' command have any effect? The program is running
> with 'taskset -c $last_2nd_core,$last_3rd_core,$last_4th_core'.
>
>>
>> > The stack shows that all of the goroutines are waiting for the global
>> > lock in sync.Pool. But I can't figure out which goroutine is holding the
>> > lock. There should be a goroutine which has `sync.runtime_SemacquireMutex`
>> > in its stack but not at the top, and there isn't.
>>
>> I don't think that is what you would see. I think you would see a
>> goroutine with pinSlow in the stack but with SemacquireMutex not in the
>> stack.
>
> As the grep result below shows, all of the goroutines with pinSlow also
> have SemacquireMutex on their stack:
>
> [******@****** ~]$ curl ******795/debug/pprof/goroutine?debug=1 2>/dev/null | grep pinSlow -B4
>
> 166 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x520ebd 0x51fdff 0x51fcd0 0x737fb4 0x73a836 0x73a813 0x97d660 0x97d60a 0x97d5e9 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 120 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x4f646f 0x51ed7b 0x51ff39 0x5218e7 0x73a8b0 0x97d660 0x97d60a 0x97d5e9 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 119 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x5269e1 0x5269d2 0x526892 0x51f6cd 0x51f116 0x51ff39 0x5218e7 0x73a8b0 0x97d660 0x97d60a 0x97d5e9 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 59 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x4d8291 0x4d5726 0x9857f7 0x9804ca 0x97d5d5 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 36 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x51ed98 0x51ed88 0x51ff39 0x5218e7 0x73a8b0 0x97d660 0x97d60a 0x97d5e9 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 10 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x4d8291 0x4d8856 0x9761b6 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
> --
> 2 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 0x4821af 0x7c6ddc 0x7c6dc3 0x7c8cdf 0x4689e1
> # 0x449546 sync.runtime_SemacquireMutex+0x46 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
> # 0x482791 sync.(*Mutex).Lock+0x271 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
> # 0x482792 sync.(*Pool).pinSlow+0x272 /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
>> > Reproduce:
>> > Can't find a way to reproduce this problem for now.
>>
>> It's going to be pretty hard for us to solve the problem without a
>> reproducer.
>
> We are now trying to reproduce this problem, but haven't caught the bug yet.
>
> Thanks for the comment.