>
> On Thu, Jun 17, 2021 at 9:19 AM Peter Z  wrote: 
> > 
> > The original post is on stackoverflow 
> https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
>  
> > 
> > Golang ENV: 
> > go1.14.3 linux/amd64 
> > 
> > Description: 
> > We have about half a million agents running on each of our machines. The 
> agent is written in Go. Recently we found that the agent may get stuck, with 
> no response for the sent requests. The metrics exported from the agent show 
> that a channel in the agent (caching the requests) is full. Digging into the 
> goroutine stacks, we found that the goroutines consuming messages from the 
> channel are all waiting for a lock. The goroutine stack details are shown 
> below. 
>
> That is peculiar. What is happening under the lock is that the pool 
> is allocating a slice that is GOMAXPROCS in length. This shouldn't 
> take long, obviously. And it only needs to happen when the pool is 
> first created, or when GOMAXPROCS changes. So: how often do you 
> create this pool? Is it the case that you create the pool and then 
> have a large number of goroutines try to Get a value simultaneously? 
> Or, how often do you change GOMAXPROCS? (And, if you do change 
> GOMAXPROCS, why?)
>
1) GOMAXPROCS is not manually changed; it is left at the default.
2) sync.Pool is not explicitly used here; we use the fmt package and the log
package (zap from go.uber.org/zap), both of which use sync.Pool internally.
3) There are about 7000 goroutines in total, and about 500 goroutines waiting
for the lock. That is not many, but 'create the pool and then have a large
number of goroutines try to Get a value simultaneously' may really happen at
program start-up.
4) Does the 'taskset' command have any effect? The program is running
with 'taskset -c $last_2nd_core,$last_3rd_core,$last_4th_core'.

>
> > The stack shows that all of the goroutines are waiting for the global 
> lock in sync.Pool. But I can't figure out which goroutine is holding the 
> lock. There should be a goroutine which has `sync.runtime_SemacquireMutex` 
> in its stack but not at the top, yet there isn't. 
>
> I don't think that is what you would see. I think you would see a 
> goroutine with pinSlow in the stack but with SemacquireMutex not in the 
> stack.

 
As the grep result below shows, all of the goroutines with pinSlow in the
stack also have SemacquireMutex at the top:

[******@****** ~]$ curl ******795/debug/pprof/goroutine?debug=1 2>/dev/null 
| grep pinSlow -B4

166 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x520ebd 0x51fdff 0x51fcd0 0x737fb4 0x73a836 0x73a813 
0x97d660 0x97d60a 0x97d5e9 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

120 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x4f646f 0x51ed7b 0x51ff39 0x5218e7 0x73a8b0 0x97d660 
0x97d60a 0x97d5e9 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

119 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x5269e1 0x5269d2 0x526892 0x51f6cd 0x51f116 0x51ff39 
0x5218e7 0x73a8b0 0x97d660 0x97d60a 0x97d5e9 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

59 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x4d8291 0x4d5726 0x9857f7 0x9804ca 0x97d5d5 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

36 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x51ed98 0x51ed88 0x51ff39 0x5218e7 0x73a8b0 0x97d660 
0x97d60a 0x97d5e9 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

10 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
0x4824ee 0x4821af 0x4d8291 0x4d8856 0x9761b6 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

--

2 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 0x4824ee 
0x4821af 0x7c6ddc 0x7c6dc3 0x7c8cdf 0x4689e1

# 0x449546 sync.runtime_SemacquireMutex+0x46 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71

# 0x481c1b sync.(*Mutex).lockSlow+0xfb 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138

# 0x482791 sync.(*Mutex).Lock+0x271 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81

# 0x482792 sync.(*Pool).pinSlow+0x272 
/home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213

>
>
>
> > Reproduce: 
> > Can't find a way to reproduce this problem for now. 
>
> It's going to be pretty hard for us to solve the problem without a 
> reproducer.

 
We are now trying to reproduce this problem, but haven't caught the bug yet.

Thanks for the comment.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/abcee1fe-0d0a-4780-9f05-8f9fce1107a7n%40googlegroups.com.
