Could you clarify something? You say:
"We have about half a million agents running on each of our machines"
in your initial message. I thought maybe it was a language thing, and you 
meant 500,000 goroutines. But then you said:
"There are 7000 goroutines total"

So, you have about 500,000 *processes* running this agent on each machine, 
and each process has around 7,000 goroutines? Is that correct?

On Sunday, June 20, 2021 at 10:48:20 PM UTC-4 zjy19...@gmail.com wrote:

> On Thu, Jun 17, 2021 at 9:19 AM Peter Z  wrote: 
>> > 
>> > The original post is on stackoverflow 
>> https://stackoverflow.com/questions/67999117/unexpected-stuck-in-sync-pool-get
>>  
>> > 
>> > Golang ENV: 
>> > go1.14.3 linux/amd64 
>> > 
>> > Description: 
>> > We have about half a million agents running on each of our machines. The 
>> agent is written in Go. Recently we found that the agent may get stuck and 
>> stop responding to the requests sent to it. The metrics exported from the 
>> agent show that a channel in the agent (caching the requests) is full. 
>> Digging into the goroutine stacks, we found that the goroutines consuming 
>> messages from the channel are all waiting for a lock. The goroutine stack 
>> details are shown below. 
>>
>> That is peculiar. What is happening under the lock is that the pool 
>> is allocating a slice that is GOMAXPROCS in length. This shouldn't 
>> take long, obviously. And it only needs to happen when the pool is 
>> first created, or when GOMAXPROCS changes. So: how often do you 
>> create this pool? Is it the case that you create the pool and then 
>> have a large number of goroutines try to Get a value simultaneously? 
>> Or, how often do you change GOMAXPROCS? (And, if you do change 
>> GOMAXPROCS, why?)
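
For anyone trying to build a reproducer, the "create the pool and then have many goroutines Get simultaneously" scenario can be sketched roughly as below. This is only an illustration of that access pattern, not a known reproducer of the hang; hammerPool and its parameters are hypothetical names, and the pooled value type is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// hammerPool (hypothetical helper) creates a fresh sync.Pool, parks `workers`
// goroutines on a start signal, then releases them all at once so that they
// hit Get on the brand-new pool at roughly the same time.
// It returns the total number of Get calls that completed.
func hammerPool(workers, iters int) int64 {
	pool := &sync.Pool{New: func() interface{} { return new([64]byte) }}
	start := make(chan struct{})
	var done int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // wait until every worker has been started
			for j := 0; j < iters; j++ {
				buf := pool.Get().(*[64]byte)
				pool.Put(buf)
				atomic.AddInt64(&done, 1)
			}
		}()
	}

	close(start) // release all workers simultaneously
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	// 7000 mirrors the goroutine count mentioned in this thread.
	fmt.Println("completed Gets:", hammerPool(7000, 100))
}
```

If the program finishes, every Get returned; a hang here would be the interesting result.
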
>>
> 1) GOMAXPROCS is not changed manually; it is left at its default.
> 2) sync.Pool is not used explicitly here. We use the fmt package and a log 
> package
> (zap from go.uber.org/zap), which use sync.Pool internally. 
> 3) There are 7000 goroutines in total, and about 500 of them are waiting for 
> the 
> lock. That is not many, but 'create a pool and then have a number of 
> goroutines try to Get a value simultaneously' may really happen at program 
> startup. 
> 4) Does the 'taskset' command have any effect? The program is running 
> with 'taskset -c $last_2nd_core,$last_3rd_core,$last_4th_core'.
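
On the taskset question, one thing that is easy to check is what the runtime itself sees. runtime.NumCPU is documented as the number of logical CPUs usable by the current process, so on Linux I would expect the affinity mask set by taskset to show up there, and the default GOMAXPROCS to follow it. A minimal sketch (cpuInfo is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuInfo (hypothetical helper) reports how many logical CPUs the runtime
// considers usable by this process and the current GOMAXPROCS value.
func cpuInfo() (numCPU, maxProcs int) {
	// GOMAXPROCS(0) only queries the setting; it does not change it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	n, p := cpuInfo()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", n, p)
}
```

Running the binary once directly and once under the same taskset invocation as the agent should show whether the two values differ.
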
>
>>
>> > The stack shows that all of the goroutines are waiting for the global 
>> lock in sync.Pool. But I can't figure out which goroutine is holding the 
>> lock. There should be a goroutine which has `sync.runtime_SemacquireMutex` 
>> in its stack but not at the top, but there isn't one. 
>>
>> I don't think that is what you would see. I think you would see a 
>> goroutine with pinSlow in the stack but with SemacquireMutex not in the 
>> stack.
>
>  
> As the grep result below shows, all of the goroutines with pinSlow also 
> have SemacquireMutex in their stacks.
>
> [******@****** ~]$ curl ******795/debug/pprof/goroutine?debug=1 
> 2>/dev/null | grep pinSlow -B4
>
> 166 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x520ebd 0x51fdff 0x51fcd0 0x737fb4 0x73a836 0x73a813 
> 0x97d660 0x97d60a 0x97d5e9 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 120 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x4f646f 0x51ed7b 0x51ff39 0x5218e7 0x73a8b0 0x97d660 
> 0x97d60a 0x97d5e9 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 119 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x5269e1 0x5269d2 0x526892 0x51f6cd 0x51f116 0x51ff39 
> 0x5218e7 0x73a8b0 0x97d660 0x97d60a 0x97d5e9 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 59 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x4d8291 0x4d5726 0x9857f7 0x9804ca 0x97d5d5 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 36 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x51ed98 0x51ed88 0x51ff39 0x5218e7 0x73a8b0 0x97d660 
> 0x97d60a 0x97d5e9 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 10 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x4d8291 0x4d8856 0x9761b6 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
> --
>
> 2 @ 0x438cd0 0x4497e0 0x4497cb 0x449547 0x481c1c 0x482792 0x482793 
> 0x4824ee 0x4821af 0x7c6ddc 0x7c6dc3 0x7c8cdf 0x4689e1
>
> # 0x449546 sync.runtime_SemacquireMutex+0x46 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/runtime/sema.go:71
>
> # 0x481c1b sync.(*Mutex).lockSlow+0xfb 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:138
>
> # 0x482791 sync.(*Mutex).Lock+0x271 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/mutex.go:81
>
> # 0x482792 sync.(*Pool).pinSlow+0x272 
> /home/ferry/ONLINE_SERVICE/other/ferry/task_workspace/gopath/src/******/go-env/go1-14-linux-amd64/src/sync/pool.go:213
>
>> > Reproduce: 
>> > Can't find a way to reproduce this problem for now. 
>>
>> It's going to be pretty hard for us to solve the problem without a 
>> reproducer.
>
>  
> We are now trying to reproduce this problem, but haven't caught the bug yet.
>
> Thanks for the comment.
>

