I'm using cgo to call a C function from Go. Inside the C function there is 
a callback to a Go function. In other words, the call chain is Go -> C -> Go.

After running pprof, I noticed that __GI___pthread_mutex_unlock took 
half of the execution time. AFAIK, cgo has overhead, especially when calling 
back from C into Go. But it's strange that cgo spends half of the execution 
time doing locking. Is there something wrong with my code?

Here is the gist containing the code 
https://gist.github.com/phqb/2fa5bc76b77208d7f36b151b405e6dcc.
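For context, here is a minimal sketch of the pattern (simplified, with hypothetical names — not the exact gist code). Every `goCallback` invocation crosses the C -> Go boundary and re-enters the Go runtime:

```go
// Minimal Go -> C -> Go repro sketch (names are hypothetical).
package main

/*
extern void goCallback(int v);

// cWork calls back into Go once per iteration; each goCallback
// invocation is a C -> Go transition.
static void cWork(int n) {
    for (int i = 0; i < n; i++) {
        goCallback(i);
    }
}
*/
import "C"
import "fmt"

var sum int

//export goCallback
func goCallback(v C.int) {
    sum += int(v)
}

func main() {
    C.cWork(10) // Go -> C; C then calls goCallback 10 times (C -> Go)
    fmt.Println("sum:", sum)
}
```

Built with a plain `go build` (cgo enabled). The real code does the same thing, just with far more callback invocations per C call.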

pprof output:

```

(pprof) top
Showing nodes accounting for 136.20s, 89.12% of 152.82s total
Dropped 252 nodes (cum <= 0.76s)
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
    68.20s 44.63% 44.63%     68.20s 44.63%  __GI___pthread_mutex_unlock
    54.32s 35.55% 80.17%     54.32s 35.55%  __lll_lock_wait
     3.57s  2.34% 82.51%     57.89s 37.88%  __pthread_mutex_lock
     2.38s  1.56% 84.07%      3.15s  2.06%  runtime.casgstatus
     1.81s  1.18% 85.25%      1.81s  1.18%  runtime.futex
     1.26s  0.82% 86.08%      3.99s  2.61%  runtime.mallocgc
     1.21s  0.79% 86.87%      2.54s  1.66%  runtime.lock2
     1.21s  0.79% 87.66%      1.48s  0.97%  runtime.reentersyscall
     1.15s  0.75% 88.41%      1.15s  0.75%  runtime.procyield
     1.09s  0.71% 89.12%      1.59s  1.04%  runtime.exitsyscallfast

```

Running environment:

* Go version: go1.20.5 linux/amd64

* `lscpu`:

```

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:                        0
CPU MHz:                         2200.152
BogoMIPS:                        4400.30
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        4 MiB
L3 cache:                        55 MiB
NUMA node0 CPU(s):               0-31
...
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities

```
