First time posting on this list, but I've gone through a very similar exercise the last few months and might have some good insight for you.
Learning how to interpret profiles is extremely useful here. Capturing a CPU profile, a heap profile, and an execution trace will each show a different facet of what's going on under the hood. Luckily, since you have high CPU usage, the CPU profile alone will already be very useful.

For a couple of pieces of low-hanging fruit: armed with a CPU and a heap profile, take a look at both. You say the runtime and GC dominate the CPU profile; as you suspected, that likely points to memory issues. Open up the heap profile, switch the `sample_index` to `alloc_space` or `alloc_objects`, and take a look at who the largest offenders are. For a clearer pointer to the offending code's call stack, set `call_tree`, then take another look.

I believe that spending a few hours or days learning the pprof and trace tools would pay dividends given the scope of your task. It's hard to give more detailed suggestions for performance while flying blind. Personally, when I had a similar-looking profile, the two pieces of low-hanging fruit were goroutine management (some goroutines were living longer than expected) and reducing memory usage (work / allocation was needlessly duplicated in several places).

If you truly have such a large number of in-use objects (shown by `inuse_space` / `inuse_objects` in the heap profile), then I agree some form of sync.Pool or memory reuse may be beneficial.

On Friday, July 16, 2021 at 5:27:03 AM UTC-7 rmfr wrote:

> I run it at an 8 cores 16GB machine and it occupies all cpu cores it could.
>
> 1. It is ~95% cpu intensive and with ~5% network communications.
> 2. The codebase is huge and has more than 300 thousands of lines of code (even didn't count the third party library yet).
> 3. The tool pprof tells nearly 50% percent of the time is spending on the runtime, something related to gc, mallocgc, memclrNoHeapPointers, and so on.
> 4. It has ~100 million dynamic objects.
>
> Do you guys have some good advice to optimize the performance?
>
> One idea that occurs to me is to do something like sync.Pool to buffer some of the most frequently allocated and freed objects. But the problem is I didn't manage to find a golang tool to find such objects. The runtime provides an API to get the number of objects but it didn't provide an API to get the detailed statistics of all objects. Please correct me if I'm wrong. Thanks a lot :-)