Note: I don't use the init function.

On Tuesday, February 2, 2021 at 3:48:26 PM UTC+8, 颜文泽 wrote:
> If it works, it's fine; I'll just keep using VTune. I only work on x86
> anyway. That said, I found another oddity: my program has 13 routines as
> soon as it starts. It's very peculiar, and I simply can't understand why.
>
> This is my code:
>
> [image: 2021-02-02 15-45-01 screenshot.png]
>
> And this is the result, which is surprising. I think I know why my program
> is slow: the number of routines is too high. But I found that the
> GOMAXPROCS function doesn't work, which really confuses me.
> My example does not do anything, so my understanding is that the number of
> routines should be only 1.
> [image: 2021-02-02 15-45-49 screenshot.png]
>
> On Tuesday, February 2, 2021 at 3:27:45 PM UTC+8, Amnon wrote:
>
>> VTune is very useful for squeezing the ultimate performance out of Go
>> programs, once you have done the usual optimisation: minimized
>> allocations, IO, etc.
>>
>> pprof is more than adequate for the average programmer. But when you need
>> to super-optimise functions which implement math kernels, crypto
>> functions, video codecs, etc., then without a hardware performance
>> counter based profiler such as VTune or Linux perf
>> (https://perf.wiki.kernel.org/index.php/Main_Page) you are shooting in
>> the dark.
>> VTune not only tells you which functions are taking the most time, but
>> WHY they are taking a long time: how long the code spends waiting on
>> cache misses, and the different kinds of stall cycles that kill
>> performance on a modern CPU.
>>
>> VTune or perf is also a great tool for teaching us about processors, and
>> for helping us understand what influences the rate at which they execute
>> instructions.
>>
>> The problem with VTune is that it is quite unfriendly and expensive
>> (> $3000 for a single floating license)!
>> It also does not work on ARM processors (such as the Apple M1).
>>
>> There has been a proposal to add performance counters to pprof.
>>
>> https://go.googlesource.com/proposal/+/refs/changes/08/219508/2/design/36821-perf-counter-pprof.md
>> If accepted, this would give the power of VTune to the masses for free.
>>
>> On Tuesday, 2 February 2021 at 06:37:37 UTC nnsm...@gmail.com wrote:
>>
>>> One more question: is it effective to use VTune to tune Go programs? I
>>> am afraid that VTune is not suitable, although Intel claims it is.
>>>
>>> On Tuesday, February 2, 2021 at 2:32:40 PM UTC+8, 颜文泽 wrote:
>>>
>>>> Thanks. It's not an in-memory DB, but my current test does not involve
>>>> IO. I'll take time to look at the material you sent, thanks a lot.
>>>> Also, I found that many of the functions with a high CPI rate are
>>>> runtime functions. Is the overhead of these functions unavoidable? The
>>>> following chart is for a single routine:
>>>> [image: 2021-02-02 14-25-33 screenshot.png]
>>>> The following chart is for 8 routines:
>>>> [image: 2021-02-02 14-25-56 screenshot.png]
>>>>
>>>> On Tuesday, February 2, 2021 at 2:27:39 PM UTC+8, ren...@ix.netcom.com wrote:
>>>>
>>>>> Unless it is an in-memory database, I would expect the IO costs to
>>>>> dwarf the CPU costs, but I guess a lot depends on how you define
>>>>> ‘analytical processing’.
>>>>>
>>>>> In my experience, “out of the box” performance of goroutines in IO
>>>>> processing is outstanding.
>>>>>
>>>>> For the CPU-bound case, I think with threads, CPU assignments
>>>>> (cpuset), etc. you can probably create a higher-performing system in
>>>>> some cases, but it's a lot of work.
>>>>>
>>>>> Even without that, I think the scheduler in most Linux systems is more
>>>>> mature than the Go scheduler, and makes better choices for cache
>>>>> affinity, etc. It's very hard to design a high-performance CPU-bound
>>>>> system that runs on a general-purpose OS or language/platform. Without
>>>>> knowledge of the OLAP DB design it is very hard to make a
>>>>> recommendation.
>>>>>
>>>>> This is some suggested reading to help you on your journey:
>>>>> https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html
>>>>>
>>>>> On Feb 2, 2021, at 12:07 AM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>>
>>>>> I don't know much about the internal implementation of Go, sorry.
>>>>> I was a C programmer, and I tried to implement the original logic (an
>>>>> OLAP database) using routines as a replacement for threads. But I ran
>>>>> into bottlenecks, and I don't know how to solve them. Maybe I should
>>>>> study how routines are implemented before I can write the right code.
>>>>>
>>>>> On Tuesday, February 2, 2021 at 12:21:44 PM UTC+8, ren...@ix.netcom.com wrote:
>>>>>
>>>>>> You wrote “I found that cache misses from routine switching are also
>>>>>> a headache”.
>>>>>>
>>>>>> They would not be switching if they are CPU-bound and there are fewer
>>>>>> of them than CPUs. Remember too that you need some % of the CPUs to
>>>>>> execute the runtime GC code and other housekeeping.
>>>>>>
>>>>>> > On Feb 1, 2021, at 10:04 PM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>>> >
>>>>>> > I found that cache misses from routine switching are also a
>>>>>> > headache
>>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to golang-nuts...@googlegroups.com.
>>>>>
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/golang-nuts/35bccad0-64a9-4796-bc3f-a9cdb8c82961n%40googlegroups.com
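ren's point that CPU-bound goroutines should not be switching when there are fewer of them than CPUs can be sketched as a fixed worker pool sized from GOMAXPROCS. This assumes the work splits into independent chunks; `sumSquares` and `parallelSumSquares` are hypothetical stand-ins for a real OLAP kernel, and leaving one GOMAXPROCS slot free for the GC and other runtime housekeeping is an assumption to measure, not a rule stated in the thread.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares is a hypothetical CPU-bound kernel standing in for real work.
func sumSquares(xs []int64) int64 {
	var s int64
	for _, x := range xs {
		s += x * x
	}
	return s
}

// parallelSumSquares splits xs into one contiguous chunk per worker goroutine
// and adds up the partial results.
func parallelSumSquares(xs []int64, workers int) int64 {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(xs) + workers - 1) / workers
	// Each worker writes only its own slot, once, so no locking is needed.
	partial := make([]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(xs) {
			lo = len(xs)
		}
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			partial[w] = sumSquares(xs[lo:hi])
		}(w, lo, hi)
	}
	wg.Wait()
	var total int64
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	xs := make([]int64, 1_000_000)
	for i := range xs {
		xs[i] = int64(i % 1000)
	}
	// Assumption: one worker fewer than GOMAXPROCS leaves room for the GC
	// and other runtime work; benchmark before settling on a number.
	workers := runtime.GOMAXPROCS(0) - 1
	if workers < 1 {
		workers = 1
	}
	fmt.Println("workers:", workers, "sum:", parallelSumSquares(xs, workers))
}
```

Because the number of workers never exceeds GOMAXPROCS, each one can usually keep running on its own thread instead of being time-sliced against other workers; whether that actually reduces cache misses for a given workload is something VTune or perf would have to confirm.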