Hi, Sangho

Thanks for the info! BTW, gprof has one drawback: it can't profile multithreaded applications, i.e. it gives wrong results. I recommend using Intel VTune Amplifier XE. It's a very useful tool, you can use it without recompiling qemu, and it'll show you everything: stack traces, profiles, it even shows SMP friendliness.

I've also done some more digging into this aio thing. Here's what I found: in main-loop.c:os_host_main_loop_wait (win32-dependent code), if I replace:

    select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);

with:

    qemu_mutex_unlock_iothread();
    select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
    qemu_mutex_lock_iothread();

then the problem is almost entirely cured for portio (vga), but for mmio (vigs) it's still present. So the problem here is a livelock between the main thread and the io thread. I'm currently studying the mmio part, i.e. we probably need to put these qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() calls somewhere else as well.

Another thing that looks strange to me is that adding qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() makes so much difference. 'tv0' in that select is always 0, which means "poll and return immediately", and this select actually does return immediately, so why does the unlock/lock make so much difference? I mean, if tv were > 0 then yes: the main thread waits in select, the io thread livelocks on the mutex, and that makes sense. But not when tv is 0... I'm also studying this...

On 04/10/2014 10:48 AM, 박상호 wrote:
> Hi, Seokyeon and Stanislav
>
> I profiled qemu on windows using gprof (-pg). I ran the emulator until the
> menu screen was shown and then shut it down. It took about 70 seconds. Please
> check the attached result.
>
> - Top ranks
>
> Each sample counts as 0.01 seconds.
>      %cumulative     self                self    total
>   time   seconds  seconds      calls  ms/call  ms/call  name
>  16.48      1.05     1.05                               maru_vga_draw_line32_32
>  11.62      1.79     0.74                               __udivdi3
>   6.75      2.22     0.43                               os_host_main_loop_wait
>   5.65      2.58     0.36                               aio_ctx_prepare
>   5.34      2.92     0.34  111422776     0.00     0.00  qemu_mutex_unlock
>   5.34      3.26     0.34                               aio_ctx_check
>   5.18      3.59     0.33    8507037     0.00     0.00  slirp_pollfds_poll
>   3.77      3.83     0.24    8506993     0.00     0.00  slirp_pollfds_fill
>   3.14      4.03     0.20   76396706     0.00     0.00  timerlist_deadline_ns
>   2.67      4.20     0.17   25465512     0.00     0.00  timerlistgroup_deadline_ns
>   2.51      4.36     0.16                               __umoddi3
>   2.35      4.51     0.15    8506948     0.00     0.00  main_loop_wait
>   2.20      4.65     0.14   68485894     0.00     0.00  qemu_clock_get_ns
>   2.04      4.78     0.13    8507043     0.00     0.00  qemu_clock_run_all_timers
>   1.88      4.90     0.12  103165614     0.00     0.00  qemu_mutex_lock
>   1.88      5.02     0.12   25664993     0.00     0.00  timerlist_run_timers
>
> Many functions related to aio and timerlist are called too frequently, as you
> expected.
>
> According to the call graph (from 1714 lines),
>
> -----------------------------------------------
>                                   42             aio_poll <cycle 1> [94]
>                              8506969             main_loop_wait <cycle 1> [8]
>                 0.36    0.23 8458958/33329095     aio_ctx_check [4]
>                 0.36    0.23 8499543/33329095     aio_ctx_prepare [3]
> [16]     2.7    0.17    0.00 25465512         timerlistgroup_deadline_ns <cycle
>                             76396706             timerlist_deadline_ns <cycle 1
> -----------------------------------------------
>
> main_loop_wait(), aio_ctx_check() and aio_ctx_prepare() call
> timerlistgroup_deadline_ns() almost evenly.
>
> aio_ctx_check() and aio_ctx_prepare() are used for GSourceFuncs, so we can
> reasonably suspect the aio implementation for win32.
>
> main_loop_wait() also calls timerlistgroup_deadline_ns() excessively.
>
> I have tested it on my ubuntu box. I ran the emulator until the menu screen
> was shown and then shut it down. It took about 20 seconds. Just compare the
> number of calls (25465512 per 70 seconds vs 78696 per 20 seconds).
>
> Each sample counts as 0.01 seconds.
>      %cumulative     self                self    total
>   time   seconds  seconds      calls  ms/call  ms/call  name
>   9.09      0.04     0.04        642     0.06     0.08  vga_update_display
>   6.82      0.07     0.03      32540     0.00     0.00  main_loop_wait
>   6.82      0.10     0.03      30701     0.00     0.00  phys_page_set_level
>   4.55      0.12     0.02    5883501     0.00     0.00  address_space_translate_in
>   4.55      0.14     0.02    5883382     0.00     0.00  address_space_translate
>   4.55      0.16     0.02     189067     0.00     0.00  cpu_get_clock_locked
>   4.55      0.18     0.02        831     0.02     0.02  qcow2_check_metadata_overl
>   4.55      0.20     0.02                               aio_ctx_prepare
>   2.27      0.21     0.01    5952765     0.00     0.00  phys_page_find
>   2.27      0.22     0.01    5835718     0.00     0.00  qemu_get_ram_block
>   2.27      0.23     0.01    1177955     0.00     0.00  qemu_mutex_lock
> ...
>   0.00      0.44     0.00     236252     0.00     0.00  timerlist_deadline_ns
> ...
>
> -----------------------------------------------
>                 0.00    0.00      42/78696       aio_poll <cycle 1> [70]
>                 0.00    0.00   19116/78696       aio_ctx_check [34]
>                 0.00    0.00   26975/78696       aio_ctx_prepare [21]
>                 0.00    0.00   32563/78696       main_loop_wait [5]
> [60]     2.1    0.00    0.01   78696         timerlistgroup_deadline_ns [60]
>                 0.00    0.01  236252/236252      timerlist_deadline_ns [59]
> -----------------------------------------------
> ...
>
> In summary, the aio implementation for win32 may be the cause, but I still
> don't know for sure. I need to think about the result more and check the aio
> implementation.
>
> ------- *Original Message* -------
>
> *Sender* : 박상호<[email protected]> Principal Engineer/Part Leader/Core Part/S-Core
>
> *Date* : 2014-04-09 16:46 (GMT+09:00)
>
> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>
> Hi, Seokyeon Hwang
>
> I'm afraid that the same performance degradation can happen in qemu 2.0, which
> will be released on Apr. 10 (http://wiki.qemu.org/Planning/2.0).
>
> I think that we need to dig into this issue more until next week.
> :)
>
> *From:* SeokYeon Hwang [mailto:[email protected]]
> *Sent:* Wednesday, April 09, 2014 3:11 PM
> *To:* Stanislav Vorobiov; [email protected]; 박상호
> *Subject:* Re: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>
> @ stanislav,
>
> I see. You didn't want to apply the W/A patch.
>
> And... yes, we should study win32-aio.c in more detail.
>
> I haven't tested the 3.12 kernel on a Windows host yet. I should try it.
>
> @ sangho and all,
>
> What is your opinion?
>
> ------- *Original Message* -------
>
> *Sender*: Stanislav Vorobiov<[email protected] <mailto:[email protected]>> Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>
> *Date*: 2014-04-08 19:09 (GMT+09:00)
>
> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>
> Hi, Seokyeon
>
>> Yesterday, I looked up the related code and tested it.
>> But I am not quite sure about the changed timer code in QEMU.
>>
>> The problem disappears with Stanislav's patch. However, I think adding a
>> dummy notifier from timerlist registration is better than checking
>> use_icount, given the current changed timer logic. I'm not 100% sure
>> about this.
> I've tried the patch. It looks like the fix is almost the same as mine in
> terms of performance, i.e. it makes things better, but not as good as in 1.6.
> And the difference is big: with 1.6 performance was much better. IMHO we
> haven't fixed the problem yet, and neither this patch nor mine should be
> applied. I'll try to look at this problem again taking this patch into
> account. I really hope that we'll find the right solution for this...
>
>>
>> If anyone knows about the following, please answer me.
>>
>> 1. Main-loop registers aio_notify to use its own timers. Why do the 6
>>    timerlists, which are created by the init_clocks() function in the CPU
>>    thread and IO thread, eventually call aio_notify? aio_notify is called
>>    because no notifier is registered explicitly.
>>
>> 2.
>>    The same timer logic as above runs on linux and Windows, but it is
>>    slow on Windows. What is the major cause of the performance decline on
>>    Windows?
> It might be that the aio logic broke for windows, i.e. misuse of the
> IoCompletion api or something. Maybe we should study win32-aio.c in more
> detail?
>
> Also, I noticed one more thing that may be related to this problem. The mobile
> image doesn't boot with kernel 3.12 at all on windows; it hangs somewhere in
> network initialization (not 100% sure). That place also causes a little delay
> with the 3.4 kernel, but with 3.12 it never gets past it. I've tried this both
> without and with this patch. Also, Tizen IVI doesn't have this problem, it
> boots fine.
>
> On 04/08/2014 11:19 AM, SeokYeon Hwang wrote:
>> Sorry, my attachment was missing.
>>
>> Thanks.
>>
>> ------- *Original Message* -------
>>
>> *Sender* : 황석연, Deputy Principal Engineer/VM Part/S-Core
>>
>> *Date* : 2014-04-08 16:11 (GMT+09:00)
>>
>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> Hi, everyone.
>>
>> Sorry for the late reply.
>>
>> Yesterday, I looked up the related code and tested it.
>> But I am not quite sure about the changed timer code in QEMU.
>>
>> The problem disappears with Stanislav's patch. However, I think adding a
>> dummy notifier from timerlist registration is better than checking
>> use_icount, given the current changed timer logic. I'm not 100% sure
>> about this.
>>
>> If anyone knows about the following, please answer me.
>>
>> 1. Main-loop registers aio_notify to use its own timers. Why do the 6
>>    timerlists, which are created by the init_clocks() function in the CPU
>>    thread and IO thread, eventually call aio_notify? aio_notify is called
>>    because no notifier is registered explicitly.
>>
>> 2. The same timer logic as above runs on linux and Windows, but it is
>>    slow on Windows. What is the major cause of the performance decline on
>>    Windows?
>>
>> I'll apply Stanislav's patch or the attached "dummy_notifier patch" as a
>> workaround if I cannot figure it out by the end of this week.
>> If you have any comment about this, please let me know.
>>
>> Thanks.
>>
>> ============================================================================================
>>
>> *Sender*: Seokyeon Hwang
>>
>> *Date*: 2014-03-14 10:35 (GMT+09:00)
>>
>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> Great job, thanks.
>>
>> I should test with "vanilla QEMU 1.6" on windows.
>>
>> I think it could be our misuse of the QEMU timer API, or some other mistake
>> in tizen-specific devices.
>>
>> I will test it until next week.
>>
>> ------- *Original Message* -------
>>
>> *Sender*: Stanislav Vorobiov, Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>>
>> *Date*: 2014-03-14 02:22 (GMT+09:00)
>>
>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> I was able to make some progress on this issue. It looks like this commit:
>>
>> b1bbfe72ec1ebf302d97f886cc646466c0abd679 aio / timers: On timer
>> modification, qemu_notify or aio_notify
>>
>> causes the degradation. I'm attaching a patch that reverts the changes in
>> this commit. Although the emulator performs better with this patch, it's
>> still not as good as it was with qemu 1.6. Also, this patch is a dirty hack
>> of course: it reverts generic code that works fine on linux and mac os x,
>> while the problem is on windows only.
>>
>> Any comments are welcome...
>>
>> On 03/12/2014 02:59 PM, Stanislav Vorobiov wrote:
>>> Hi all,
>>>
>>> Just for information, Intel VTune Amplifier XE for windows works great with
>>> MinGW. It's capable of gathering correct profiles and symbol naming is ok;
>>> you don't even need to build qemu with any special options.
>>>
>>> I'm using it now to find the cause of this performance degradation; maybe
>>> someone else will find it useful as well.
>>>
>>> Thanks.
>>>
>>> On 01/16/2014 06:38 AM, 황석연 wrote:
>>>> Dear all,
>>>>
>>>> @ stanislav
>>>>
>>>> You are right. Performance profiling on Windows is a very hard job.
>>>>
>>>> Actually, I prefer using a profiling tool to analysing sources by trial
>>>> and error on Windows - MinGW.
>>>>
>>>> @ all
>>>>
>>>> If anyone knows a good profiling tool for Windows - MinGW,
>>>> please let us know.
>>>>
>>>> Thanks.
>>>>
>>>> ------- *Original Message* -------
>>>>
>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>
>>>> *Date* : 2014-01-15 14:54 (GMT+09:00)
>>>>
>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>
>>>> Hi, Syeon
>>>>
>>>> Yes, but unfortunately it's hard to say where exactly the problem is. It
>>>> would be great to do some profiling, but on MinGW that seems not an easy
>>>> task. In MinGW there are no things such as valgrind or perf, and all
>>>> existing windows profiling tools require a .pdb database, which means
>>>> they can only profile executables built by visual studio. After some
>>>> struggling I've managed to run qemu with gprof, which gave me output with
>>>> correct symbol naming, but unfortunately the output is still not useful,
>>>> maybe because gprof is known not to work correctly with multithreaded
>>>> applications. Do you have suggestions on how we can profile qemu on
>>>> windows? Are there any good tools you know about?
>>>>
>>>> On 01/15/2014 08:35 AM, SeokYeon Hwang wrote:
>>>>> Dear all,
>>>>>
>>>>> I can reproduce the performance degradation on Windows.
>>>>>
>>>>> We should figure out why.
>>>>>
>>>>> I think it could be related to the timer logic changes in 1.7.0.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> ------- *Original Message* -------
>>>>>
>>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>
>>>>> *Date* : 2014-01-13 14:52 (GMT+09:00)
>>>>>
>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>
>>>>> Hi, Syeon
>>>>>
>>>>> It's not necessarily related to HAXM. The thing is, the slowdown is
>>>>> significant, e.g. the home screen takes about 5 times longer to render
>>>>> than before, and home screen scrolling is like 2-3 fps. Other graphics
>>>>> apps are also slow.
>>>>>
>>>>> On 01/13/2014 06:19 AM, 황석연 wrote:
>>>>>> Hi, stanislav,
>>>>>>
>>>>>> As far as I remember, there are no significant changes related to
>>>>>> HAXM.
>>>>>>
>>>>>> But I will re-check it.
>>>>>>
>>>>>> ------- *Original Message* -------
>>>>>>
>>>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>>
>>>>>> *Date* : 2014-01-10 22:23 (GMT+09:00)
>>>>>>
>>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>>
>>>>>> Also, this happens both with maru VGA and VIGS
>>>>>>
>>>>>> On 01/10/2014 01:06 PM, Stanislav Vorobiov wrote:
>>>>>>> Hi, all
>>>>>>>
>>>>>>> After updating the tizen branch today (with the 1.7.0 merge) I've
>>>>>>> noticed performance degradation on windows 7 64-bit with HAXM enabled.
>>>>>>> Is this a known issue? Were there significant changes to HAXM in the
>>>>>>> 1.7.0 merge?
>>>>>>>
>>>>>>> On 01/08/2014 07:59 AM, 황석연 wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> The QEMU 1.7.0 stable version has been merged into the tizen branch.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ------- *Original Message* -------
>>>>>>>>
>>>>>>>> *Sender* : 황석연, Senior Engineer/VM Part/S-Core
>>>>>>>>
>>>>>>>> *Date* : 2014-01-03 13:16 (GMT+09:00)
>>>>>>>>
>>>>>>>> *Title* : [Dev] [SDK/Emulator] Merge qemu stable-1.7.0 on tizen
>>>>>>>> emulator
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> We have tested "Tizen Emulator" with the tizen_qemu_1.7 branch, and it
>>>>>>>> works well.
>>>>>>>>
>>>>>>>> So we plan to merge it into the tizen branch next Tuesday, 7 Jan.
>>>>>>>>
>>>>>>>> If you have any opinion, please let me know.
>>>>>>>>
>>>>>>>> *And please subscribe to the "Dev" mailing list on "tizen.org".*
>>>>>>>>
>>>>>>>> *https://lists.tizen.org/listinfo/dev*
>>>>>>>>
>>>>>>>> *I won't add any other recipients after this mail.*
>>>>>>>>
>>>>>>>> @ John,
>>>>>>>>
>>>>>>>> Please forward this mail to the IVI maintainer.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dev mailing list
>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>> https://lists.tizen.org/listinfo/dev
>
> Regards, 박상호
>
> Sangho Park (Ph.D)
> Principal Engineer,
> Core Part, OS Lab,
> S-Core
> Tel) +82-70-7125-5039
> Mobile) +82-10-2546-9871
> E-mail) [email protected] <mailto:[email protected]>
>
_______________________________________________
Dev mailing list
[email protected]
https://lists.tizen.org/listinfo/dev
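
For anyone who wants to experiment with the os_host_main_loop_wait change discussed at the top of this thread, here is a minimal sketch of the unlock / select / lock ordering. It is not the QEMU code. It assumes a POSIX host rather than win32, uses a plain pthread mutex called big_lock as a stand-in for the iothread mutex, and the helpers poll_fd_unlocked and demo_pipe_ready are hypothetical names made up for this example. It only shows where the lock is dropped and retaken around a zero-timeout select; it does not reproduce the livelock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-in for the global iothread mutex (an assumption). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Poll one descriptor for readability with a zero timeout, releasing
 * big_lock for the duration of the call. The caller must hold big_lock
 * on entry; it is held again on return.
 */
static int poll_fd_unlocked(int fd)
{
    fd_set rfds;
    struct timeval tv0 = { 0, 0 };   /* "poll and return immediately" */
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    pthread_mutex_unlock(&big_lock);   /* give waiting threads a chance */
    ret = select(fd + 1, &rfds, NULL, NULL, &tv0);
    pthread_mutex_lock(&big_lock);     /* take the lock back */

    return ret;
}

/* Usage sketch: returns 1 if a byte written to a pipe is seen as readable. */
static int demo_pipe_ready(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0) {
        return -1;
    }

    pthread_mutex_lock(&big_lock);
    before = poll_fd_unlocked(fds[0]);   /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        before = -1;
    }
    after = poll_fd_unlocked(fds[0]);    /* read end is now readable */
    pthread_mutex_unlock(&big_lock);

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

The point of the ordering is that any other thread blocked on big_lock has a window to acquire it while select runs, even when the timeout is zero. Whether the thread that released the lock or a waiter gets it next depends on the platform's mutex implementation, so this sketch makes no claim about fairness.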
