Hi, Sangho

I've tested with both the 3.4 and 3.12 kernels, and I think performance is now exactly as good as in 1.6! Since the timers were refactored they now use more precise timeouts, and, as you said, that causes problems on Windows because of g_poll. True.
Thank you very much for helping with this, you've done an amazing job!

P.S: Since this is a qemu bug, this fix should probably go upstream, but I'm not sure about the value of 10 either. Maybe we can just change qemu's qemu_poll_ns to use WaitForMultipleObjects on Windows and pass along whatever timeout the caller passed, without these (timeout >= 10) hacks... But that has to be tried out first... And once again, thanks a lot!

On 04/10/2014 07:32 PM, 박상호 wrote:
> Hi, Stanislav and Seokyeon.
>
> First of all, thank you for the information and a nice tool.
>
> I thought that aio_ctx_prepare() and aio_ctx_check() were being called an
> enormous number of times. And I found that almost every g_poll() in
> qemu_poll_ns() returns immediately even though the timeout is not zero.
>
> In glib/gpoll.c of glib-2.43.3:
>
>   325   /* If not, and we have a significant timeout, poll again with
>   326    * timeout then. Note that this will return indication for only
>   327    * one event, or only for messages. We ignore timeouts less than
>   328    * ten milliseconds as they are mostly pointless on Windows, the
>   329    * MsgWaitForMultipleObjectsEx() call will timeout right away
>   330    * anyway.
>   331    */
>   332   if (retval == 0 && (timeout == INFINITE || timeout >= 10))
>   333     retval = poll_rest (poll_msgs, handles, nhandles, fds, nfds, timeout);
>
> it does not actually wait when timeout < 10. So that enormous number of
> g_poll() calls is really just busy-polling, wasting resources.
>
> I patched qemu_poll_ns() and things feel better.
> However, I'm not sure that it is as good as 1.6. Please check the patch.
>
> I still need to verify glib's comment, i.e. whether the precision of
> MsgWaitForMultipleObjectsEx() is really worse than 10 milliseconds.
>
> ------- *Original Message* -------
>
> *Sender* : Stanislav Vorobiov <[email protected]> Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>
> *Date* : 2014-04-10 16:23 (GMT+09:00)
>
> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>
> Hi, Sangho
>
> Thanks for the info!
> BTW, gprof has one drawback: it can't profile multithreaded applications,
> i.e. it gives wrong results. I recommend using Intel VTune Amplifier XE;
> it's a very useful tool, you can use it without recompiling qemu, and it'll
> show you everything: stack traces, profiles, it even shows SMP friendliness.
>
> I've also done some more digging into this aio thing. Here's what I found,
> in main-loop.c:os_host_main_loop_wait (win32-dependent code). If I replace:
>
>     select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
>
> with:
>
>     qemu_mutex_unlock_iothread();
>     select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
>     qemu_mutex_lock_iothread();
>
> then the problem is almost entirely cured for portio (vga), but for mmio
> it's still present (vigs). So the problem here is a livelock between the
> main thread and the io thread. I'm currently studying the mmio part, i.e.
> we probably need to stick these
> qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() calls somewhere else.
>
> Another thing that looks strange to me is the fact that adding
> qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() makes so much
> difference. The thing is, 'tv0' in that select is always 0, which means
> "poll and return immediately", and this select actually does return
> immediately, so why does unlock/lock make so much difference? I mean, if
> tv were > 0 then yes, the main thread waits on select while the io thread
> livelocks on the mutex, that makes sense, but not when tv is 0... I'm also
> studying this...
>
> On 04/10/2014 10:48 AM, 박상호 wrote:
>> Hi, Seokyeon and Stanislav
>>
>> I profiled qemu on Windows using gprof (-pg). I ran the emulator until it
>> showed the menu screen and then shut it down. It takes about 70 seconds.
>> Please check the attached result.
>>
>> - Top ranks
>>
>> Each sample counts as 0.01 seconds.
>>   %    cumulative    self                self     total
>>  time    seconds    seconds      calls  ms/call  ms/call  name
>> 16.48       1.05       1.05                               maru_vga_draw_line32_32
>> 11.62       1.79       0.74                               __udivdi3
>>  6.75       2.22       0.43                               os_host_main_loop_wait
>>  5.65       2.58       0.36                               aio_ctx_prepare
>>  5.34       2.92       0.34  111422776     0.00     0.00  qemu_mutex_unlock
>>  5.34       3.26       0.34                               aio_ctx_check
>>  5.18       3.59       0.33    8507037     0.00     0.00  slirp_pollfds_poll
>>  3.77       3.83       0.24    8506993     0.00     0.00  slirp_pollfds_fill
>>  3.14       4.03       0.20   76396706     0.00     0.00  timerlist_deadline_ns
>>  2.67       4.20       0.17   25465512     0.00     0.00  timerlistgroup_deadline_ns
>>  2.51       4.36       0.16                               __umoddi3
>>  2.35       4.51       0.15    8506948     0.00     0.00  main_loop_wait
>>  2.20       4.65       0.14   68485894     0.00     0.00  qemu_clock_get_ns
>>  2.04       4.78       0.13    8507043     0.00     0.00  qemu_clock_run_all_timers
>>  1.88       4.90       0.12  103165614     0.00     0.00  qemu_mutex_lock
>>  1.88       5.02       0.12   25664993     0.00     0.00  timerlist_run_timers
>>
>> Many functions related to aio and the timerlist are called too frequently,
>> as you expected.
>>
>> According to the call graph (from line 1714):
>>
>> -----------------------------------------------
>>                          42                    aio_poll [94]
>>                     8506969                    main_loop_wait [8]
>>  0.36  0.23  8458958/33329095                  aio_ctx_check [4]
>>  0.36  0.23  8499543/33329095                  aio_ctx_prepare [3]
>> [16]  2.7  0.17  0.00  25465512                timerlistgroup_deadline_ns
>>                        76396706                timerlist_deadline_ns
>> -----------------------------------------------
>>
>> main_loop_wait(), aio_ctx_check() and aio_ctx_prepare() call
>> timerlistgroup_deadline_ns() almost evenly.
>>
>> aio_ctx_check() and aio_ctx_prepare() are used as GSourceFuncs, so we can
>> reasonably suspect the aio implementation for win32.
>>
>> main_loop_wait() also calls timerlistgroup_deadline_ns() excessively.
>>
>> I have tested it on my ubuntu box. I ran the emulator until it showed the
>> menu screen and then shut it down. It takes about 20 seconds. Just compare
>> the number of calls (25465512 per 70 seconds vs 78696 per 20 seconds).
>>
>> Each sample counts as 0.01 seconds.
>>   %    cumulative    self                self     total
>>  time    seconds    seconds      calls  ms/call  ms/call  name
>>  9.09       0.04       0.04        642     0.06     0.08  vga_update_display
>>  6.82       0.07       0.03      32540     0.00     0.00  main_loop_wait
>>  6.82       0.10       0.03      30701     0.00     0.00  phys_page_set_level
>>  4.55       0.12       0.02    5883501     0.00     0.00  address_space_translate_in
>>  4.55       0.14       0.02    5883382     0.00     0.00  address_space_translate
>>  4.55       0.16       0.02     189067     0.00     0.00  cpu_get_clock_locked
>>  4.55       0.18       0.02        831     0.02     0.02  qcow2_check_metadata_overl
>>  4.55       0.20       0.02                               aio_ctx_prepare
>>  2.27       0.21       0.01    5952765     0.00     0.00  phys_page_find
>>  2.27       0.22       0.01    5835718     0.00     0.00  qemu_get_ram_block
>>  2.27       0.23       0.01    1177955     0.00     0.00  qemu_mutex_lock
>> ...
>>  0.00       0.44       0.00     236252     0.00     0.00  timerlist_deadline_ns
>> ...
>>
>> -----------------------------------------------
>>  0.00  0.00      42/78696                 aio_poll [70]
>>  0.00  0.00   19116/78696                 aio_ctx_check [34]
>>  0.00  0.00   26975/78696                 aio_ctx_prepare [21]
>>  0.00  0.00   32563/78696                 main_loop_wait [5]
>> [60]  2.1  0.00  0.01  78696              timerlistgroup_deadline_ns [60]
>>  0.00  0.01  236252/236252                timerlist_deadline_ns [59]
>> -----------------------------------------------
>> ...
>>
>> In summary, the win32 aio implementation may be the cause; however, I still
>> don't know exactly. I need to think about the result more and check the aio
>> implementation.
>>
>> ------- *Original Message* -------
>>
>> *Sender* : 박상호 수석/파트장/Core Part/S-Core
>>
>> *Date* : 2014-04-09 16:46 (GMT+09:00)
>>
>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> Hi, Seokyeon Hwang
>>
>> I'm afraid that the same performance degradation can happen in qemu 2.0,
>> which will be released on Apr. 10. (http://wiki.qemu.org/Planning/2.0)
>>
>> I think that we need to dig into this issue more until next week.
>> :)
>>
>> *From:* SeokYeon Hwang [mailto:[email protected]]
>> *Sent:* Wednesday, April 09, 2014 3:11 PM
>> *To:* Stanislav Vorobiov; [email protected]; 박상호
>> *Subject:* Re: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> @ stanislav,
>>
>> I see. You didn't want to apply the W/A patch.
>>
>> And... yes, we should study win32-aio.c in more detail.
>>
>> I haven't tested the 3.12 kernel on a Windows host yet. I should try it.
>>
>> @ sangho and all,
>>
>> What is your opinion?
>>
>> ------- *Original Message* -------
>>
>> *Sender*: Stanislav Vorobiov> Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>>
>> *Date*: 2014-04-08 19:09 (GMT+09:00)
>>
>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>
>> Hi, Seokyeon
>>
>>> Yesterday, I looked at the related code and tested it.
>>> But I am not quite sure about the changed timer code in QEMU.
>>>
>>> The problem disappears with Stanislav's patch. However, I think adding a
>>> dummy notifier on timerlist registration is better than checking
>>> use_icount, given the current changed timer logic. I'm not 100% sure
>>> about this.
>> I've tried the patch; it looks like the fix is almost the same as mine in
>> terms of performance, i.e. it makes things better, but not as good as 1.6.
>> And the difference is big: with 1.6, performance was much better. IMHO we
>> haven't fixed the problem yet, and neither this patch nor mine should be
>> applied. I'll try to look at this problem again taking this patch into
>> account. I really hope that we'll find the right solution for this...
>>
>>> If anyone knows about the following, please answer me.
>>>
>>> 1. The main loop registers aio_notify to use its own timers. Why do the 6
>>> timerlists, which are created by the init_clocks() function in the CPU
>>> thread and IO thread, eventually call aio_notify? aio_notify is called
>>> even though no notifier is registered explicitly.
>>>
>>> 2.
>>> The same timer logic runs on both Linux and Windows, but it is slow on
>>> Windows. What is the major cause of the performance decline on Windows?
>> It might be that the aio logic broke for windows, i.e. misuse of the
>> IoCompletion API or something; maybe we should study win32-aio.c in more
>> detail?
>>
>> Also, I noticed one more thing that may be related to this problem. The
>> mobile image doesn't boot with kernel 3.12 at all on windows; it hangs
>> somewhere in network initialization (not 100% sure). That place also
>> causes a little delay with the 3.4 kernel, but with 3.12 it never gets
>> past it. I've tried this both with and without this patch. Also, Tizen
>> IVI doesn't have this problem, it boots fine.
>>
>> On 04/08/2014 11:19 AM, SeokYeon Hwang wrote:
>>> Sorry, my attachment was missing.
>>>
>>> Thanks.
>>>
>>> ------- *Original Message* -------
>>>
>>> *Sender* : 황석연 수석보/VM Part/S-Core
>>>
>>> *Date* : 2014-04-08 16:11 (GMT+09:00)
>>>
>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>
>>> Hi, everyone.
>>>
>>> Sorry for the late reply.
>>>
>>> Yesterday, I looked at the related code and tested it.
>>> But I am not quite sure about the changed timer code in QEMU.
>>>
>>> The problem disappears with Stanislav's patch. However, I think adding a
>>> dummy notifier on timerlist registration is better than checking
>>> use_icount, given the current changed timer logic. I'm not 100% sure
>>> about this.
>>>
>>> If anyone knows about the following, please answer me.
>>>
>>> 1. The main loop registers aio_notify to use its own timers. Why do the 6
>>> timerlists, which are created by the init_clocks() function in the CPU
>>> thread and IO thread, eventually call aio_notify? aio_notify is called
>>> even though no notifier is registered explicitly.
>>>
>>> 2. The same timer logic runs on both Linux and Windows, but it is slow on
>>> Windows. What is the major cause of the performance decline on Windows?
>>> I'll apply Stanislav's patch or the attached "dummy_notifier patch" as a
>>> workaround if I cannot figure it out within this week.
>>> If you have any comments about this, please let me know.
>>>
>>> Thanks.
>>>
>>> ============================================================================================
>>>
>>> *Sender*: Seokyeon Hwang>
>>>
>>> *Date*: 2014-03-14 10:35 (GMT+09:00)
>>>
>>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>
>>> Great job, thanks.
>>>
>>> I should test with "vanilla QEMU 1.6" on windows.
>>>
>>> I think it could be our misuse of the QEMU timer API, or some other
>>> mistake in tizen-specific devices.
>>>
>>> I will test it by next week.
>>>
>>> ------- *Original Message* -------
>>>
>>> *Sender*: Stanislav Vorobiov> Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>>>
>>> *Date*: 2014-03-14 02:22 (GMT+09:00)
>>>
>>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>
>>> I was able to make some progress on this issue. It looks like this commit:
>>>
>>> b1bbfe72ec1ebf302d97f886cc646466c0abd679 aio / timers: On timer
>>> modification, qemu_notify or aio_notify
>>>
>>> causes the degradation. I'm attaching a patch that reverts the changes in
>>> this commit. Although the emulator performs better with this patch, it's
>>> still not as good as it was with qemu 1.6. Also, this patch is of course
>>> a dirty hack: it reverts generic code that works fine on linux and mac os
>>> x, but the problem is on windows only.
>>>
>>> Any comments are welcome...
>>>
>>> On 03/12/2014 02:59 PM, Stanislav Vorobiov wrote:
>>>> Hi all,
>>>>
>>>> Just for information, Intel VTune Amplifier XE for windows works great
>>>> with MinGW; it's capable of gathering correct profiles and symbol naming
>>>> is ok, you don't even need to build qemu with any special options.
>>>>
>>>> I'm using it now to find the cause of this performance degradation; maybe
>>>> someone else will find it useful as well.
>>>>
>>>> Thanks.
>>>>
>>>> On 01/16/2014 06:38 AM, 황석연 wrote:
>>>>> Dear all,
>>>>>
>>>>> @ stanislav
>>>>>
>>>>> You are right. Performance profiling on Windows is a very hard job.
>>>>>
>>>>> Actually, on Windows - MinGW I prefer using a profiling tool to
>>>>> analyzing sources by trial and error.
>>>>>
>>>>> @ all
>>>>>
>>>>> If anyone knows a good profiling tool for Windows - MinGW, please let
>>>>> us know.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> ------- *Original Message* -------
>>>>>
>>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>
>>>>> *Date* : 2014-01-15 14:54 (GMT+09:00)
>>>>>
>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>
>>>>> Hi, Syeon
>>>>>
>>>>> Yes, but unfortunately it's hard to say where exactly the problem is.
>>>>> It would be great to do some profiling, but on MinGW that seems like no
>>>>> easy task. In MinGW there are no tools such as valgrind or perf, and
>>>>> all existing windows profiling tools require a .pdb database, which
>>>>> means they can only profile executables built by visual studio. After
>>>>> some struggling I've managed to run qemu with gprof, which gave me
>>>>> output with correct symbol naming, but unfortunately the output is
>>>>> still not useful; maybe that's because gprof is known to not work
>>>>> correctly with multithreaded applications. Do you have suggestions on
>>>>> how we can profile qemu on windows? Are there any good tools you know
>>>>> about?
>>>>>
>>>>> On 01/15/2014 08:35 AM, SeokYeon Hwang wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I can reproduce the performance degradation on Windows.
>>>>>>
>>>>>> We should figure out why.
>>>>>>
>>>>>> I think it could be related to the timer logic changes in 1.7.0.
>>>>>>
>>>>>> Thanks.
>>>>>> ------- *Original Message* -------
>>>>>>
>>>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>>
>>>>>> *Date* : 2014-01-13 14:52 (GMT+09:00)
>>>>>>
>>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>>
>>>>>> Hi, Syeon
>>>>>>
>>>>>> It's not necessarily related to HAXM; the thing is, the slowdown is
>>>>>> significant, e.g. the home screen takes about 5 times longer to render
>>>>>> than before, and home screen scrolling is like 2-3 fps. Other graphics
>>>>>> apps are also slow.
>>>>>>
>>>>>> On 01/13/2014 06:19 AM, 황석연 wrote:
>>>>>>> Hi, stanislav,
>>>>>>>
>>>>>>> As far as I remember, there are no significant changes related to HAXM.
>>>>>>>
>>>>>>> But I will re-check it.
>>>>>>>
>>>>>>> ------- *Original Message* -------
>>>>>>>
>>>>>>> *Sender* : Stanislav Vorobiov, Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>>>
>>>>>>> *Date* : 2014-01-10 22:23 (GMT+09:00)
>>>>>>>
>>>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>>>
>>>>>>> Also, this happens both with maru VGA and VIGS.
>>>>>>>
>>>>>>> On 01/10/2014 01:06 PM, Stanislav Vorobiov wrote:
>>>>>>>> Hi, all
>>>>>>>>
>>>>>>>> After updating the tizen branch today (with the 1.7.0 merge) I've
>>>>>>>> noticed performance degradation on windows 7 64-bit with HAXM
>>>>>>>> enabled. Is this a known issue? Were there significant changes to
>>>>>>>> HAXM in the 1.7.0 merge?
>>>>>>>>
>>>>>>>> On 01/08/2014 07:59 AM, 황석연 wrote:
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> The QEMU 1.7.0 stable version has been merged into the tizen branch.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> ------- *Original Message* -------
>>>>>>>>>
>>>>>>>>> *Sender* : 황석연 책임/VM Part/S-Core
>>>>>>>>>
>>>>>>>>> *Date* : 2014-01-03 13:16 (GMT+09:00)
>>>>>>>>>
>>>>>>>>> *Title* : [Dev] [SDK/Emulator] Merge qemu stable-1.7.0 on tizen emulator
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> We have tested the "Tizen Emulator" with the tizen_qemu_1.7 branch,
>>>>>>>>> and it works well.
>>>>>>>>>
>>>>>>>>> So we plan to merge it into the tizen branch next Tuesday, Jan. 7.
>>>>>>>>>
>>>>>>>>> If you have any opinion, please let me know.
>>>>>>>>>
>>>>>>>>> *And please subscribe to the "Dev" mailing list on "tizen.org".*
>>>>>>>>>
>>>>>>>>> *https://lists.tizen.org/listinfo/dev*
>>>>>>>>>
>>>>>>>>> *I won't add any other recipients after this mail.*
>>>>>>>>>
>>>>>>>>> @ John,
>>>>>>>>>
>>>>>>>>> Please forward this mail to the IVI maintainer.
>>>>>>>>>
>>>>>>>>> Thanks.
>> Best regards,
>>
>> Sangho Park (Ph.D)
>> Principal Engineer,
>> Core Part, OS Lab,
>> S-Core
>> Tel) +82-70-7125-5039
>> Mobile) +82-10-2546-9871
>> E-mail) [email protected]

_______________________________________________
Dev mailing list
[email protected]
https://lists.tizen.org/listinfo/dev
