On Sat, May 9, 2020 at 12:17 PM Aleksandar Markovic <aleksandar.qemu.de...@gmail.com> wrote:
> On Wed, May 6, 2020 at 13:26, Alex Bennée <alex.ben...@linaro.org> wrote:
>
> > This is very much driven by how much code generation vs running you see.
> > In most of my personal benchmarks I never really notice code generation
> > because I give my machines large amounts of RAM so code tends to stay
> > resident and does not need to be re-translated. When the optimiser shows up
> > it's usually accompanied by high TB flush and invalidate counts in "info
> > jit" because we are doing more translation than we usually do.
> >
> > Yes, I think the machine was set up with only 128MB RAM.
>
> That would be an interesting experiment for Ahmed actually - to
> measure the impact of the amount of RAM on performance.
>
> But it looks like, at least for machines with small RAM, the translation
> phase will take a significant percentage.
>
> I am attaching the call graph for the translation phase for "Hello World" built
> for mips, and emulated by QEMU (tb_gen_code() and its callees)
Sorry if I'm stating the obvious, but both "Hello World" and a Linux boot
will exhibit similar behavior, with low reuse of translated blocks. That
means translation will show up in profiles, because a lot of time is spent
translating blocks that will run only once. If you push in that direction
you might reach the conclusion that a non-JIT simulator is faster than QEMU.

You will have to carefully select the tests you run: you need a wide
spectrum, from Linux boot and "Hello World" up to synthetic benchmarks.

Again, sorry if that was too trivial :-)

Laurent
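
P.S. To make the reuse argument concrete, here is a toy cost model. It is only a sketch: the function names and all the cost figures are made-up units chosen for illustration, not QEMU measurements, and it assumes every block is translated exactly once and that the interpreter is slower per execution than translated code.

```python
# Toy model: total time to run `blocks` code blocks, each executed
# `runs_per_block` times. All costs are hypothetical, in arbitrary units.

def jit_time(blocks, runs_per_block, translate_cost, native_cost):
    """Translate each block once, then run the translated code."""
    return blocks * (translate_cost + runs_per_block * native_cost)


def interp_time(blocks, runs_per_block, interp_cost):
    """Interpret each block on every execution, with no translation step."""
    return blocks * runs_per_block * interp_cost


def break_even_runs(translate_cost, native_cost, interp_cost):
    """Executions per block above which translation pays for itself.

    Derived from: translate_cost + n * native_cost < n * interp_cost.
    Assumes interp_cost > native_cost.
    """
    return translate_cost / (interp_cost - native_cost)


if __name__ == "__main__":
    # Hypothetical costs: translation is expensive, native execution is cheap.
    TRANSLATE, NATIVE, INTERP = 1000, 1, 20

    for runs in (1, 10, 100, 1000):
        jit = jit_time(1, runs, TRANSLATE, NATIVE)
        interp = interp_time(1, runs, INTERP)
        print(f"runs={runs:5d}  jit={jit:6d}  interp={interp:6d}")

    print("break-even runs per block:",
          break_even_runs(TRANSLATE, NATIVE, INTERP))
```

With these invented numbers, a block that runs once is far cheaper to interpret, while a block that runs a thousand times is far cheaper to translate. That is why a test set made only of run-once workloads would paint a misleading picture.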