On 01/24/14 02:52, Qiao Nuohan wrote:
> On 01/23/2014 01:04 AM, Laszlo Ersek wrote:
>>> @@ -864,6 +884,16 @@ static int dump_init(DumpState *s, int fd, bool
>>> paging, bool has_filter,
>>> >
>>> qemu_get_guest_simple_memory_mapping(&s->list,&s->guest_phys_blocks);
>>> >      }
>>> >
>>> > +    s->nr_cpus = nr_cpus;
>>> > +    s->page_size = TARGET_PAGE_SIZE;
>>> > +    s->page_shift = ffs(s->page_size) - 1;
>>> > +
>>> > +    get_max_mapnr(s);
>> Again from v6 10/11, good. The flag_flatten assignment has been dropped.
>> Initialization seems to happen in a good spot this time too.
>>
>>> > +
>>> > +    uint64_t tmp;
>>> > +    tmp = DIV_ROUND_UP(DIV_ROUND_UP(s->max_mapnr, CHAR_BIT),
>>> s->page_size);
>>> > +    s->len_dump_bitmap = tmp * s->page_size;
>>> > +
>>> >      if (s->has_filter) {
>>> >          memory_mapping_filter(&s->list, s->begin, s->length);
>>> >      }
>> Again from v6 10/11.
>>
>> These assignments now all occur without depending on a user request for
>> a compressed dump (kept this way in v7 12/13 too), but they are not
>> costly. The loop in get_max_mapnr() iterates over less than 10 mappings
>> in the non-paging dump case, and in the paging dump case it also
>> shouldn't be more than a hundred or so (as I recall from earlier
>> testing). This might be worth some regression-testing (perf-wise), but
>> it looks OK to me.
>>
>
> I see, moving them into "if(format...) {...}" block would be better. But, I
> have no idea of "regression-testing (perf-wise)", would you mind give
> some hint?

I meant comparing how long it would take to dump in paging mode before
this patchset vs. after this patchset, in order to see the difference
that is introduced by get_max_mapnr() when paging is enabled.

However, please ignore this point. First, the loop is most probably
negligible even for a paging dump. Second, you could make it conditional
on compressed dumps (which force non-paging + non-filtering), where the
number of mappings is very low. And third, as I wrote later, the loop
should be replaced anyway with an O(1) QTAILQ_LAST() access.

So please just ignore this "performance" remark.

Ultimately, what I suggest for get_max_mapnr() is:
- rebase it to guest_phys_blocks, just like the other two places (which
  are now calling get_next_page()),
- use QTAILQ_LAST() in it,
- don't bother making it conditional (i.e. its current call site is
  fine), because:
  - it'll be fast with QTAILQ_LAST(),
  - guest_phys_blocks is available in any case, so you can always
    access it.

Thanks
Laszlo
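
For illustration only, the QTAILQ_LAST() suggestion above could look
roughly like the standalone sketch below. It is not the patch itself: it
uses TAILQ_LAST() from <sys/queue.h> as a stand-in for QEMU's
QTAILQ_LAST(), and struct Block, its field names, and the helper names
max_mapnr_loop(), max_mapnr_last() and dump_bitmap_len() are all
hypothetical. It also assumes the block list is sorted by ascending
guest-physical address, which is what makes the tail element the one
with the highest end address, and it takes the page frame number to be a
plain right shift of that end address.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a guest-physical block; field names are assumptions. */
struct Block {
    uint64_t target_start;      /* first guest-physical address of the block */
    uint64_t target_end;        /* one past the last address of the block */
    TAILQ_ENTRY(Block) next;
};

TAILQ_HEAD(BlockHead, Block);

/* O(n) variant: walk every block and keep the highest end address. */
static uint64_t max_mapnr_loop(struct BlockHead *head, unsigned page_shift)
{
    struct Block *b;
    uint64_t max_end = 0;

    TAILQ_FOREACH(b, head, next) {
        if (b->target_end > max_end) {
            max_end = b->target_end;
        }
    }
    return max_end >> page_shift;
}

/* O(1) variant: only correct if the list is sorted by ascending address. */
static uint64_t max_mapnr_last(struct BlockHead *head, unsigned page_shift)
{
    struct Block *last = TAILQ_LAST(head, BlockHead);

    /* TAILQ_LAST() yields NULL for an empty list. */
    return last ? last->target_end >> page_shift : 0;
}

/*
 * Bitmap length as in the quoted hunk: one bit per page, rounded up to
 * whole bytes, then rounded up to whole pages.
 */
static uint64_t dump_bitmap_len(uint64_t max_mapnr, uint64_t page_size)
{
    uint64_t bytes = (max_mapnr + CHAR_BIT - 1) / CHAR_BIT;
    uint64_t pages = (bytes + page_size - 1) / page_size;

    return pages * page_size;
}
```

On a sorted list both variants return the same value; the second one
just skips the walk, which is why making the call conditional on the
dump format would not buy anything.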