Am 11.03.2013 um 15:14 schrieb Paolo Bonzini <pbonz...@redhat.com>: > Il 11/03/2013 14:44, Peter Lieven ha scritto: >> Hi, >> >> I ever since had a few VMs which are very hard to migrate because of a >> lot of memory I/O. I found that finding the next dirty bit >> seemed to be one of the culprits (apart from removing locking which >> Paolo is working on). >> >> I have to following proposal which seems to help a lot in my case. Just >> wanted to have some feedback. >> I applied the same unrolling idea like in buffer_is_zero(). >> >> Peter >> >> --- a/util/bitops.c >> +++ b/util/bitops.c >> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long >> *addr, unsigned long size, >> const unsigned long *p = addr + BITOP_WORD(offset); >> unsigned long result = offset & ~(BITS_PER_LONG-1); >> unsigned long tmp; >> + unsigned long d0,d1,d2,d3; >> >> if (offset >= size) { >> return size; >> } >> size -= result; >> - offset %= BITS_PER_LONG; >> + offset &= (BITS_PER_LONG-1); >> if (offset) { >> tmp = *(p++); >> tmp &= (~0UL << offset); >> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long >> *addr, unsigned long size, >> result += BITS_PER_LONG; >> } >> while (size & ~(BITS_PER_LONG-1)) { >> + while (!(size & (4*BITS_PER_LONG-1))) { > > This really means > > if (!(size & (4*BITS_PER_LONG-1))) { > while (1) { > ... > } > } > > because the subtraction will not change the result of the "while" loop > condition.
Are you sure? The above is working nicely for me (wondering why ;-)) I think !(size & (4*BITS_PER_LONG-1)) is the same as what you propose. If size & (4*BITS_PER_LONG-1) is not zero its not dividable by 4*BITS_PER_LONG. But it see it might be a problem for size == 0. > > What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which > in turn means "while (size >= 4*BITS_PER_LONG). > > Please change both while loops to use a ">=" condition, it's easier to read. Good idea, its easier to understand. Has anyone evidence if unrolling 4 longs is optimal on today processors? I just chooses 4 longs because it was the same in buffer_is_zero. Peter > > Paolo > >> + d0 = *p; >> + d1 = *(p+1); >> + d2 = *(p+2); >> + d3 = *(p+3); >> + if (d0 || d1 || d2 || d3) { >> + break; >> + } >> + p+=4; >> + result += 4*BITS_PER_LONG; >> + size -= 4*BITS_PER_LONG; >> + } >> if ((tmp = *(p++))) { >> goto found_middle; >> } >