On 06/19/2012 09:08 PM, Eric Blake wrote:
> On 06/19/2012 09:43 AM, Orit Wasserman wrote:
>> Signed-off-by: Benoit Hudzia <benoit.hud...@sap.com>
>> Signed-off-by: Petter Svard <pett...@cs.umu.se>
>> Signed-off-by: Aidan Shribman <aidan.shrib...@sap.com>
>> Signed-off-by: Orit Wasserman <owass...@redhat.com>
>> ---
>>  savevm.c |   91 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 91 insertions(+), 0 deletions(-)
>>
> 
>> +int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
>> +                         uint8_t *dst, int dlen)
>> +{
>> +    uint32_t zrun_len = 0, nzrun_len = 0;
>> +    int d = 0, i = 0;
>> +    uint8_t *nzrun_start = NULL;
>> +
>> +    while (i < slen) {
>> +        /* overflow */
>> +        if (d + 2 > dlen) {
>> +            return -1;
>> +        }
>> +
>> +        while (i < slen && !(old_buf[i] ^ new_buf[i])) {
>> +            i++;
>> +            zrun_len++;
>> +        }
> 
> Can we vectorize this to compare a word at a time for improved speed?
> It would be really cool if glibc had an interface like memcmp(), but
> which returned the offset of the first difference instead of a
> comparison of the overall memory regions.  But even without such an
> interface, pseudocode such as:
> 
> while (i < slen) {
>   check for overflow
>   if (not aligned)
>     up to sizeof(long) byte comparisons
>   while (long comparisons && not end)
>     increment i by sizeof(long)
>   if (end of buffer) {
>     either buffer unchanged, or we have trailing zrun
>     break;
>   }
>   // found a word that differed
>   do byte-wise comparisons on that word
>   then back to word-size xor, checking for no zero bytes in the xor (or
> anywhere that a zero byte occurs in isolation, it is more compact to
> pass that lone byte as part of the xor encoding instead of breaking out
> to a zrun)
> }
I will add it.
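For reference, the pseudocode above could be fleshed out along these lines. This is only a sketch, not the patch: `first_diff` is a hypothetical helper name, and `memcpy` is used for the word loads so the code stays alignment-safe even before the byte-wise prologue has done its job (the prologue is then purely a performance matter).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte that differs between
 * old_buf and new_buf, scanning one long at a time once aligned.
 * Returns len when the buffers are identical. */
static size_t first_diff(const uint8_t *old_buf, const uint8_t *new_buf,
                         size_t len)
{
    size_t i = 0;

    /* up to sizeof(long) byte comparisons until i is long-aligned */
    while (i < len && i % sizeof(long) != 0) {
        if (old_buf[i] != new_buf[i]) {
            return i;
        }
        i++;
    }

    /* word comparisons over the middle; memcpy keeps loads alignment-safe */
    while (i + sizeof(long) <= len) {
        long a, b;
        memcpy(&a, old_buf + i, sizeof(long));
        memcpy(&b, new_buf + i, sizeof(long));
        if (a != b) {
            break;  /* some byte in this word differs */
        }
        i += sizeof(long);
    }

    /* byte-wise comparisons on the differing word (or the tail) */
    while (i < len && old_buf[i] == new_buf[i]) {
        i++;
    }
    return i;
}
```

The zrun/nzrun loops in the patch could then both be expressed as calls to this helper, with the word loop doing the bulk of the work on a mostly-unchanged page.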
> 
>> +
>> +        zrun_len = 0;
>> +        nzrun_start = new_buf + i;
>> +        while (i < slen && (old_buf[i] ^ new_buf[i]) != 0) {
>> +            i++;
>> +            nzrun_len++;
>> +        }
> 
> Note that the encoding for a single unchanged byte with bytes changed on
> both sides is more compact to pass the unchanged byte as part of the
> nzrun, instead of breaking out into a zrun and starting a second nzrun.
>  If I'm analyzing things correctly, that's also true for a run of two
> bytes, and it's not until we get to a run of three or more unchanged
> bytes where we start to get compression (except for the corner case of
> ending on a 1- or 2-byte zrun).  Should we be altering this algorithm to
> take advantage of this?

I used to have this optimization but didn't see any noticeable performance
improvement.
It made the code much more complicated and less readable, so I chose to remove
it.
I think adding SSE support has a better chance of improving compression
performance.
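The break-even arithmetic behind the observation above can be written down directly. Assuming each run length fits in a single uleb128 byte (true for runs below 128 bytes), breaking a k-byte unchanged run out as a zrun costs two extra length bytes (one for the zrun, one for the second nzrun's length) while saving k literal bytes; the function names below are illustrative only.

```c
/* Encoded size of nzrun1, then a k-byte unchanged run, then nzrun2,
 * assuming every run length fits in one uleb128 byte (run < 128). */
static int size_broken_out(int n1, int k, int n2)
{
    (void)k;  /* a zrun sends only its length, no data bytes */
    return (1 + n1) + 1 + (1 + n2);
}

static int size_folded(int n1, int k, int n2)
{
    /* one nzrun covering everything: one length byte plus all the
     * literal bytes, unchanged ones included */
    return 1 + (n1 + k + n2);
}
```

Breaking out wins only for k >= 3; k == 2 is a wash and k == 1 loses a byte, matching the analysis in the review comment.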

> 
>> +int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen)
>> +{
>> +    int i = 0, d = 0;
>> +    uint32_t count = 0;
>> +
>> +    while (i < slen) {
>> +
>> +        /* zrun */
>> +        i += uleb128_decode_small(src + i, &count);
>> +        d += count;
>> +
>> +        /* overflow */
>> +        g_assert(d <= dlen);
> 
> Again, is g_assert() appropriate for parsing user-supplied input, or are
> there more appropriate ways for gracefully failing an incoming migration
> attempt from corrupt data?
I will change it to return -1.
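The graceful-failure variant might look roughly like this. It is a sketch, not the revised patch: `xbzrle_decode_sketch` is an illustrative name, the termination check assumes a trailing zrun simply consumes the rest of the input, and the `uleb128_decode_small` helper is written here to the shape assumed elsewhere in the series (one byte below 0x80, two bytes otherwise).

```c
#include <stdint.h>
#include <string.h>

/* Assumed shape of the series' uleb128 helper: one byte for values
 * below 0x80, two bytes otherwise; returns bytes consumed. */
static int uleb128_decode_small(const uint8_t *in, uint32_t *n)
{
    if (!(in[0] & 0x80)) {
        *n = in[0];
        return 1;
    }
    *n = (uint32_t)(in[0] & 0x7f) | ((uint32_t)in[1] << 7);
    return 2;
}

/* Sketch of the decoder with graceful failure instead of g_assert():
 * every length taken from the wire is range-checked, and corrupt
 * input yields -1 rather than aborting the destination process. */
static int xbzrle_decode_sketch(const uint8_t *src, int slen,
                                uint8_t *dst, int dlen)
{
    int i = 0, d = 0;
    uint32_t count = 0;

    while (i < slen) {
        /* zrun: skip count bytes that the old page already holds */
        i += uleb128_decode_small(src + i, &count);
        d += count;
        if (d > dlen) {
            return -1;
        }
        if (i >= slen) {
            break;  /* a trailing zrun ends the stream */
        }

        /* nzrun: copy count literal bytes from the stream */
        i += uleb128_decode_small(src + i, &count);
        if (d + (int)count > dlen || i + (int)count > slen) {
            return -1;
        }
        memcpy(dst + d, src + i, count);
        d += count;
        i += count;
    }
    return d;
}
```

With this shape an incoming migration carrying corrupt data fails the transfer instead of bringing down the destination QEMU.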
> 
>> +
>> +        /* completed decoding */
>> +        if (i == slen - 1) {
>> +            return d;
>> +        }
>> +
>> +        /* nzrun */
>> +        i += uleb128_decode_small(src + i, &count);
>> +
>> +        g_assert(d + count <= dlen);
>> +
>> +        memcpy(dst + d, src + i, count);
> 
> Am I correct that you are using XOR to compute the locations of nzrun,
> but then transferring the original data (and not the xor data) over the
> wire?  This was not made clear in the doc patch of 3/13.  I would
> suggest adding an example to the docs of a 4k page that consists only of
> a short string at the beginning of the page (and NUL bytes thereafter),
> then show both the before and after version of that page to demonstrate
> sparse memory changes, as well as the XBZRLE encoding of that page.
> 
ok.
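To make that doc example concrete: a 4k page whose only change is a short string at offset 0 encodes to a handful of bytes, and the bytes on the wire are the literal new data, not the XOR (XOR is only used to locate the runs). The encoder below is a hedged sketch, not the patch's function; it assumes every emitted run length fits in a single uleb128 byte (run < 128) and that a trailing zrun is omitted entirely, which is what lets the sparse page compress so well.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of an encoder for the sparse-page example (assumes every
 * emitted run length fits in one uleb128 byte; the real encoder
 * emits multi-byte lengths for longer runs). */
static int xbzrle_encode_sketch(const uint8_t *old_buf, const uint8_t *new_buf,
                                int slen, uint8_t *dst, int dlen)
{
    int i = 0, d = 0;

    while (i < slen) {
        int start = i;

        /* zrun: count unchanged bytes (the XOR only locates them) */
        while (i < slen && old_buf[i] == new_buf[i]) {
            i++;
        }
        if (i == slen) {
            break;  /* trailing zrun: costs no bytes at all */
        }
        if (d + 2 > dlen) {
            return -1;
        }
        dst[d++] = (uint8_t)(i - start);  /* zrun length */

        /* nzrun: count changed bytes, then send them literally */
        start = i;
        while (i < slen && old_buf[i] != new_buf[i]) {
            i++;
        }
        if (d + 1 + (i - start) > dlen) {
            return -1;
        }
        dst[d++] = (uint8_t)(i - start);            /* nzrun length */
        memcpy(dst + d, new_buf + start, i - start); /* new data, not XOR */
        d += i - start;
    }
    return d;
}
```

For an all-zero old page and a new page beginning with "hello" followed by NUL bytes, the output is 7 bytes: zrun length 0, nzrun length 5, then the five literal bytes of "hello"; the 4091 trailing unchanged bytes cost nothing.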

Orit

