On 05/08/2015 02:25 AM, Markus Armbruster wrote: > Eric Blake <ebl...@redhat.com> writes: > >> On 05/06/2015 10:18 AM, John Snow wrote: >> >>>> To find out, add just buffering. Something like this in your patch >>>> instead of byte2hex(): >>>> >>>> for (i = 0; i < len; i++) { >>>> - qtest_sendf(chr, "%02x", data[i]); >>>> + snprintf(&enc[i * 2], 2, "%02x", data[i]); >>>> } >>>> >>>> If the speedup is pretty much entirely due to buffering (which I >>>> suspect), then your commit message could use a bit of love :) >>>> >>> >>> When you're right, you're right. The difference may not be statistically >>> meaningful, but with today's current planetary alignment, using >>> sprintf() to batch the sends instead of my home-rolled nib computation >>> function, I can eke out a few more tenths of a second. >> >> I'm a bit surprised - making a function call per byte generally executes >> more instructions than open-coding the conversion (albeit the branch >> prediction in the hardware probably does fairly well over long strings, >> since it is a tight and predictable loop). Remember, sprintf() has to >> decode the format string on every call (unless the compiler is smart >> enough to open-code what sprintf would do). > > John's measurements show that the speed difference between snprintf() > and a local copy of formatting code gets thoroughly drowned in noise. > > The snprintf() version takes 18 lines less, according to diffstat. Less > code, same measured performance, what's not to like? > > However, if you feel strongly about avoiding snprintf() here, I won't > argue further. Except for the commit message: it needs to be fixed not > to claim avoiding "printf and friends" makes a speed difference. >
My reasoning was the same as Markus's: the difference was so negligible that I went with the "less home-rolled code" version. I already staged this series without the nib functions and submitted the snprintf version as its own patch with a less disparaging (to printf and friends) commit message. Any further micro-optimization is a waste of time to properly benchmark and split hairs. I already dropped the test from ~14s to ~4s. Good enough. --js