register(s) are nicely handled by the compiler today, though locals
may help it ;)

An improvement of between 30% and 50% (depending on the run; I'd have
to average several runs to be more precise) is not negligible IMHO.
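
As a rough sketch of how such an average and percentage could be
computed: the helper names below are hypothetical and are not taken
from newtest.c. Note that "N% better" can mean either less wall time
or more comparisons per second, and the two figures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean of n timing samples, in seconds.
 * Returns 0.0 when n == 0. */
double mean_seconds(const double *samples, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        sum += samples[i];
    }
    return n ? sum / (double) n : 0.0;
}

/* Fraction of wall time saved by the new version: 1 - new/old. */
double time_saved(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return 1.0 - new_s / old_s;
}

/* Gain in throughput (comparisons per second): old/new - 1. */
double throughput_gain(double old_s, double new_s)
{
    assert(new_s > 0.0);
    return old_s / new_s - 1.0;
}
```

Fed the -O2 sample quoted below (8.299781468 s for the current
ap_casecmpstr[n], 6.148787259 s for the index version), time_saved()
gives roughly 0.26 and throughput_gain() roughly 0.35, so the "~30%"
figure sits between the two readings.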

On Tue, Nov 24, 2015 at 7:09 PM, Jim Jagielski <j...@jagunet.com> wrote:
> If we really want to squeak out optimizations, judicious use
> of 'register' might help even...
>
> But after a while things start getting silly :)
>
>> On Nov 24, 2015, at 1:04 PM, Yann Ylavic <ylavic....@gmail.com> wrote:
>>
>> I did some testing with different implementations and my results show
>> that the fastest one is:
>>
>> int ap_casecmpstr_2(const char *s1, const char *s2)
>> {
>>    size_t i;
>>    const unsigned char *ps1 = (const unsigned char *) s1;
>>    const unsigned char *ps2 = (const unsigned char *) s2;
>>
>>    for (i = 0; ; ++i) {
>>        const int c1 = ps1[i];
>>        const int c2 = ps2[i];
>>
>>        if (c1 != c2) {
>>            return c1 - c2;
>>        }
>>        if (!c1) {
>>            break;
>>        }
>>    }
>>    return (0);
>> }
>>
>> int ap_casecmpstrn_2(const char *s1, const char *s2, size_t n)
>> {
>>    size_t i;
>>    const unsigned char *ps1 = (const unsigned char *) s1;
>>    const unsigned char *ps2 = (const unsigned char *) s2;
>>
>>    for (i = 0; i < n; ++i) {
>>        const int c1 = ps1[i];
>>        const int c2 = ps2[i];
>>
>>        if (c1 != c2) {
>>            return c1 - c2;
>>        }
>>        if (!c1) {
>>            break;
>>        }
>>    }
>>    return (0);
>> }
>>
>> Some samples (test program attached):
>>
>> $ gcc -Wall -O2 newtest.c -o newtest -lrt
>> $ for i in `seq 0 2`; do
>>    ./newtest $i 150000000 \
>>        xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>        xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>        0
>> done
>> - str[n]casecmp (nb=150000000, len=0)
>> time = 8.444547186 : res = 0
>> - ap_casecmpstr[n] (nb=150000000, len=0)
>> time = 8.299781468 : res = 0
>> - ap_casecmpstr[n] w/ index (nb=150000000, len=0)
>> time = 6.148787259 : res = 0
>>
>> That's ~30% better.
>>
>> $ gcc -Wall -Os newtest.c -o newtest -lrt
>> $ for i in `seq 0 2`; do
>>    ./newtest $i 150000000 \
>>        xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>        xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>        0
>> done
>> - str[n]casecmp (nb=150000000, len=0)
>> time = 8.528311136 : res = 0
>> - ap_casecmpstr[n] (nb=150000000, len=0)
>> time = 10.150553381 : res = 0
>> - ap_casecmpstr[n] w/ index (nb=150000000, len=0)
>> time = 9.758638566 : res = 0
>>
>> The string.h's str[n]casecmp beats us with -Os; still, this new
>> implementation is better than the current one.
>>
>> WDYT, should I commit these new versions?
>> <newtest.c>
>
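
As quoted above, the two loops compare raw bytes and show no case
folding step, so on their own they behave like strcmp/strncmp. The
sketch below shows where folding could slot into the same index-based
loop. It is only an illustration under stated assumptions: the names
fold_byte, casecmp_index_sketch and casecmpn_index_sketch are
hypothetical, and tolower() stands in for whatever mapping the real
code uses (it is locale-dependent, so a fixed 256-entry table may be
preferable in practice).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: fold one byte to lower
 * case. tolower() is safe here because the argument is an unsigned
 * char value. */
static int fold_byte(unsigned char c)
{
    return tolower(c);
}

/* Index-based, case-insensitive compare of two NUL-terminated strings. */
int casecmp_index_sketch(const char *s1, const char *s2)
{
    const unsigned char *ps1 = (const unsigned char *) s1;
    const unsigned char *ps2 = (const unsigned char *) s2;
    size_t i;

    for (i = 0; ; ++i) {
        const int c1 = fold_byte(ps1[i]);
        const int c2 = fold_byte(ps2[i]);

        if (c1 != c2) {
            return c1 - c2;
        }
        if (!c1) {
            break;
        }
    }
    return 0;
}

/* Same, but compares at most n bytes. */
int casecmpn_index_sketch(const char *s1, const char *s2, size_t n)
{
    const unsigned char *ps1 = (const unsigned char *) s1;
    const unsigned char *ps2 = (const unsigned char *) s2;
    size_t i;

    for (i = 0; i < n; ++i) {
        const int c1 = fold_byte(ps1[i]);
        const int c2 = fold_byte(ps2[i]);

        if (c1 != c2) {
            return c1 - c2;
        }
        if (!c1) {
            break;
        }
    }
    return 0;
}
```

The timings quoted above do not apply to this sketch, since the extra
folding step changes the cost of each iteration.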
