'register' hints are nicely handled by the compiler these days, though locals may help it ;)
An improvement of 30% to 50% (depending on the run; I'd have to average several runs to be more precise) is not negligible IMHO.

On Tue, Nov 24, 2015 at 7:09 PM, Jim Jagielski <j...@jagunet.com> wrote:
> If we really want to squeak out optimizations, judicious use
> of 'register' might help even...
>
> But after a while things start getting silly :)
>
>> On Nov 24, 2015, at 1:04 PM, Yann Ylavic <ylavic....@gmail.com> wrote:
>>
>> I did some testing with different implementations and my results show
>> that the fastest one is:
>>
>> int ap_casecmpstr_2(const char *s1, const char *s2)
>> {
>>     size_t i;
>>     const unsigned char *ps1 = (const unsigned char *) s1;
>>     const unsigned char *ps2 = (const unsigned char *) s2;
>>
>>     for (i = 0; ; ++i) {
>>         const int c1 = ps1[i];
>>         const int c2 = ps2[i];
>>
>>         if (c1 != c2) {
>>             return c1 - c2;
>>         }
>>         if (!c1) {
>>             break;
>>         }
>>     }
>>     return (0);
>> }
>>
>> int ap_casecmpstrn_2(const char *s1, const char *s2, size_t n)
>> {
>>     size_t i;
>>     const unsigned char *ps1 = (const unsigned char *) s1;
>>     const unsigned char *ps2 = (const unsigned char *) s2;
>>
>>     for (i = 0; i < n; ++i) {
>>         const int c1 = ps1[i];
>>         const int c2 = ps2[i];
>>
>>         if (c1 != c2) {
>>             return c1 - c2;
>>         }
>>         if (!c1) {
>>             break;
>>         }
>>     }
>>     return (0);
>> }
>>
>> Some samples (test program attached):
>>
>> $ gcc -Wall -O2 newtest.c -o newtest -lrt
>> $ for i in `seq 0 2`; do
>>     ./newtest $i 150000000 \
>>         xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>         xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>         0
>>   done
>> - str[n]casecmp (nb=150000000, len=0)
>> time = 8.444547186 : res = 0
>> - ap_casecmpstr[n] (nb=150000000, len=0)
>> time = 8.299781468 : res = 0
>> - ap_casecmpstr[n] w/ index (nb=150000000, len=0)
>> time = 6.148787259 : res = 0
>>
>> That's ~30% better.
>>
>> $ gcc -Wall -Os newtest.c -o newtest -lrt
>> $ for i in `seq 0 2`; do
>>     ./newtest $i 150000000 \
>>         xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>         xcxcxcxcxcxcxcxcxcxcwwwwwwwwwwaaaaaaaaaa \
>>         0
>>   done
>> - str[n]casecmp (nb=150000000, len=0)
>> time = 8.528311136 : res = 0
>> - ap_casecmpstr[n] (nb=150000000, len=0)
>> time = 10.150553381 : res = 0
>> - ap_casecmpstr[n] w/ index (nb=150000000, len=0)
>> time = 9.758638566 : res = 0
>>
>> string.h's str[n]casecmp beats us with -Os, but this new
>> implementation is still better than the current one.
>>
>> WDYT, should I commit these new versions?
>> <newtest.c>
>
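As quoted in the thread, the indexed loop compares raw bytes, so on its own it would treat "ABC" and "abc" as different. A case-insensitive version of the same indexed loop has to fold each byte before comparing. What follows is a minimal sketch, assuming an ASCII-only 256-entry fold table. The names `fold_map`, `init_fold_map`, `sketch_casecmpstr` and `sketch_casecmpstrn` are hypothetical, not httpd's API.

```c
#include <stddef.h>

/* Hypothetical ASCII-only fold table: maps 'A'..'Z' to 'a'..'z' and leaves
 * every other byte value (including 0) unchanged. Fill it once before use. */
static unsigned char fold_map[256];

void init_fold_map(void)
{
    for (int c = 0; c < 256; ++c) {
        fold_map[c] = (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a')
                                             : (unsigned char)c;
    }
}

/* Same indexed loop shape as ap_casecmpstr_2, but each byte goes through
 * the fold table first, so the comparison ignores ASCII case. */
int sketch_casecmpstr(const char *s1, const char *s2)
{
    const unsigned char *ps1 = (const unsigned char *) s1;
    const unsigned char *ps2 = (const unsigned char *) s2;

    for (size_t i = 0; ; ++i) {
        const int c1 = fold_map[ps1[i]];
        const int c2 = fold_map[ps2[i]];

        if (c1 != c2) {
            return c1 - c2;    /* first differing folded byte decides */
        }
        if (!c1) {
            return 0;          /* both strings ended at the same index */
        }
    }
}

/* Bounded variant in the shape of ap_casecmpstrn_2: looks at most n bytes. */
int sketch_casecmpstrn(const char *s1, const char *s2, size_t n)
{
    const unsigned char *ps1 = (const unsigned char *) s1;
    const unsigned char *ps2 = (const unsigned char *) s2;

    for (size_t i = 0; i < n; ++i) {
        const int c1 = fold_map[ps1[i]];
        const int c2 = fold_map[ps2[i]];

        if (c1 != c2) {
            return c1 - c2;
        }
        if (!c1) {
            break;             /* both strings ended before n bytes */
        }
    }
    return 0;
}
```

Call `init_fold_map()` once before the first comparison. Because `fold_map[0]` stays 0 and no other byte folds to 0, the `!c1` test still detects the end of both strings, and a string that ends early still compares as smaller.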
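To reproduce timings like these, you could use a minimal harness along the following lines. This is not the attached newtest.c, which is not reproduced in the thread. The names `time_cmp`, `cmp_fn` and `sink` are hypothetical. The harness times `nb` calls with `clock_gettime(CLOCK_MONOTONIC, ...)`. On glibc older than 2.17, that function lives in librt, which is why `-lrt` is needed when linking.

```c
#define _POSIX_C_SOURCE 200809L
#include <strings.h>   /* strcasecmp (POSIX) */
#include <time.h>      /* clock_gettime, CLOCK_MONOTONIC */

typedef int (*cmp_fn)(const char *, const char *);

/* Results are stored here so the compiler cannot discard the calls. */
static volatile int sink;

/* Returns elapsed wall-clock seconds for nb calls of cmp(a, b). */
double time_cmp(cmp_fn cmp, const char *a, const char *b, long nb)
{
    /* volatile pointers are re-read every iteration, so the compiler
     * cannot hoist a (possibly pure) comparison out of the loop */
    const char *volatile pa = a;
    const char *volatile pb = b;
    struct timespec start, end;
    int acc = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long k = 0; k < nb; ++k) {
        acc ^= cmp(pa, pb);    /* XOR: no signed-overflow risk */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    sink = acc;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Passing the comparator through a function pointer adds an indirect call per iteration. Absolute numbers will therefore differ from a harness that calls each variant directly. Compare variants under the same harness and `-O` level, and average several runs.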