Builtin/headers: Constant arguments and adding extra entry points.

Ondřej Bílka Thu, 04 Jun 2015 12:46:11 -0700

I start with the simplest suggestion, which is precomputing constant
arguments, for example saving the multiplication cost in strchr with:


char *strchr_c(char *x, unsigned long u);
#define strchr(x,c) \
(__builtin_constant_p(c) \
 ? strchr_c (x, (unsigned char) (c) * (~0ULL / 255)) : strchr (x,c))
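
To illustrate what gets precomputed: multiplying a byte by ~0ULL / 255
(which is 0x0101010101010101) replicates it into every byte of a 64-bit
word, the kind of mask a word-at-a-time strchr would otherwise build on
each call. A minimal sketch; broadcast_byte is a hypothetical helper name,
not something from libc.

```c
#include <stdint.h>

/* Hypothetical helper: replicate one byte into all eight bytes of a
   64-bit word.  ~0ULL / 255 equals 0x0101010101010101.  */
static inline uint64_t
broadcast_byte (unsigned char c)
{
  return (uint64_t) c * (~0ULL / 255);
}
```

With a constant argument the compiler can fold this multiplication away,
so broadcast_byte ('/') becomes the literal 0x2f2f2f2f2f2f2f2f.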


Next, I am working on using constant n for memset and memcpy. This
cannot be done in gcc alone, as you need to choose the implementation based
on the cpu, and for different sizes different implementations are best on
different cpus.

Some users try to always inline, as in rte_memcpy. That works
better than the gcc one as it's optimized for newer processors.

For sizes beyond 64 bytes, trying to fully expand memcpy and memset
doesn't make a lot of sense as a libcall is faster.

To get the benefits of inlining I am now working on the following approach.
For sizes < 64 use the builtin. For n in 64-1024 make an indirect jump
according to a cpu-specific table that you get from libc.

That would allow unrolling up to size 1024 into a sequence of movdqa's
without increasing cache footprint much. The same goes for memset(x,0,n),
except you need to pass a 0.0 argument to have a zeroed xmm register.
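
A rough sketch of the size dispatch described above, under several
assumptions: copy_table, init_copy_table, copy_generic and sized_memcpy are
hypothetical names, the table has one entry per 16-byte size class, and
every entry is simply memcpy here, whereas in the proposal libc would fill
the table with cpu-specific unrolled sequences.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn) (void *, const void *, size_t);

/* Stand-in for a cpu-specific routine that libc would provide.  */
static void *
copy_generic (void *d, const void *s, size_t n)
{
  return memcpy (d, s, n);
}

/* Hypothetical table: one entry per 16-byte size class in 64..1024.  */
#define COPY_CLASSES ((1024 - 64) / 16 + 1)
static copy_fn copy_table[COPY_CLASSES];

/* In the proposal libc would fill this per cpu; here every class
   gets the generic routine.  */
static void
init_copy_table (void)
{
  for (size_t i = 0; i < COPY_CLASSES; i++)
    copy_table[i] = copy_generic;
}

static inline void *
sized_memcpy (void *d, const void *s, size_t n)
{
  if (n < 64)
    return __builtin_memcpy (d, s, n);  /* small: let gcc expand it */
  if (n <= 1024)
    return copy_table[(n - 64) / 16] (d, s, n);  /* indirect jump */
  return memcpy (d, s, n);              /* large: plain libcall */
}
```

When n is a compile-time constant both comparisons and the table index fold
away, leaving only the indirect call at the call site.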


An entry point for aligned input doesn't make a lot of sense. As the input
is short you want to go into a copy of the header, and what you save is the
difference between aligned and unaligned loads plus the cross-page check.
As you need to duplicate the header, the icache cost could be bigger.

What makes sense is inlining headers instead of expanding the whole
function. I am looking at the following expansion of strcmp/memcmp:

int
inline_strcmp (const char *x, const char *y)
{
  int r = *((const unsigned char *) x) - *((const unsigned char *) y);
  return r ? r : strcmp (x, y);
}

int
inline_memcmp (const void *x, const void *y, size_t n)
{
  if (n == 0)
    return 0;
  /* Use byte pointers; arithmetic on void * is only a GNU extension.  */
  const unsigned char *a = x, *b = y;
  int r = *a - *b;
  return r ? r : memcmp (a + 1, b + 1, n - 1);
}

Note that the end of the string is not tested as it's unlikely. The same
transformation could be done for strncmp, strcasecmp and strncasecmp, but
we at libc would need to improve the tls access of tolower, which now
requires a call, which defeats the purpose of inlining.

That gives considerable savings, as in my profile 32.4% of calls
of strcmp and 49.5% of calls differ in the first byte. From the profiling
data these branches are almost completely predictable, as I see long
sequences of calls that differ at byte 0 followed by sequences that differ
at another byte. Of the programs measured, it could harm only make. See the
attached data.

On x64, adding a match for the first 16 bytes using sse would also make
sense. Except for make, all other programs have 90% of calls differing in
the first 16 bytes.

The same could be done for strchr/memchr headers, where the first 16 bytes
also form the majority.

In the case of make we should check whether in strchr(x,'/') we have
x[0] == '/', which happens 85.1% of the time.
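
A first-byte header for strchr could look roughly like the sketch below.
inline_strchr is a hypothetical name chosen to match the inline_strcmp
expansion earlier in this mail; it is not an existing libc function.

```c
#include <string.h>

/* Hypothetical inline header: test the first byte, otherwise fall
   back to the libcall.  The fallback rescans from x rather than
   x + 1 because x may be the empty string.  */
static inline char *
inline_strchr (const char *x, int c)
{
  if (*x == (char) c)
    return (char *) x;
  return strchr (x, c);
}
```

For a call like inline_strchr (path, '/') with an absolute path, the
libcall is skipped entirely.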

In the generic case the same header would be bigger, so the question of
whether it's profitable versus code size becomes more significant.

For similar questions I have on my todo list adding counters for userspace
profiling. The decision whether some optimization is profitable depends on
details like the average size of input, which cannot be directly determined
from a profile. For example in strstr we would need the digraph that occurs
least often.
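
One way such counters could be done with macros or wrappers is sketched
below. This is only an illustration: counted_strcmp, first_difference and
the bucket layout are hypothetical, and the thresholds are chosen to
resemble the "n <=" columns of the statistics at the end of this mail.

```c
#include <stddef.h>
#include <string.h>

/* Upper bounds of the histogram buckets.  */
static const size_t bucket_limit[] = { 0, 1, 2, 3, 4, 8, 16, 32, 64 };
#define NBUCKETS (sizeof bucket_limit / sizeof bucket_limit[0])

/* bucket_count[i] counts calls whose first difference falls in bucket
   i; the extra last slot collects everything beyond 64.  */
static unsigned long bucket_count[NBUCKETS + 1];

/* Index of the first differing byte, or of the terminator if the
   strings are equal.  */
static size_t
first_difference (const char *x, const char *y)
{
  size_t i = 0;
  while (x[i] == y[i] && x[i] != '\0')
    i++;
  return i;
}

/* Hypothetical profiling wrapper around strcmp.  */
static int
counted_strcmp (const char *x, const char *y)
{
  size_t n = first_difference (x, y);
  size_t b = 0;
  while (b < NBUCKETS && n > bucket_limit[b])
    b++;
  bucket_count[b]++;
  return strcmp (x, y);
}
```

Cumulative percentages then follow by prefix-summing bucket_count and
dividing by the total number of calls.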

I don't know if that could be integrated into -fprofile-generate and
-fprofile-use, or done before that as it would change control flow, or
done just by macros. If we could convince people to compile with
profiling it would also allow directly precomputing tables like the ones
below without large header hacks, and make things like calculating perfect
hashes possible without external tools.

For precomputed tables I so far know two use cases.

One case would be the memchr("abc",x,3) or strchr("abc",x) pattern.
I found that in libc to test membership, which is obviously inefficient.
The second use case is the strpbrk family.

These have in common that they could benefit from a precomputed table with
1 for present bytes and 0 otherwise. While I could create such a table, I
couldn't do that without 256 warnings. The following constructs the table
just fine but complains

warning: initializer element is not a constant expression

#include <stdio.h>
#include <string.h>

int
main()
{
  static char x[256]  =  {strchr("aaa", 'a') == NULL,
                          strchr("aaa", 'b') == NULL};
  printf("%i %i %s", x[0],x[1], x);
}
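
For comparison, here is a sketch of the same 256-entry membership table
built at run time, which avoids the initializer warning at the cost of an
init step. struct byte_set, byte_set_init and table_strpbrk are
hypothetical names, and the scan loop is a plain byte loop, not how libc
implements strpbrk.

```c
#include <stddef.h>
#include <string.h>

/* 1 for bytes that are in the accept set, 0 otherwise.  */
struct byte_set
{
  unsigned char present[256];
};

static void
byte_set_init (struct byte_set *s, const char *accept)
{
  memset (s->present, 0, sizeof s->present);
  for (; *accept != '\0'; accept++)
    s->present[(unsigned char) *accept] = 1;
}

/* strpbrk-like scan using one table lookup per input byte.  */
static char *
table_strpbrk (const char *x, const struct byte_set *s)
{
  for (; *x != '\0'; x++)
    if (s->present[(unsigned char) *x])
      return (char *) x;
  return NULL;
}
```

A membership test such as strchr("abc",x) != NULL then reduces to a single
load, s.present[(unsigned char) x], for nonzero x.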

The same trick could be used for making a bitwise array.
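
A bitwise variant packs the 256 flags into 32 bytes. A minimal sketch,
with struct byte_bitmap, bitmap_add and bitmap_has as hypothetical names:

```c
#include <stdint.h>

/* 256 membership bits stored in four 64-bit words.  */
struct byte_bitmap
{
  uint64_t w[4];
};

static inline void
bitmap_add (struct byte_bitmap *m, unsigned char c)
{
  m->w[c >> 6] |= (uint64_t) 1 << (c & 63);
}

static inline int
bitmap_has (const struct byte_bitmap *m, unsigned char c)
{
  return (int) ((m->w[c >> 6] >> (c & 63)) & 1);
}
```

The bitmap costs a shift and mask per lookup but takes 32 bytes instead of
256, so it touches a single cache line.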

Also it's weird what you can and cannot do in static initializers.
I was surprised that I could use strchr but couldn't evaluate "abc"[2]
as 'c'.


When the bug above gets fixed, that will allow these functions to be a lot
faster, as most of the time you get a match in the first 8 bytes.

Statistics of the comparison routines, collected with dryrun; for the source see

kam.mff.cuni.cz/~ondra/dryrun.tar.bz2


summary strcmp:


replaying ls

average size   0.2 calls      246 succeed  93.1% latencies   1.1   2.8
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes  21.5% aligned to 8 bytes  10.2% aligned to 16 bytes   3.7%
s1-s2 aligned to 4 bytes  21.5% aligned to 8 bytes  10.2% aligned to 16 bytes   3.7%
n <= 0:  88.2% n <= 1:  93.5% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying bash

average size   4.0 calls      711 succeed  57.7% latencies  -4.6  -4.6
s1    aligned to 4 bytes  65.4% aligned to 8 bytes  56.0% aligned to 16 bytes   2.0%
s2    aligned to 4 bytes  58.6% aligned to 8 bytes  50.8% aligned to 16 bytes   3.4%
s1-s2 aligned to 4 bytes  49.6% aligned to 8 bytes  39.9% aligned to 16 bytes  37.1%
n <= 0:   0.1% n <= 1:  49.9% n <= 2:  60.6% n <= 3:  64.3%  n <= 4:  71.6% n <= 8:  81.3% n <= 16:  99.4% n <= 32: 100.0% n <= 64: 100.0%
replaying dircolors

average size   1.0 calls       54 succeed  96.3% latencies  -5.1  -6.1
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes  98.1%
s2    aligned to 4 bytes   1.9% aligned to 8 bytes   1.9% aligned to 16 bytes   1.9%
s1-s2 aligned to 4 bytes   1.9% aligned to 8 bytes   1.9% aligned to 16 bytes   1.9%
n <= 0:  87.0% n <= 1:  87.0% n <= 2:  87.0% n <= 3:  87.0%  n <= 4:  88.9% n <= 8:  94.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ps

average size   1.7 calls      239 succeed  87.0% latencies   3.6  16.0
s1    aligned to 4 bytes  94.1% aligned to 8 bytes  88.3% aligned to 16 bytes  88.3%
s2    aligned to 4 bytes  28.0% aligned to 8 bytes  13.4% aligned to 16 bytes  11.3%
s1-s2 aligned to 4 bytes  28.5% aligned to 8 bytes  13.0% aligned to 16 bytes  12.6%
n <= 0:  58.6% n <= 1:  77.0% n <= 2:  77.4% n <= 3:  81.6%  n <= 4:  84.1% n <= 8:  94.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-add

average size  11.5 calls      221 succeed   0.9% latencies   1.4   0.8
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  97.3% aligned to 16 bytes  97.3%
s2    aligned to 4 bytes  92.8% aligned to 8 bytes  91.4% aligned to 16 bytes  91.0%
s1-s2 aligned to 4 bytes  94.6% aligned to 8 bytes  93.2% aligned to 16 bytes  92.8%
n <= 0:   1.4% n <= 1:   1.4% n <= 2:   1.8% n <= 3:  12.7%  n <= 4:  13.6% n <= 8:  29.0% n <= 16:  83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-keygen

average size  11.5 calls      222 succeed   0.9% latencies   1.7   2.0
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  97.3% aligned to 16 bytes  97.3%
s2    aligned to 4 bytes  92.3% aligned to 8 bytes  91.0% aligned to 16 bytes  90.5%
s1-s2 aligned to 4 bytes  94.1% aligned to 8 bytes  92.8% aligned to 16 bytes  92.3%
n <= 0:   1.4% n <= 1:   1.4% n <= 2:   1.8% n <= 3:  12.6%  n <= 4:  13.5% n <= 8:  29.3% n <= 16:  83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying mc

average size   7.3 calls    16244 succeed  62.2% latencies -182.0 -181.9
s1    aligned to 4 bytes  95.6% aligned to 8 bytes  95.3% aligned to 16 bytes  95.3%
s2    aligned to 4 bytes  80.4% aligned to 8 bytes  78.6% aligned to 16 bytes  77.3%
s1-s2 aligned to 4 bytes  79.6% aligned to 8 bytes  78.2% aligned to 16 bytes  76.9%
n <= 0:  28.6% n <= 1:  32.1% n <= 2:  35.6% n <= 3:  43.6%  n <= 4:  48.4% n <= 8:  61.3% n <= 16:  87.1% n <= 32:  99.7% n <= 64:  99.9%
replaying killall

average size   0.1 calls      281 succeed  99.3% latencies  10.9   0.5
s1    aligned to 4 bytes   0.4% aligned to 8 bytes   0.4% aligned to 16 bytes   0.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes   0.4% aligned to 8 bytes   0.4% aligned to 16 bytes   0.4%
n <= 0:  97.5% n <= 1:  99.6% n <= 2:  99.6% n <= 3:  99.6%  n <= 4:  99.6% n <= 8:  99.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying iceweasel

average size   5.8 calls    13136 succeed  86.7% latencies -39.2 -33.5
s1    aligned to 4 bytes  32.5% aligned to 8 bytes  14.5% aligned to 16 bytes   7.6%
s2    aligned to 4 bytes  31.8% aligned to 8 bytes  16.7% aligned to 16 bytes  10.9%
s1-s2 aligned to 4 bytes  28.6% aligned to 8 bytes  14.1% aligned to 16 bytes   6.8%
n <= 0:  33.0% n <= 1:  41.5% n <= 2:  45.8% n <= 3:  54.4%  n <= 4:  58.6% n <= 8:  68.4% n <= 16:  92.3% n <= 32:  99.9% n <= 64: 100.0%
replaying mutt

average size  28.3 calls    27644 succeed  39.4% latencies -157.4 -134.1
s1    aligned to 4 bytes  99.8% aligned to 8 bytes  73.0% aligned to 16 bytes  73.0%
s2    aligned to 4 bytes  85.0% aligned to 8 bytes  61.2% aligned to 16 bytes  59.0%
s1-s2 aligned to 4 bytes  84.9% aligned to 8 bytes  76.4% aligned to 16 bytes  74.3%
n <= 0:  19.0% n <= 1:  33.3% n <= 2:  35.0% n <= 3:  35.8%  n <= 4:  37.2% n <= 8:  39.2% n <= 16:  40.1% n <= 32:  56.7% n <= 64:  89.3%
replaying irb

average size   3.1 calls    10058 succeed  39.2% latencies -102.7 -98.0
s1    aligned to 4 bytes   0.3% aligned to 8 bytes   0.3% aligned to 16 bytes   0.1%
s2    aligned to 4 bytes  21.4% aligned to 8 bytes   8.2% aligned to 16 bytes   4.4%
s1-s2 aligned to 4 bytes  41.6% aligned to 8 bytes  28.5% aligned to 16 bytes  13.0%
n <= 0:   2.0% n <= 1:   9.2% n <= 2:  33.5% n <= 3:  74.8%  n <= 4:  84.7% n <= 8:  99.9% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying vim

average size   1.5 calls   161275 succeed  84.9% latencies 105.3 124.3
s1    aligned to 4 bytes  75.5% aligned to 8 bytes  71.3% aligned to 16 bytes  70.2%
s2    aligned to 4 bytes  47.0% aligned to 8 bytes  41.2% aligned to 16 bytes  39.8%
s1-s2 aligned to 4 bytes  45.2% aligned to 8 bytes  39.4% aligned to 16 bytes  37.2%
n <= 0:  54.1% n <= 1:  73.1% n <= 2:  81.8% n <= 3:  86.7%  n <= 4:  90.6% n <= 8:  96.7% n <= 16:  99.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ar

average size   0.2 calls  1000000 succeed  99.9% latencies   5.0   4.8
s1    aligned to 4 bytes  25.0% aligned to 8 bytes  13.0% aligned to 16 bytes   6.1%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  25.0% aligned to 8 bytes  13.0% aligned to 16 bytes   6.1%
n <= 0:  90.9% n <= 1:  97.6% n <= 2:  98.3% n <= 3:  99.6%  n <= 4:  99.7% n <= 8:  99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make

average size  30.1 calls  1000000 succeed  98.7% latencies   1.2   1.6
s1    aligned to 4 bytes  28.4% aligned to 8 bytes  20.7% aligned to 16 bytes   9.8%
s2    aligned to 4 bytes  26.6% aligned to 8 bytes  18.8% aligned to 16 bytes   7.7%
s1-s2 aligned to 4 bytes  22.2% aligned to 8 bytes  12.3% aligned to 16 bytes   4.5%
n <= 0:   4.2% n <= 1:   4.2% n <= 2:   5.3% n <= 3:   5.3%  n <= 4:   5.3% n <= 8:   5.3% n <= 16:   8.9% n <= 32:  77.8% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   5.8 calls    15151 succeed  37.9% latencies   2.9  -2.6
s1    aligned to 4 bytes  40.7% aligned to 8 bytes  37.0% aligned to 16 bytes  36.0%
s2    aligned to 4 bytes  97.1% aligned to 8 bytes  96.8% aligned to 16 bytes  45.7%
s1-s2 aligned to 4 bytes  40.3% aligned to 8 bytes  36.6% aligned to 16 bytes  35.1%
n <= 0:  12.5% n <= 1:  14.5% n <= 2:  15.0% n <= 3:  58.5%  n <= 4:  68.1% n <= 8:  80.2% n <= 16:  94.6% n <= 32:  98.0% n <= 64: 100.0%
replaying gcc

average size   0.5 calls      235 succeed  93.6% latencies   2.9   4.0
s1    aligned to 4 bytes  30.2% aligned to 8 bytes  17.0% aligned to 16 bytes   9.4%
s2    aligned to 4 bytes   5.5% aligned to 8 bytes   4.7% aligned to 16 bytes   4.7%
s1-s2 aligned to 4 bytes  25.1% aligned to 8 bytes  19.1% aligned to 16 bytes  11.1%
n <= 0:  74.9% n <= 1:  92.3% n <= 2:  93.2% n <= 3:  94.5%  n <= 4:  98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying /bin/bash

average size   5.8 calls     2108 succeed  37.3% latencies -39.0 -39.5
s1    aligned to 4 bytes  71.4% aligned to 8 bytes  54.8% aligned to 16 bytes   2.6%
s2    aligned to 4 bytes  59.3% aligned to 8 bytes  43.8% aligned to 16 bytes   1.8%
s1-s2 aligned to 4 bytes  50.9% aligned to 8 bytes  39.0% aligned to 16 bytes  35.7%
n <= 0:   0.1% n <= 1:  34.4% n <= 2:  45.2% n <= 3:  47.9%  n <= 4:  59.3% n <= 8:  68.5% n <= 16:  99.1% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/bin/lsof

average size   9.4 calls       56 succeed  33.9% latencies  29.8  29.5
s1    aligned to 4 bytes  98.2% aligned to 8 bytes  98.2% aligned to 16 bytes  98.2%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  98.2% aligned to 8 bytes  98.2% aligned to 16 bytes  98.2%
n <= 0:   1.8% n <= 1:  30.4% n <= 2:  30.4% n <= 3:  30.4%  n <= 4:  37.5% n <= 8:  55.4% n <= 16:  78.6% n <= 32: 100.0% n <= 64: 100.0%
replaying find

average size   0.2 calls      297 succeed  96.3% latencies  -0.9  -9.5
s1    aligned to 4 bytes  26.6% aligned to 8 bytes  15.8% aligned to 16 bytes   9.8%
s2    aligned to 4 bytes  20.2% aligned to 8 bytes   0.3% aligned to 16 bytes   0.3%
s1-s2 aligned to 4 bytes  31.3% aligned to 8 bytes  16.2% aligned to 16 bytes   8.8%
n <= 0:  93.6% n <= 1:  97.0% n <= 2:  97.3% n <= 3:  97.6%  n <= 4:  98.3% n <= 8:  99.7% n <= 16:  99.7% n <= 32: 100.0% n <= 64: 100.0%
replaying pager

average size   0.8 calls      116 succeed  94.8% latencies -18.6 -18.6
s1    aligned to 4 bytes  93.1% aligned to 8 bytes  92.2% aligned to 16 bytes  91.4%
s2    aligned to 4 bytes   7.8% aligned to 8 bytes   7.8% aligned to 16 bytes   6.9%
s1-s2 aligned to 4 bytes   6.0% aligned to 8 bytes   5.2% aligned to 16 bytes   5.2%
n <= 0:  75.0% n <= 1:  86.2% n <= 2:  87.9% n <= 3:  89.7%  n <= 4:  94.0% n <= 8:  98.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man

average size   1.0 calls     1723 succeed  97.6% latencies   1.6 -13.4
s1    aligned to 4 bytes  37.8% aligned to 8 bytes  26.7% aligned to 16 bytes  19.4%
s2    aligned to 4 bytes  56.3% aligned to 8 bytes  47.8% aligned to 16 bytes  38.2%
s1-s2 aligned to 4 bytes  34.5% aligned to 8 bytes  23.6% aligned to 16 bytes  18.7%
n <= 0:  71.7% n <= 1:  92.9% n <= 2:  93.4% n <= 3:  93.6%  n <= 4:  93.9% n <= 8:  97.3% n <= 16:  98.8% n <= 32:  99.5% n <= 64: 100.0%
replaying troff

average size   1.3 calls   178664 succeed  94.4% latencies -63.4 -59.8
s1    aligned to 4 bytes  86.8% aligned to 8 bytes  84.8% aligned to 16 bytes  83.9%
s2    aligned to 4 bytes  27.7% aligned to 8 bytes  17.3% aligned to 16 bytes   9.8%
s1-s2 aligned to 4 bytes  27.1% aligned to 8 bytes  16.5% aligned to 16 bytes   9.2%
n <= 0:  57.9% n <= 1:  63.9% n <= 2:  78.9% n <= 3:  90.7%  n <= 4:  95.6% n <= 8:  97.3% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying grotty

average size   6.1 calls     5553 succeed  62.8% latencies -18.7 -31.6
s1    aligned to 4 bytes  99.0% aligned to 8 bytes  98.9% aligned to 16 bytes  98.9%
s2    aligned to 4 bytes  90.2% aligned to 8 bytes  89.6% aligned to 16 bytes  89.4%
s1-s2 aligned to 4 bytes  89.2% aligned to 8 bytes  88.6% aligned to 16 bytes  88.3%
n <= 0:  11.1% n <= 1:  16.4% n <= 2:  31.1% n <= 3:  49.3%  n <= 4:  55.3% n <= 8:  56.4% n <= 16:  98.4% n <= 32: 100.0% n <= 64: 100.0%
replaying groff

average size   0.2 calls      696 succeed  98.4% latencies  12.6  10.3
s1    aligned to 4 bytes  91.7% aligned to 8 bytes  90.9% aligned to 16 bytes  90.9%
s2    aligned to 4 bytes  33.5% aligned to 8 bytes  18.4% aligned to 16 bytes   9.1%
s1-s2 aligned to 4 bytes  25.7% aligned to 8 bytes   9.9% aligned to 16 bytes   0.6%
n <= 0:  88.8% n <= 1:  98.3% n <= 2:  99.1% n <= 3:  99.6%  n <= 4:  99.7% n <= 8:  99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying as

average size   6.7 calls     5198 succeed  36.7% latencies  24.5  20.4
s1    aligned to 4 bytes  28.9% aligned to 8 bytes  14.7% aligned to 16 bytes   7.4%
s2    aligned to 4 bytes  28.9% aligned to 8 bytes  14.7% aligned to 16 bytes   7.3%
s1-s2 aligned to 4 bytes  74.1% aligned to 8 bytes  67.9% aligned to 16 bytes  64.6%
n <= 0:   4.0% n <= 1:  10.4% n <= 2:  13.8% n <= 3:  18.6%  n <= 4:  25.0% n <= 8:  67.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%

summary memcmp:

replaying ls

average size   0.4 calls     9641 succeed 100.0% latencies  -6.2  -7.0
s1    aligned to 4 bytes  27.2% aligned to 8 bytes  12.3% aligned to 16 bytes   2.5%
s2    aligned to 4 bytes  26.0% aligned to 8 bytes  15.6% aligned to 16 bytes   8.4%
s1-s2 aligned to 4 bytes  25.0% aligned to 8 bytes  12.6% aligned to 16 bytes   6.4%
n <= 0:  63.7% n <= 1:  97.1% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying awk

average size   0.5 calls      158 succeed  93.0% latencies   0.9   0.9
s1    aligned to 4 bytes  51.3% aligned to 8 bytes  46.8% aligned to 16 bytes  46.8%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes  51.3% aligned to 8 bytes  46.8% aligned to 16 bytes  46.8%
n <= 0:  78.5% n <= 1:  89.9% n <= 2:  93.7% n <= 3:  96.2%  n <= 4:  97.5% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying mc

average size   0.4 calls     1942 succeed  98.5% latencies -199.0 -199.0
s1    aligned to 4 bytes  31.3% aligned to 8 bytes  21.9% aligned to 16 bytes  16.3%
s2    aligned to 4 bytes  28.7% aligned to 8 bytes  22.8% aligned to 16 bytes  14.8%
s1-s2 aligned to 4 bytes  28.8% aligned to 8 bytes  19.4% aligned to 16 bytes  14.4%
n <= 0:  79.2% n <= 1:  96.7% n <= 2:  96.7% n <= 3:  96.7%  n <= 4:  98.7% n <= 8:  99.4% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying mutt

average size   4.7 calls    29693 succeed 100.0% latencies -251.6 -253.5
s1    aligned to 4 bytes  99.8% aligned to 8 bytes   1.4% aligned to 16 bytes   1.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes  99.8% aligned to 16 bytes  99.8%
s1-s2 aligned to 4 bytes  99.8% aligned to 8 bytes   1.3% aligned to 16 bytes   1.3%
n <= 0:   8.7% n <= 1:   8.9% n <= 2:   8.9% n <= 3:   8.9%  n <= 4:   8.9% n <= 8:  98.9% n <= 16:  98.9% n <= 32: 100.0% n <= 64: 100.0%
replaying irb

average size   2.9 calls      306 succeed  88.2% latencies -109.3 -112.0
s1    aligned to 4 bytes  34.0% aligned to 8 bytes  19.0% aligned to 16 bytes  13.7%
s2    aligned to 4 bytes  82.4% aligned to 8 bytes  73.5% aligned to 16 bytes  35.9%
s1-s2 aligned to 4 bytes  34.6% aligned to 8 bytes  19.9% aligned to 16 bytes  12.4%
n <= 0:  67.3% n <= 1:  69.9% n <= 2:  80.4% n <= 3:  81.0%  n <= 4:  84.6% n <= 8:  87.9% n <= 16:  89.5% n <= 32:  99.3% n <= 64: 100.0%
replaying vim

average size   1.5 calls   467979 succeed  99.1% latencies 101.4  95.6
s1    aligned to 4 bytes  25.6% aligned to 8 bytes  15.6% aligned to 16 bytes  10.0%
s2    aligned to 4 bytes  59.5% aligned to 8 bytes  47.0% aligned to 16 bytes  46.3%
s1-s2 aligned to 4 bytes  20.4% aligned to 8 bytes   8.6% aligned to 16 bytes   3.6%
n <= 0:   6.7% n <= 1:  52.2% n <= 2:  94.6% n <= 3:  98.4%  n <= 4:  99.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make

average size   7.2 calls  1000000 succeed  99.5% latencies   1.3   1.4
s1    aligned to 4 bytes  19.2% aligned to 8 bytes  12.3% aligned to 16 bytes   8.4%
s2    aligned to 4 bytes  27.5% aligned to 8 bytes  15.8% aligned to 16 bytes   6.6%
s1-s2 aligned to 4 bytes  24.8% aligned to 8 bytes  12.2% aligned to 16 bytes   6.0%
n <= 0:  72.1% n <= 1:  75.0% n <= 2:  75.3% n <= 3:  75.3%  n <= 4:  75.3% n <= 8:  76.1% n <= 16:  76.6% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   4.4 calls     6108 succeed  34.0% latencies   0.0  10.5
s1    aligned to 4 bytes  27.7% aligned to 8 bytes   2.2% aligned to 16 bytes   1.5%
s2    aligned to 4 bytes  80.8% aligned to 8 bytes  79.2% aligned to 16 bytes  42.5%
s1-s2 aligned to 4 bytes  27.9% aligned to 8 bytes   3.3% aligned to 16 bytes   2.4%
n <= 0:  23.8% n <= 1:  26.5% n <= 2:  27.2% n <= 3:  27.4%  n <= 4:  52.5% n <= 8:  96.1% n <= 16:  99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying gcc

average size   0.0 calls    63189 succeed  99.9% latencies   1.6   1.7
s1    aligned to 4 bytes   3.4% aligned to 8 bytes   3.2% aligned to 16 bytes   3.1%
s2    aligned to 4 bytes  26.5% aligned to 8 bytes  11.9% aligned to 16 bytes   6.6%
s1-s2 aligned to 4 bytes  24.7% aligned to 8 bytes  13.2% aligned to 16 bytes   7.7%
n <= 0:  96.3% n <= 1:  99.7% n <= 2:  99.9% n <= 3:  99.9%  n <= 4:  99.9% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying pager

average size   0.9 calls      118 succeed  56.8% latencies -18.2 -18.2
s1    aligned to 4 bytes  23.7% aligned to 8 bytes  15.3% aligned to 16 bytes   8.5%
s2    aligned to 4 bytes  21.2% aligned to 8 bytes  16.9% aligned to 16 bytes  13.6%
s1-s2 aligned to 4 bytes  30.5% aligned to 8 bytes  17.8% aligned to 16 bytes  11.9%
n <= 0:  54.2% n <= 1:  56.8% n <= 2:  98.3% n <= 3:  98.3%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man

average size  12.3 calls      119 succeed  49.6% latencies -16.9  -5.0
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:   0.8% n <= 1:  21.8% n <= 2:  21.8% n <= 3:  21.8%  n <= 4:  21.8% n <= 8:  50.4% n <= 16:  89.1% n <= 32:  89.1% n <= 64: 100.0%
replaying as

average size   5.3 calls     8968 succeed   2.1% latencies  16.0   4.8
s1    aligned to 4 bytes  42.8% aligned to 8 bytes  39.1% aligned to 16 bytes  38.4%
s2    aligned to 4 bytes  35.4% aligned to 8 bytes  23.9% aligned to 16 bytes  18.8%
s1-s2 aligned to 4 bytes  26.3% aligned to 8 bytes  13.1% aligned to 16 bytes   7.4%
n <= 0:   0.2% n <= 1:   0.3% n <= 2:   1.5% n <= 3:  12.7%  n <= 4:  47.8% n <= 8:  98.9% n <= 16:  99.6% n <= 32: 100.0% n <= 64: 100.0%

summary strcasecmp:


replaying mutt

average size   1.2 calls    53965 succeed 100.0% latencies -252.2 -251.1
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes  31.7% aligned to 8 bytes  20.8% aligned to 16 bytes  11.9%
s1-s2 aligned to 4 bytes  31.7% aligned to 8 bytes  20.8% aligned to 16 bytes  11.9%
n <= 0:  63.4% n <= 1:  65.3% n <= 2:  65.3% n <= 3:  88.7%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.581
replaying irb

average size   1.0 calls      693 succeed  94.5% latencies -97.4 -97.9
s1    aligned to 4 bytes  30.4% aligned to 8 bytes  11.4% aligned to 16 bytes   4.2%
s2    aligned to 4 bytes  29.1% aligned to 8 bytes  14.3% aligned to 16 bytes  10.2%
s1-s2 aligned to 4 bytes  27.4% aligned to 8 bytes  13.6% aligned to 16 bytes   5.9%
n <= 0:  84.6% n <= 1:  88.3% n <= 2:  89.0% n <= 3:  89.8%  n <= 4:  90.3% n <= 8:  93.8% n <= 16:  99.6% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying vim

average size   0.5 calls     2194 succeed  95.2% latencies -19.8  -9.9
s1    aligned to 4 bytes  92.7% aligned to 8 bytes  92.6% aligned to 16 bytes  91.7%
s2    aligned to 4 bytes  27.7% aligned to 8 bytes  10.9% aligned to 16 bytes   6.5%
s1-s2 aligned to 4 bytes  26.5% aligned to 8 bytes  10.2% aligned to 16 bytes   5.3%
n <= 0:  87.2% n <= 1:  90.6% n <= 2:  91.3% n <= 3:  94.5%  n <= 4:  97.4% n <= 8:  99.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.024
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1

average size   5.3 calls      108 succeed   4.6% latencies  31.5  -6.5
s1    aligned to 4 bytes   6.5% aligned to 8 bytes   5.6% aligned to 16 bytes   5.6%
s2    aligned to 4 bytes   1.9% aligned to 8 bytes   0.9% aligned to 16 bytes   0.9%
s1-s2 aligned to 4 bytes  93.5% aligned to 8 bytes  93.5% aligned to 16 bytes  93.5%
n <= 0:   0.9% n <= 1:   0.9% n <= 2:   0.9% n <= 3:   0.9%  n <= 4:   3.7% n <= 8:  95.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.028
replaying /usr/bin/lsof

average size   0.1 calls      181 succeed  98.9% latencies  32.2  36.8
s1    aligned to 4 bytes  20.4% aligned to 8 bytes  17.1% aligned to 16 bytes  17.1%
s2    aligned to 4 bytes  17.7% aligned to 8 bytes   0.6% aligned to 16 bytes   0.6%
s1-s2 aligned to 4 bytes  26.0% aligned to 8 bytes  12.7% aligned to 16 bytes   6.1%
n <= 0:  97.2% n <= 1:  99.4% n <= 2:  99.4% n <= 3:  99.4%  n <= 4:  99.4% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying man

average size   2.1 calls    70892 succeed 100.0% latencies -353.3 -355.8
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:  38.8% n <= 1:  63.4% n <= 2:  74.7% n <= 3:  81.3%  n <= 4:  86.7% n <= 8:  95.5% n <= 16:  98.4% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.063
replaying preconv

average size   0.6 calls       75 succeed  97.3% latencies -35.2  -6.9
s1    aligned to 4 bytes  97.3% aligned to 8 bytes  96.0% aligned to 16 bytes  96.0%
s2    aligned to 4 bytes  38.7% aligned to 8 bytes  21.3% aligned to 16 bytes   9.3%
s1-s2 aligned to 4 bytes  37.3% aligned to 8 bytes  21.3% aligned to 16 bytes   9.3%
n <= 0:  84.0% n <= 1:  85.3% n <= 2:  85.3% n <= 3:  86.7%  n <= 4:  98.7% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.453


summary strncasecmp:

replaying mutt

average size   0.5 calls   233025 succeed  95.9% latencies -260.3 -259.2
s1    aligned to 4 bytes  24.4% aligned to 8 bytes  23.6% aligned to 16 bytes   0.4%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes  49.2% aligned to 16 bytes  25.8%
s1-s2 aligned to 4 bytes  24.4% aligned to 8 bytes  13.2% aligned to 16 bytes   7.5%
n <= 0:  81.1% n <= 1:  85.7% n <= 2:  87.6% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying vim

average size   2.8 calls    10719 succeed  98.3% latencies -20.9 -20.2
s1    aligned to 4 bytes  30.3% aligned to 8 bytes  11.4% aligned to 16 bytes   8.1%
s2    aligned to 4 bytes  20.7% aligned to 8 bytes   5.0% aligned to 16 bytes   3.5%
s1-s2 aligned to 4 bytes  27.9% aligned to 8 bytes   8.1% aligned to 16 bytes   3.7%
n <= 0:  55.5% n <= 1:  57.6% n <= 2:  58.4% n <= 3:  71.2%  n <= 4:  72.6% n <= 8:  86.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.002
replaying man

average size   1.3 calls      167 succeed  91.0% latencies -17.1  22.7
s1    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s2    aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes 100.0%
n <= 0:  50.3% n <= 1:  64.1% n <= 2:  66.5% n <= 3:  89.8%  n <= 4:  98.8% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
replaying as

average size   0.0 calls     3267 succeed 100.0% latencies   1.5   7.7
s1    aligned to 4 bytes  24.4% aligned to 8 bytes  12.3% aligned to 16 bytes   6.0%
s2    aligned to 4 bytes   0.1% aligned to 8 bytes   0.0% aligned to 16 bytes   0.0%
s1-s2 aligned to 4 bytes  25.3% aligned to 8 bytes  11.6% aligned to 16 bytes   6.0%
n <= 0:  99.9% n <= 1: 100.0% n <= 2: 100.0% n <= 3: 100.0%  n <= 4: 100.0% n <= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches   0.000
