I will start with the simplest suggestion, which is precomputing values
derived from constant arguments, for example saving the cost of the
multiplication in strchr with:
char *strchr_c (const char *x, unsigned long u);
#define strchr(x, c) \
  (__builtin_constant_p (c) \
   ? strchr_c (x, (unsigned char) (c) * (~0ULL / 255)) \
   : strchr (x, c))
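To make the multiplication trick concrete, here is a minimal sketch of
what the precomputed argument is good for. The names broadcast_byte,
has_zero_byte and chunk_has_byte are hypothetical and only illustrate
the idea; this is not the actual strchr_c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ~0ULL / 255 is 0x0101010101010101, so the product copies c into all
   eight bytes of the word.  */
static inline uint64_t broadcast_byte(unsigned char c)
{
    return (uint64_t) c * (~0ULL / 255);
}

/* Nonzero if any byte of w is zero (standard SWAR test).  */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical helper: does the 8-byte chunk at p contain the byte that
   was broadcast into mask?  XOR turns matching bytes into zero bytes.  */
static inline int chunk_has_byte(const char *p, uint64_t mask)
{
    uint64_t w;
    assert(p != NULL);              /* compiled out with -DNDEBUG */
    memcpy(&w, p, sizeof w);        /* alignment-safe 8-byte load */
    return has_zero_byte(w ^ mask) != 0;
}
```

With a constant c the compiler folds broadcast_byte(c) into an
immediate, so the multiplication disappears from the call site.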
Next, I am working on exploiting a constant n in memset and memcpy. This
cannot be done in gcc alone, as the implementation has to be chosen
based on the cpu, and for different sizes different implementations are
best on different cpus.
Some users try to always inline, as in rte_memcpy. That works better
than the gcc expansion as it is optimized for newer processors.
For sizes beyond 64 bytes, trying to fully expand memcpy and memset
doesn't make a lot of sense, as the libcall is faster.
To get the benefits of inlining I am now working on the following
approach. For sizes < 64 use the builtin. For n in 64-1024 make an
indirect jump according to a cpu-specific table that you get from libc.
That would allow unrolling up to size 1024 into a sequence of movdqa's
without increasing the cache footprint much. The same goes for
memset(x,0,n), except that you need to pass a 0.0 argument to have a
zeroed xmm register.
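The size-based dispatch above could look roughly like the sketch below.
copy_table, copy_generic and inline_memcpy are hypothetical names; a
real libc would export the table and fill it with cpu-tuned unrolled
routines, while here every entry is just a memcpy wrapper so that the
example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Stand-in for a cpu-tuned, unrolled copy routine.  */
static void *copy_generic(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

#define C4 copy_generic, copy_generic, copy_generic, copy_generic
/* Hypothetical table: entry k serves sizes 64*k .. 64*k + 63.  */
static copy_fn copy_table[17] = { C4, C4, C4, C4, copy_generic };
static_assert(sizeof copy_table / sizeof copy_table[0] == 1024 / 64 + 1,
              "one entry per 64-byte size class up to 1024");

static inline void *inline_memcpy(void *d, const void *s, size_t n)
{
    if (n < 64)
        return __builtin_memcpy(d, s, n);   /* small: let gcc expand it */
    if (n <= 1024)
        return copy_table[n / 64](d, s, n); /* indirect jump by size class */
    return memcpy(d, s, n);                 /* large: plain libcall */
}
```

When n is a compile-time constant and the function is inlined, the two
comparisons fold away and only one of the three paths remains at the
call site.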
A separate entry point for aligned input doesn't make a lot of sense. As
the input is short you want to go into a copy of the header, and what
you save is the difference between an aligned and an unaligned load plus
the crosspage check. As you need to duplicate the header, the icache
cost could be bigger.
What does make sense is inlining headers instead of expanding the whole
function. I am looking at the following expansion of strcmp/memcmp:
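static inline int
inline_strcmp (const char *x, const char *y)
{
  /* Compare the first byte inline; call strcmp only when it matches.  */
  int r = *(const unsigned char *) x - *(const unsigned char *) y;
  return r ? r : strcmp (x, y);
}

static inline int
inline_memcmp (const void *x, const void *y, size_t n)
{
  const unsigned char *a = x, *b = y;
  if (n == 0)
    return 0;
  /* First byte inline; the libcall handles the remaining n - 1 bytes.  */
  int r = a[0] - b[0];
  return r ? r : memcmp (a + 1, b + 1, n - 1);
}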
Note that the end of the string is not tested, as it is unlikely. The
same transformation could be done for strncmp, strcasecmp and
strncasecmp, but we at libc would need to improve the tls access of
tolower, which now requires a call, which defeats the purpose of
inlining.
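For strncmp the same header could be sketched as follows. The name
inline_strncmp is hypothetical; the case-insensitive variants are left
out because of the tolower issue mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static inline int inline_strncmp(const char *x, const char *y, size_t n)
{
    assert(x != NULL && y != NULL);   /* compiled out with -DNDEBUG */
    if (n == 0)
        return 0;
    int r = *(const unsigned char *) x - *(const unsigned char *) y;
    /* Equal first bytes (including two NULs) are left to the libcall.  */
    return r ? r : strncmp(x, y, n);
}
```

As with inline_strcmp, only the sign of the result is meaningful.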
That gives considerable savings, as in my profile 32.4% of strcmp calls
and 49.5% of memcmp calls differ in the first byte. From the profiling
data these branches are almost completely predictable, as I see long
sequences of calls that differ at byte 0 followed by sequences that
differ elsewhere. Of the programs measured, it could harm only make. See
the attached data.
On x64, adding a match for the first 16 bytes using sse would also make
sense. Except for make, all other programs have 90% of calls differ in
the first 16 bytes.
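A 16-byte sse header for strcmp might look like the sketch below. It is
an illustration under assumptions, not a tuned implementation:
strcmp_head16 is a hypothetical name, the page size is assumed to be
4096, and the vector load may read bytes past the terminator, which is
harmless on the hardware but outside what ISO C allows and may upset
memory checkers. Without SSE2 it simply falls back to strcmp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

static inline int strcmp_head16(const char *x, const char *y)
{
    assert(x != NULL && y != NULL);   /* compiled out with -DNDEBUG */
#if defined(__SSE2__)
    /* Take the fast path only if neither 16-byte load can cross a
       4096-byte page (assumed page size).  */
    if (((uintptr_t) x & 4095) <= 4096 - 16
        && ((uintptr_t) y & 4095) <= 4096 - 16) {
        __m128i a = _mm_loadu_si128((const __m128i *) x);
        __m128i b = _mm_loadu_si128((const __m128i *) y);
        unsigned eq = (unsigned) _mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
        unsigned nul = (unsigned) _mm_movemask_epi8(
            _mm_cmpeq_epi8(a, _mm_setzero_si128()));
        /* Bit i set: byte i differs, or x[i] ends the string.  */
        unsigned stop = (~eq | nul) & 0xffffu;
        if (stop != 0) {
            unsigned i = (unsigned) __builtin_ctz(stop);
            return (unsigned char) x[i] - (unsigned char) y[i];
        }
        return strcmp(x + 16, y + 16);  /* first 16 bytes equal, no NUL */
    }
#endif
    return strcmp(x, y);
}
```

Whether the extra compare and branch pay off against the code size is
exactly the kind of question the profile data has to answer.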
The same could be done for strchr/memchr headers, where the first 16
bytes also form the majority.
In the case of make we should check whether in strchr(x,'/') we have
x[0] == '/', which happens 85.1% of the time.
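The first-byte check for strchr is small enough to sketch directly.
inline_strchr is a hypothetical name, and the code only illustrates the
header, it is not a proposed libc interface.

```c
#include <assert.h>
#include <string.h>

static inline char *inline_strchr(const char *x, int c)
{
    assert(x != NULL);            /* compiled out with -DNDEBUG */
    if (*x == (char) c)
        return (char *) x;        /* hit in the first byte: no call */
    return strchr(x, c);          /* everything else goes to libc */
}
```

The c == 0 case needs no special handling: an empty string matches in
the first byte, and otherwise strchr finds the terminator itself.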
In the generic case the same header would be bigger, so the question of
whether it is profitable versus code size becomes more significant.
For similar questions I have on my todo list adding counters for
userspace profiling. The decision whether some optimization is
profitable depends on details like the average size of input, which
cannot be directly determined from a profile. For example, in strstr we
would need the digraph that occurs least often.
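As a rough idea of what such a counter could record, the sketch below
wraps strcmp and keeps a histogram of the position where each call is
decided. All names (counted_strcmp, diff_hist, percent_within) are
hypothetical, and this is not how dryrun collects its numbers; it only
shows the kind of data a userspace counter would gather.

```c
#include <assert.h>
#include <stddef.h>

#define DIFF_BUCKETS 65
/* diff_hist[i]: calls decided at byte i; the last bucket collects the rest.  */
static unsigned long diff_hist[DIFF_BUCKETS];
static unsigned long diff_calls;

static inline int counted_strcmp(const char *x, const char *y)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);   /* compiled out with -DNDEBUG */
    while (x[i] == y[i] && x[i] != '\0')
        i++;
    diff_hist[i < DIFF_BUCKETS - 1 ? i : DIFF_BUCKETS - 1]++;
    diff_calls++;
    return (unsigned char) x[i] - (unsigned char) y[i];
}

/* Percentage of recorded calls decided at byte index n or earlier.  */
static inline double percent_within(size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i <= n && i < DIFF_BUCKETS; i++)
        sum += diff_hist[i];
    return diff_calls ? 100.0 * (double) sum / (double) diff_calls : 0.0;
}
```

A real counter would have to be much cheaper than this byte loop, for
example by sampling only some of the calls.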
I don't know if that could be integrated into -fprofile-generate and
-fprofile-use, or done before that as it would change control flow, or
done just by macros. If we could convince people to compile with
profiling, it would also allow directly precomputing tables like the
ones below without large header hacks, and make things like calculating
perfect hashes possible without external tools.
For precomputed tables I so far know of two use cases.
One is the memchr("abc",x,3) or strchr("abc",x) pattern, which I found
in libc as a membership test and which is obviously inefficient.
The second use case is the strpbrk family.
These have in common that they could benefit from a precomputed table
with 1 for present bytes and 0 otherwise. While I could create such a
table, I couldn't do that without 256 warnings. The following constructs
the table just fine but complains
warning: initializer element is not a constant expression
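#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* Each initializer calls strchr; gcc accepts it but warns.  */
  static char x[256] = {strchr ("aaa", 'a') == NULL,
                        strchr ("aaa", 'b') == NULL};
  printf ("%i %i %s", x[0], x[1], x);
  return 0;
}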
The same trick could be used for making a bitwise array.
Also, it is weird what you can and cannot do in static initializers.
I was surprised that I could use strchr but couldn't evaluate "abc"[2]
as 'c'.
When the bug above gets fixed, that will allow these functions to be a
lot faster, as most of the time you get a match in the first 8 bytes.
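Until then, the kind of table meant here can be spelled by hand with C99
designated initializers, which gcc accepts without warnings. This is
only a sketch with hypothetical names (in_abc, is_abc, strpbrk_abc) for
the fixed set "abc"; generating such a table from an arbitrary string
literal is exactly what is missing.

```c
#include <assert.h>
#include <stddef.h>

/* 1 for bytes in the set {a, b, c}; unnamed entries are zero.  */
static const unsigned char in_abc[256] = { ['a'] = 1, ['b'] = 1, ['c'] = 1 };

/* Membership test replacing memchr ("abc", x, 3) != NULL.  */
static inline int is_abc(int x)
{
    return in_abc[(unsigned char) x];
}

/* strpbrk (s, "abc") with one table lookup per input byte.  */
static inline char *strpbrk_abc(const char *s)
{
    assert(s != NULL);            /* compiled out with -DNDEBUG */
    for (; *s != '\0'; s++)
        if (in_abc[(unsigned char) *s])
            return (char *) s;
    return NULL;
}
```

Note that strchr("abc", x) also matches x == 0 because of the
terminator, while this table does not; a drop-in replacement for that
pattern would need entry 0 set as well.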
Statistics of comparison routines collected with dryrun; for the source
see kam.mff.cuni.cz/~ondra/dryrun.tar.bz2
summary strcmp:
replaying ls
average size 0.2 calls 246 succeed 93.1% latencies 1.1 2.8
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s2 aligned to 4 bytes 21.5% aligned to 8 bytes 10.2% aligned to 16 bytes
3.7%
s1-s2 aligned to 4 bytes 21.5% aligned to 8 bytes 10.2% aligned to 16 bytes
3.7%
n <= 0: 88.2% n <= 1: 93.5% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying bash
average size 4.0 calls 711 succeed 57.7% latencies -4.6 -4.6
s1 aligned to 4 bytes 65.4% aligned to 8 bytes 56.0% aligned to 16 bytes
2.0%
s2 aligned to 4 bytes 58.6% aligned to 8 bytes 50.8% aligned to 16 bytes
3.4%
s1-s2 aligned to 4 bytes 49.6% aligned to 8 bytes 39.9% aligned to 16 bytes
37.1%
n <= 0: 0.1% n <= 1: 49.9% n <= 2: 60.6% n <= 3: 64.3% n <= 4: 71.6% n
<= 8: 81.3% n <= 16: 99.4% n <= 32: 100.0% n <= 64: 100.0%
replaying dircolors
average size 1.0 calls 54 succeed 96.3% latencies -5.1 -6.1
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
98.1%
s2 aligned to 4 bytes 1.9% aligned to 8 bytes 1.9% aligned to 16 bytes
1.9%
s1-s2 aligned to 4 bytes 1.9% aligned to 8 bytes 1.9% aligned to 16 bytes
1.9%
n <= 0: 87.0% n <= 1: 87.0% n <= 2: 87.0% n <= 3: 87.0% n <= 4: 88.9% n
<= 8: 94.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ps
average size 1.7 calls 239 succeed 87.0% latencies 3.6 16.0
s1 aligned to 4 bytes 94.1% aligned to 8 bytes 88.3% aligned to 16 bytes
88.3%
s2 aligned to 4 bytes 28.0% aligned to 8 bytes 13.4% aligned to 16 bytes
11.3%
s1-s2 aligned to 4 bytes 28.5% aligned to 8 bytes 13.0% aligned to 16 bytes
12.6%
n <= 0: 58.6% n <= 1: 77.0% n <= 2: 77.4% n <= 3: 81.6% n <= 4: 84.1% n
<= 8: 94.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-add
average size 11.5 calls 221 succeed 0.9% latencies 1.4 0.8
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 97.3% aligned to 16 bytes
97.3%
s2 aligned to 4 bytes 92.8% aligned to 8 bytes 91.4% aligned to 16 bytes
91.0%
s1-s2 aligned to 4 bytes 94.6% aligned to 8 bytes 93.2% aligned to 16 bytes
92.8%
n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.8% n <= 3: 12.7% n <= 4: 13.6% n
<= 8: 29.0% n <= 16: 83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ssh-keygen
average size 11.5 calls 222 succeed 0.9% latencies 1.7 2.0
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 97.3% aligned to 16 bytes
97.3%
s2 aligned to 4 bytes 92.3% aligned to 8 bytes 91.0% aligned to 16 bytes
90.5%
s1-s2 aligned to 4 bytes 94.1% aligned to 8 bytes 92.8% aligned to 16 bytes
92.3%
n <= 0: 1.4% n <= 1: 1.4% n <= 2: 1.8% n <= 3: 12.6% n <= 4: 13.5% n
<= 8: 29.3% n <= 16: 83.3% n <= 32: 100.0% n <= 64: 100.0%
replaying mc
average size 7.3 calls 16244 succeed 62.2% latencies -182.0 -181.9
s1 aligned to 4 bytes 95.6% aligned to 8 bytes 95.3% aligned to 16 bytes
95.3%
s2 aligned to 4 bytes 80.4% aligned to 8 bytes 78.6% aligned to 16 bytes
77.3%
s1-s2 aligned to 4 bytes 79.6% aligned to 8 bytes 78.2% aligned to 16 bytes
76.9%
n <= 0: 28.6% n <= 1: 32.1% n <= 2: 35.6% n <= 3: 43.6% n <= 4: 48.4% n
<= 8: 61.3% n <= 16: 87.1% n <= 32: 99.7% n <= 64: 99.9%
replaying killall
average size 0.1 calls 281 succeed 99.3% latencies 10.9 0.5
s1 aligned to 4 bytes 0.4% aligned to 8 bytes 0.4% aligned to 16 bytes
0.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 0.4% aligned to 8 bytes 0.4% aligned to 16 bytes
0.4%
n <= 0: 97.5% n <= 1: 99.6% n <= 2: 99.6% n <= 3: 99.6% n <= 4: 99.6% n
<= 8: 99.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying iceweasel
average size 5.8 calls 13136 succeed 86.7% latencies -39.2 -33.5
s1 aligned to 4 bytes 32.5% aligned to 8 bytes 14.5% aligned to 16 bytes
7.6%
s2 aligned to 4 bytes 31.8% aligned to 8 bytes 16.7% aligned to 16 bytes
10.9%
s1-s2 aligned to 4 bytes 28.6% aligned to 8 bytes 14.1% aligned to 16 bytes
6.8%
n <= 0: 33.0% n <= 1: 41.5% n <= 2: 45.8% n <= 3: 54.4% n <= 4: 58.6% n
<= 8: 68.4% n <= 16: 92.3% n <= 32: 99.9% n <= 64: 100.0%
replaying mutt
average size 28.3 calls 27644 succeed 39.4% latencies -157.4 -134.1
s1 aligned to 4 bytes 99.8% aligned to 8 bytes 73.0% aligned to 16 bytes
73.0%
s2 aligned to 4 bytes 85.0% aligned to 8 bytes 61.2% aligned to 16 bytes
59.0%
s1-s2 aligned to 4 bytes 84.9% aligned to 8 bytes 76.4% aligned to 16 bytes
74.3%
n <= 0: 19.0% n <= 1: 33.3% n <= 2: 35.0% n <= 3: 35.8% n <= 4: 37.2% n
<= 8: 39.2% n <= 16: 40.1% n <= 32: 56.7% n <= 64: 89.3%
replaying irb
average size 3.1 calls 10058 succeed 39.2% latencies -102.7 -98.0
s1 aligned to 4 bytes 0.3% aligned to 8 bytes 0.3% aligned to 16 bytes
0.1%
s2 aligned to 4 bytes 21.4% aligned to 8 bytes 8.2% aligned to 16 bytes
4.4%
s1-s2 aligned to 4 bytes 41.6% aligned to 8 bytes 28.5% aligned to 16 bytes
13.0%
n <= 0: 2.0% n <= 1: 9.2% n <= 2: 33.5% n <= 3: 74.8% n <= 4: 84.7% n
<= 8: 99.9% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying vim
average size 1.5 calls 161275 succeed 84.9% latencies 105.3 124.3
s1 aligned to 4 bytes 75.5% aligned to 8 bytes 71.3% aligned to 16 bytes
70.2%
s2 aligned to 4 bytes 47.0% aligned to 8 bytes 41.2% aligned to 16 bytes
39.8%
s1-s2 aligned to 4 bytes 45.2% aligned to 8 bytes 39.4% aligned to 16 bytes
37.2%
n <= 0: 54.1% n <= 1: 73.1% n <= 2: 81.8% n <= 3: 86.7% n <= 4: 90.6% n
<= 8: 96.7% n <= 16: 99.3% n <= 32: 100.0% n <= 64: 100.0%
replaying ar
average size 0.2 calls 1000000 succeed 99.9% latencies 5.0 4.8
s1 aligned to 4 bytes 25.0% aligned to 8 bytes 13.0% aligned to 16 bytes
6.1%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 25.0% aligned to 8 bytes 13.0% aligned to 16 bytes
6.1%
n <= 0: 90.9% n <= 1: 97.6% n <= 2: 98.3% n <= 3: 99.6% n <= 4: 99.7% n
<= 8: 99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make
average size 30.1 calls 1000000 succeed 98.7% latencies 1.2 1.6
s1 aligned to 4 bytes 28.4% aligned to 8 bytes 20.7% aligned to 16 bytes
9.8%
s2 aligned to 4 bytes 26.6% aligned to 8 bytes 18.8% aligned to 16 bytes
7.7%
s1-s2 aligned to 4 bytes 22.2% aligned to 8 bytes 12.3% aligned to 16 bytes
4.5%
n <= 0: 4.2% n <= 1: 4.2% n <= 2: 5.3% n <= 3: 5.3% n <= 4: 5.3% n
<= 8: 5.3% n <= 16: 8.9% n <= 32: 77.8% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 5.8 calls 15151 succeed 37.9% latencies 2.9 -2.6
s1 aligned to 4 bytes 40.7% aligned to 8 bytes 37.0% aligned to 16 bytes
36.0%
s2 aligned to 4 bytes 97.1% aligned to 8 bytes 96.8% aligned to 16 bytes
45.7%
s1-s2 aligned to 4 bytes 40.3% aligned to 8 bytes 36.6% aligned to 16 bytes
35.1%
n <= 0: 12.5% n <= 1: 14.5% n <= 2: 15.0% n <= 3: 58.5% n <= 4: 68.1% n
<= 8: 80.2% n <= 16: 94.6% n <= 32: 98.0% n <= 64: 100.0%
replaying gcc
average size 0.5 calls 235 succeed 93.6% latencies 2.9 4.0
s1 aligned to 4 bytes 30.2% aligned to 8 bytes 17.0% aligned to 16 bytes
9.4%
s2 aligned to 4 bytes 5.5% aligned to 8 bytes 4.7% aligned to 16 bytes
4.7%
s1-s2 aligned to 4 bytes 25.1% aligned to 8 bytes 19.1% aligned to 16 bytes
11.1%
n <= 0: 74.9% n <= 1: 92.3% n <= 2: 93.2% n <= 3: 94.5% n <= 4: 98.7% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying /bin/bash
average size 5.8 calls 2108 succeed 37.3% latencies -39.0 -39.5
s1 aligned to 4 bytes 71.4% aligned to 8 bytes 54.8% aligned to 16 bytes
2.6%
s2 aligned to 4 bytes 59.3% aligned to 8 bytes 43.8% aligned to 16 bytes
1.8%
s1-s2 aligned to 4 bytes 50.9% aligned to 8 bytes 39.0% aligned to 16 bytes
35.7%
n <= 0: 0.1% n <= 1: 34.4% n <= 2: 45.2% n <= 3: 47.9% n <= 4: 59.3% n
<= 8: 68.5% n <= 16: 99.1% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/bin/lsof
average size 9.4 calls 56 succeed 33.9% latencies 29.8 29.5
s1 aligned to 4 bytes 98.2% aligned to 8 bytes 98.2% aligned to 16 bytes
98.2%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 98.2% aligned to 8 bytes 98.2% aligned to 16 bytes
98.2%
n <= 0: 1.8% n <= 1: 30.4% n <= 2: 30.4% n <= 3: 30.4% n <= 4: 37.5% n
<= 8: 55.4% n <= 16: 78.6% n <= 32: 100.0% n <= 64: 100.0%
replaying find
average size 0.2 calls 297 succeed 96.3% latencies -0.9 -9.5
s1 aligned to 4 bytes 26.6% aligned to 8 bytes 15.8% aligned to 16 bytes
9.8%
s2 aligned to 4 bytes 20.2% aligned to 8 bytes 0.3% aligned to 16 bytes
0.3%
s1-s2 aligned to 4 bytes 31.3% aligned to 8 bytes 16.2% aligned to 16 bytes
8.8%
n <= 0: 93.6% n <= 1: 97.0% n <= 2: 97.3% n <= 3: 97.6% n <= 4: 98.3% n
<= 8: 99.7% n <= 16: 99.7% n <= 32: 100.0% n <= 64: 100.0%
replaying pager
average size 0.8 calls 116 succeed 94.8% latencies -18.6 -18.6
s1 aligned to 4 bytes 93.1% aligned to 8 bytes 92.2% aligned to 16 bytes
91.4%
s2 aligned to 4 bytes 7.8% aligned to 8 bytes 7.8% aligned to 16 bytes
6.9%
s1-s2 aligned to 4 bytes 6.0% aligned to 8 bytes 5.2% aligned to 16 bytes
5.2%
n <= 0: 75.0% n <= 1: 86.2% n <= 2: 87.9% n <= 3: 89.7% n <= 4: 94.0% n
<= 8: 98.3% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man
average size 1.0 calls 1723 succeed 97.6% latencies 1.6 -13.4
s1 aligned to 4 bytes 37.8% aligned to 8 bytes 26.7% aligned to 16 bytes
19.4%
s2 aligned to 4 bytes 56.3% aligned to 8 bytes 47.8% aligned to 16 bytes
38.2%
s1-s2 aligned to 4 bytes 34.5% aligned to 8 bytes 23.6% aligned to 16 bytes
18.7%
n <= 0: 71.7% n <= 1: 92.9% n <= 2: 93.4% n <= 3: 93.6% n <= 4: 93.9% n
<= 8: 97.3% n <= 16: 98.8% n <= 32: 99.5% n <= 64: 100.0%
replaying troff
average size 1.3 calls 178664 succeed 94.4% latencies -63.4 -59.8
s1 aligned to 4 bytes 86.8% aligned to 8 bytes 84.8% aligned to 16 bytes
83.9%
s2 aligned to 4 bytes 27.7% aligned to 8 bytes 17.3% aligned to 16 bytes
9.8%
s1-s2 aligned to 4 bytes 27.1% aligned to 8 bytes 16.5% aligned to 16 bytes
9.2%
n <= 0: 57.9% n <= 1: 63.9% n <= 2: 78.9% n <= 3: 90.7% n <= 4: 95.6% n
<= 8: 97.3% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying grotty
average size 6.1 calls 5553 succeed 62.8% latencies -18.7 -31.6
s1 aligned to 4 bytes 99.0% aligned to 8 bytes 98.9% aligned to 16 bytes
98.9%
s2 aligned to 4 bytes 90.2% aligned to 8 bytes 89.6% aligned to 16 bytes
89.4%
s1-s2 aligned to 4 bytes 89.2% aligned to 8 bytes 88.6% aligned to 16 bytes
88.3%
n <= 0: 11.1% n <= 1: 16.4% n <= 2: 31.1% n <= 3: 49.3% n <= 4: 55.3% n
<= 8: 56.4% n <= 16: 98.4% n <= 32: 100.0% n <= 64: 100.0%
replaying groff
average size 0.2 calls 696 succeed 98.4% latencies 12.6 10.3
s1 aligned to 4 bytes 91.7% aligned to 8 bytes 90.9% aligned to 16 bytes
90.9%
s2 aligned to 4 bytes 33.5% aligned to 8 bytes 18.4% aligned to 16 bytes
9.1%
s1-s2 aligned to 4 bytes 25.7% aligned to 8 bytes 9.9% aligned to 16 bytes
0.6%
n <= 0: 88.8% n <= 1: 98.3% n <= 2: 99.1% n <= 3: 99.6% n <= 4: 99.7% n
<= 8: 99.9% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying as
average size 6.7 calls 5198 succeed 36.7% latencies 24.5 20.4
s1 aligned to 4 bytes 28.9% aligned to 8 bytes 14.7% aligned to 16 bytes
7.4%
s2 aligned to 4 bytes 28.9% aligned to 8 bytes 14.7% aligned to 16 bytes
7.3%
s1-s2 aligned to 4 bytes 74.1% aligned to 8 bytes 67.9% aligned to 16 bytes
64.6%
n <= 0: 4.0% n <= 1: 10.4% n <= 2: 13.8% n <= 3: 18.6% n <= 4: 25.0% n
<= 8: 67.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
summary memcmp:
replaying ls
average size 0.4 calls 9641 succeed 100.0% latencies -6.2 -7.0
s1 aligned to 4 bytes 27.2% aligned to 8 bytes 12.3% aligned to 16 bytes
2.5%
s2 aligned to 4 bytes 26.0% aligned to 8 bytes 15.6% aligned to 16 bytes
8.4%
s1-s2 aligned to 4 bytes 25.0% aligned to 8 bytes 12.6% aligned to 16 bytes
6.4%
n <= 0: 63.7% n <= 1: 97.1% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying awk
average size 0.5 calls 158 succeed 93.0% latencies 0.9 0.9
s1 aligned to 4 bytes 51.3% aligned to 8 bytes 46.8% aligned to 16 bytes
46.8%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 51.3% aligned to 8 bytes 46.8% aligned to 16 bytes
46.8%
n <= 0: 78.5% n <= 1: 89.9% n <= 2: 93.7% n <= 3: 96.2% n <= 4: 97.5% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying mc
average size 0.4 calls 1942 succeed 98.5% latencies -199.0 -199.0
s1 aligned to 4 bytes 31.3% aligned to 8 bytes 21.9% aligned to 16 bytes
16.3%
s2 aligned to 4 bytes 28.7% aligned to 8 bytes 22.8% aligned to 16 bytes
14.8%
s1-s2 aligned to 4 bytes 28.8% aligned to 8 bytes 19.4% aligned to 16 bytes
14.4%
n <= 0: 79.2% n <= 1: 96.7% n <= 2: 96.7% n <= 3: 96.7% n <= 4: 98.7% n
<= 8: 99.4% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying mutt
average size 4.7 calls 29693 succeed 100.0% latencies -251.6 -253.5
s1 aligned to 4 bytes 99.8% aligned to 8 bytes 1.4% aligned to 16 bytes
1.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 99.8% aligned to 16 bytes
99.8%
s1-s2 aligned to 4 bytes 99.8% aligned to 8 bytes 1.3% aligned to 16 bytes
1.3%
n <= 0: 8.7% n <= 1: 8.9% n <= 2: 8.9% n <= 3: 8.9% n <= 4: 8.9% n
<= 8: 98.9% n <= 16: 98.9% n <= 32: 100.0% n <= 64: 100.0%
replaying irb
average size 2.9 calls 306 succeed 88.2% latencies -109.3 -112.0
s1 aligned to 4 bytes 34.0% aligned to 8 bytes 19.0% aligned to 16 bytes
13.7%
s2 aligned to 4 bytes 82.4% aligned to 8 bytes 73.5% aligned to 16 bytes
35.9%
s1-s2 aligned to 4 bytes 34.6% aligned to 8 bytes 19.9% aligned to 16 bytes
12.4%
n <= 0: 67.3% n <= 1: 69.9% n <= 2: 80.4% n <= 3: 81.0% n <= 4: 84.6% n
<= 8: 87.9% n <= 16: 89.5% n <= 32: 99.3% n <= 64: 100.0%
replaying vim
average size 1.5 calls 467979 succeed 99.1% latencies 101.4 95.6
s1 aligned to 4 bytes 25.6% aligned to 8 bytes 15.6% aligned to 16 bytes
10.0%
s2 aligned to 4 bytes 59.5% aligned to 8 bytes 47.0% aligned to 16 bytes
46.3%
s1-s2 aligned to 4 bytes 20.4% aligned to 8 bytes 8.6% aligned to 16 bytes
3.6%
n <= 0: 6.7% n <= 1: 52.2% n <= 2: 94.6% n <= 3: 98.4% n <= 4: 99.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying make
average size 7.2 calls 1000000 succeed 99.5% latencies 1.3 1.4
s1 aligned to 4 bytes 19.2% aligned to 8 bytes 12.3% aligned to 16 bytes
8.4%
s2 aligned to 4 bytes 27.5% aligned to 8 bytes 15.8% aligned to 16 bytes
6.6%
s1-s2 aligned to 4 bytes 24.8% aligned to 8 bytes 12.2% aligned to 16 bytes
6.0%
n <= 0: 72.1% n <= 1: 75.0% n <= 2: 75.3% n <= 3: 75.3% n <= 4: 75.3% n
<= 8: 76.1% n <= 16: 76.6% n <= 32: 100.0% n <= 64: 100.0%
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 4.4 calls 6108 succeed 34.0% latencies 0.0 10.5
s1 aligned to 4 bytes 27.7% aligned to 8 bytes 2.2% aligned to 16 bytes
1.5%
s2 aligned to 4 bytes 80.8% aligned to 8 bytes 79.2% aligned to 16 bytes
42.5%
s1-s2 aligned to 4 bytes 27.9% aligned to 8 bytes 3.3% aligned to 16 bytes
2.4%
n <= 0: 23.8% n <= 1: 26.5% n <= 2: 27.2% n <= 3: 27.4% n <= 4: 52.5% n
<= 8: 96.1% n <= 16: 99.9% n <= 32: 100.0% n <= 64: 100.0%
replaying gcc
average size 0.0 calls 63189 succeed 99.9% latencies 1.6 1.7
s1 aligned to 4 bytes 3.4% aligned to 8 bytes 3.2% aligned to 16 bytes
3.1%
s2 aligned to 4 bytes 26.5% aligned to 8 bytes 11.9% aligned to 16 bytes
6.6%
s1-s2 aligned to 4 bytes 24.7% aligned to 8 bytes 13.2% aligned to 16 bytes
7.7%
n <= 0: 96.3% n <= 1: 99.7% n <= 2: 99.9% n <= 3: 99.9% n <= 4: 99.9% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying pager
average size 0.9 calls 118 succeed 56.8% latencies -18.2 -18.2
s1 aligned to 4 bytes 23.7% aligned to 8 bytes 15.3% aligned to 16 bytes
8.5%
s2 aligned to 4 bytes 21.2% aligned to 8 bytes 16.9% aligned to 16 bytes
13.6%
s1-s2 aligned to 4 bytes 30.5% aligned to 8 bytes 17.8% aligned to 16 bytes
11.9%
n <= 0: 54.2% n <= 1: 56.8% n <= 2: 98.3% n <= 3: 98.3% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
replaying man
average size 12.3 calls 119 succeed 49.6% latencies -16.9 -5.0
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
n <= 0: 0.8% n <= 1: 21.8% n <= 2: 21.8% n <= 3: 21.8% n <= 4: 21.8% n
<= 8: 50.4% n <= 16: 89.1% n <= 32: 89.1% n <= 64: 100.0%
replaying as
average size 5.3 calls 8968 succeed 2.1% latencies 16.0 4.8
s1 aligned to 4 bytes 42.8% aligned to 8 bytes 39.1% aligned to 16 bytes
38.4%
s2 aligned to 4 bytes 35.4% aligned to 8 bytes 23.9% aligned to 16 bytes
18.8%
s1-s2 aligned to 4 bytes 26.3% aligned to 8 bytes 13.1% aligned to 16 bytes
7.4%
n <= 0: 0.2% n <= 1: 0.3% n <= 2: 1.5% n <= 3: 12.7% n <= 4: 47.8% n
<= 8: 98.9% n <= 16: 99.6% n <= 32: 100.0% n <= 64: 100.0%
summary strcasecmp:
replaying mutt
average size 1.2 calls 53965 succeed 100.0% latencies -252.2 -251.1
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s2 aligned to 4 bytes 31.7% aligned to 8 bytes 20.8% aligned to 16 bytes
11.9%
s1-s2 aligned to 4 bytes 31.7% aligned to 8 bytes 20.8% aligned to 16 bytes
11.9%
n <= 0: 63.4% n <= 1: 65.3% n <= 2: 65.3% n <= 3: 88.7% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.581
replaying irb
average size 1.0 calls 693 succeed 94.5% latencies -97.4 -97.9
s1 aligned to 4 bytes 30.4% aligned to 8 bytes 11.4% aligned to 16 bytes
4.2%
s2 aligned to 4 bytes 29.1% aligned to 8 bytes 14.3% aligned to 16 bytes
10.2%
s1-s2 aligned to 4 bytes 27.4% aligned to 8 bytes 13.6% aligned to 16 bytes
5.9%
n <= 0: 84.6% n <= 1: 88.3% n <= 2: 89.0% n <= 3: 89.8% n <= 4: 90.3% n
<= 8: 93.8% n <= 16: 99.6% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000
replaying vim
average size 0.5 calls 2194 succeed 95.2% latencies -19.8 -9.9
s1 aligned to 4 bytes 92.7% aligned to 8 bytes 92.6% aligned to 16 bytes
91.7%
s2 aligned to 4 bytes 27.7% aligned to 8 bytes 10.9% aligned to 16 bytes
6.5%
s1-s2 aligned to 4 bytes 26.5% aligned to 8 bytes 10.2% aligned to 16 bytes
5.3%
n <= 0: 87.2% n <= 1: 90.6% n <= 2: 91.3% n <= 3: 94.5% n <= 4: 97.4% n
<= 8: 99.1% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.024
replaying /usr/lib/gcc/x86_64-linux-gnu/4.9/cc1
average size 5.3 calls 108 succeed 4.6% latencies 31.5 -6.5
s1 aligned to 4 bytes 6.5% aligned to 8 bytes 5.6% aligned to 16 bytes
5.6%
s2 aligned to 4 bytes 1.9% aligned to 8 bytes 0.9% aligned to 16 bytes
0.9%
s1-s2 aligned to 4 bytes 93.5% aligned to 8 bytes 93.5% aligned to 16 bytes
93.5%
n <= 0: 0.9% n <= 1: 0.9% n <= 2: 0.9% n <= 3: 0.9% n <= 4: 3.7% n
<= 8: 95.4% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.028
replaying /usr/bin/lsof
average size 0.1 calls 181 succeed 98.9% latencies 32.2 36.8
s1 aligned to 4 bytes 20.4% aligned to 8 bytes 17.1% aligned to 16 bytes
17.1%
s2 aligned to 4 bytes 17.7% aligned to 8 bytes 0.6% aligned to 16 bytes
0.6%
s1-s2 aligned to 4 bytes 26.0% aligned to 8 bytes 12.7% aligned to 16 bytes
6.1%
n <= 0: 97.2% n <= 1: 99.4% n <= 2: 99.4% n <= 3: 99.4% n <= 4: 99.4% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000
replaying man
average size 2.1 calls 70892 succeed 100.0% latencies -353.3 -355.8
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
n <= 0: 38.8% n <= 1: 63.4% n <= 2: 74.7% n <= 3: 81.3% n <= 4: 86.7% n
<= 8: 95.5% n <= 16: 98.4% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.063
replaying preconv
average size 0.6 calls 75 succeed 97.3% latencies -35.2 -6.9
s1 aligned to 4 bytes 97.3% aligned to 8 bytes 96.0% aligned to 16 bytes
96.0%
s2 aligned to 4 bytes 38.7% aligned to 8 bytes 21.3% aligned to 16 bytes
9.3%
s1-s2 aligned to 4 bytes 37.3% aligned to 8 bytes 21.3% aligned to 16 bytes
9.3%
n <= 0: 84.0% n <= 1: 85.3% n <= 2: 85.3% n <= 3: 86.7% n <= 4: 98.7% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.453
summary strncasecmp:
replaying mutt
average size 0.5 calls 233025 succeed 95.9% latencies -260.3 -259.2
s1 aligned to 4 bytes 24.4% aligned to 8 bytes 23.6% aligned to 16 bytes
0.4%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 49.2% aligned to 16 bytes
25.8%
s1-s2 aligned to 4 bytes 24.4% aligned to 8 bytes 13.2% aligned to 16 bytes
7.5%
n <= 0: 81.1% n <= 1: 85.7% n <= 2: 87.6% n <= 3: 100.0% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000
replaying vim
average size 2.8 calls 10719 succeed 98.3% latencies -20.9 -20.2
s1 aligned to 4 bytes 30.3% aligned to 8 bytes 11.4% aligned to 16 bytes
8.1%
s2 aligned to 4 bytes 20.7% aligned to 8 bytes 5.0% aligned to 16 bytes
3.5%
s1-s2 aligned to 4 bytes 27.9% aligned to 8 bytes 8.1% aligned to 16 bytes
3.7%
n <= 0: 55.5% n <= 1: 57.6% n <= 2: 58.4% n <= 3: 71.2% n <= 4: 72.6% n
<= 8: 86.6% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.002
replaying man
average size 1.3 calls 167 succeed 91.0% latencies -17.1 22.7
s1 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
s1-s2 aligned to 4 bytes 100.0% aligned to 8 bytes 100.0% aligned to 16 bytes
100.0%
n <= 0: 50.3% n <= 1: 64.1% n <= 2: 66.5% n <= 3: 89.8% n <= 4: 98.8% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000
replaying as
average size 0.0 calls 3267 succeed 100.0% latencies 1.5 7.7
s1 aligned to 4 bytes 24.4% aligned to 8 bytes 12.3% aligned to 16 bytes
6.0%
s2 aligned to 4 bytes 0.1% aligned to 8 bytes 0.0% aligned to 16 bytes
0.0%
s1-s2 aligned to 4 bytes 25.3% aligned to 8 bytes 11.6% aligned to 16 bytes
6.0%
n <= 0: 99.9% n <= 1: 100.0% n <= 2: 100.0% n <= 3: 100.0% n <= 4: 100.0% n
<= 8: 100.0% n <= 16: 100.0% n <= 32: 100.0% n <= 64: 100.0%
average case mismatches 0.000