Hi Paul, > (Ooo! Ooo! Performance measurements! I love this stuff!)
Me too :-) > It depends on the data. In the typical case, "sort" is applied to > text data, which does not contain NUL bytes. The data in that > benchmark contained many NUL bytes. If you take the same benchmark > and uniformly replace "\0" with "\t" in compare.c, then the situation > is much different: coreutils memxfrm is about 3 times faster than > gnulib memxfrm on the larger test cases Indeed. By changing the compare.c program to try - first, a small string, 3 times, - then, a 40 KB string with \t separators, 3 times, - then, the same string with \0 separators, 3 times, I confirm your timings: Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,032002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,036003 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 10,9647 Time for coreutils_memxfrm: 3,54422 Time for gnulib_memxfrm: 10,4247 Time for coreutils_memxfrm: 3,54422 Time for gnulib_memxfrm: 10,4407 Time for coreutils_memxfrm: 3,54422 Time for gnulib_memxfrm: 1,98012 Time for coreutils_memxfrm: 3,42021 Time for gnulib_memxfrm: 1,98012 Time for coreutils_memxfrm: 3,41621 Time for gnulib_memxfrm: 1,98412 Time for coreutils_memxfrm: 3,41621 The reason is that gnulib_memxfrm duplicates the allocated memory size, from 4 KB to 8 KB to 16 KB etc., ignoring the expected result size of 108 KB that strxfrm is returned. After this is fixed, I get better timings. > I expect that this performance glitch in gnulib memxfrm could be > improved, as it shouldn't simply double buffer sizes when they're too > small, as at that point it already knows what the final buffer size > should be. Doing this should bring up gnulib memxfrm to be as fast as > coreutils xmemxfrm for this benchmark. Yes: Without strdup: Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036003 Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 4,29627 Time for coreutils_memxfrm: 3,55222 Time for gnulib_memxfrm: 3,54422 Time for coreutils_memxfrm: 3,55222 Time for gnulib_memxfrm: 3,54022 Time for coreutils_memxfrm: 3,55222 Time for gnulib_memxfrm: 1,98412 Time for coreutils_memxfrm: 3,42421 Time for gnulib_memxfrm: 1,98412 Time for coreutils_memxfrm: 3,42421 Time for gnulib_memxfrm: 1,98812 Time for coreutils_memxfrm: 3,42021 With strdup: Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,032002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,036003 Time for coreutils_memxfrm: 0,032002 Time for gnulib_memxfrm: 4,39227 Time for coreutils_memxfrm: 3,54822 Time for gnulib_memxfrm: 3,62823 Time for coreutils_memxfrm: 3,54822 Time for gnulib_memxfrm: 3,64023 Time for coreutils_memxfrm: 3,54822 Time for gnulib_memxfrm: 1,98412 Time for coreutils_memxfrm: 3,41621 Time for gnulib_memxfrm: 1,98012 Time for coreutils_memxfrm: 3,42021 Time for gnulib_memxfrm: 1,98012 Time for coreutils_memxfrm: 3,41621 So, this means that my estimation of the overhead of strdup was incorrect. It is not unnoticeable. In the second of the three cases, it is about 2.5% of the 3.55 seconds. In other words, a malloc + memmove costs about 5% of one strxfrm call. But this means that reducing the average numbers of strxfrm calls from 2 to 1, at the cost of some more malloc, will be a speed-up. This is what I did in the patch below, and now gnulib_memxfrm is consistently the winner: Time for gnulib_memxfrm: 0,036002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,032002 Time for coreutils_memxfrm: 0,036002 Time for gnulib_memxfrm: 0,036003 Time for coreutils_memxfrm: 0,032002 Time for gnulib_memxfrm: 2,90418 Time for coreutils_memxfrm: 3,54022 Time for gnulib_memxfrm: 1,97212 Time for coreutils_memxfrm: 3,54022 Time for gnulib_memxfrm: 1,99212 Time for coreutils_memxfrm: 3,54022 Time for gnulib_memxfrm: 1,84011 Time for coreutils_memxfrm: 3,41621 Time for gnulib_memxfrm: 1,82811 Time for coreutils_memxfrm: 3,41621 Time for gnulib_memxfrm: 1,83611 Time for coreutils_memxfrm: 3,41621 > Still, gnulib memxfrm is > problematic, because it insists on managing memory itself. But by doing so it is now 44% faster than the coreutils_memxfrm on the case of long strings, with or without NULs. Starting from a certain complexity, doing malloc as part of the API is a win. Only people who do some kind of embedded systems programming require to be able to do memory allocation statically. For example, GNU libiconv now has a function iconv_open_into, that is like iconv_open without memory allocation. But very very few people need that. Even the most basic functions in libunistring (unistr.in.h: u8_to_u16, u8_to_u32, ..., uniconv.in.h: u8_conv_from_encoding, u8_conv_to_encoding, etc.) rely on implicit dynamic memory allocation. No one has complained about this. > The point > remains, though, that it's confusing that gnulib memxfrm's name begins > with "mem", as the mem* functions don't allocate memory. Would you > consider a patch that renames gnulib memxfrm to amemxfrm, or to some > other such name? No, this is not good. The variant which never allocates memory by itself would be more complex to use and slower on average that gnulib's function. Also, functions like 'strdup', 'putenv', 'setenv', 'scandir', 'fts', all do dynamic memory allocation without having an 'a' in their name to indicate this. 2010-08-08 Bruno Haible <br...@clisp.org> memxfrm: Speed up. * lib/memxfrm.c (memxfrm): Allocate enough memory ahead of time, so that usually only one call to strxfrm is necessary for each string part. Reported by Paul Eggert <egg...@cs.ucla.edu>. --- lib/memxfrm.c.orig Sun Aug 8 11:56:19 2010 +++ lib/memxfrm.c Sun Aug 8 11:55:46 2010 @@ -64,12 +64,40 @@ for (;;) { /* Search next NUL byte. */ - const char *q = p + strlen (p); + size_t l = strlen (p); for (;;) { size_t k; + /* A call to strxfrm costs about 20 times more than a call to + strdup of the result. Therefore it is worth to try to avoid + calling strxfrm more than once on a given string, by making + enough room before calling strxfrm. + The size of the strxfrm result, k, is likely to be between + l and 3 * l. */ + if (3 * l >= allocated - length) + { + /* Grow the result buffer. */ + size_t new_allocated; + char *new_result; + + new_allocated = length + 3 * l + 1; + if (new_allocated < 2 * allocated) + new_allocated = 2 * allocated; + if (new_allocated < 64) + new_allocated = 64; + if (result == resultbuf) + new_result = (char *) malloc (new_allocated); + else + new_result = (char *) realloc (result, new_allocated); + if (new_result != NULL) + { + allocated = new_allocated; + result = new_result; + } + } + errno = 0; k = strxfrm (result + length, p, allocated - length); if (errno != 0) @@ -77,17 +105,21 @@ if (k >= allocated - length) { /* Grow the result buffer. */ + size_t new_allocated; char *new_result; - allocated = 2 * allocated; - if (allocated < 64) - allocated = 64; + new_allocated = length + k + 1; + if (new_allocated < 2 * allocated) + new_allocated = 2 * allocated; + if (new_allocated < 64) + new_allocated = 64; if (result == resultbuf) - new_result = (char *) malloc (allocated); + new_result = (char *) malloc (new_allocated); else - new_result = (char *) realloc (result, allocated); + new_result = (char *) realloc (result, new_allocated); if (new_result == NULL) goto out_of_memory_1; + allocated = new_allocated; result = new_result; } else @@ -97,7 +129,7 @@ } } - p = q + 1; + p = p + l + 1; if (p == p_end) break; result[length] = '\0'; @@ -105,12 +137,23 @@ } } - /* Shrink the allocated memory if possible. */ - if (result != resultbuf && (length > 0 ? length : 1) < allocated) + /* Shrink the allocated memory if possible. + It is not worth calling realloc when length + 1 == allocated; it would + save just one byte. */ + if (result != resultbuf && length + 1 < allocated) { - char *memory = (char *) realloc (result, length > 0 ? length : 1); - if (memory != NULL) - result = memory; + if ((length > 0 ? length : 1) <= *lengthp) + { + memcpy (resultbuf, result, length); + free (result); + result = resultbuf; + } + else + { + char *memory = (char *) realloc (result, length > 0 ? length : 1); + if (memory != NULL) + result = memory; + } } s[n] = orig_sentinel;