The branch, master has been updated
       via  03e880931d0 doc: Update doc about talloc vs malloc speed
       via  a66db6c16e4 lib:talloc: Use tabs to align output in speed test
       via  39c13f062a8 lib:talloc: Increase alloc size to 128 kilobytes
       via  00d5982da2f lib:talloc: Don't optimize the speed test
       via  b6bf9cba80e lib:talloc: Add talloc_zero vs calloc test
       via  8dddea2ceda lib:talloc: Use memset_s() to avoid the call gets optimized out
       via  6812e30be35 lib:talloc: Remove trailing spaces from testsuite.c
      from  20a3a94e06a lib:ldb: Document environment variables in ldb manpage

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 03e880931d0a6171826f5ffc33e7b4d86eea54a6
Author: Andreas Schneider <a...@samba.org>
Date:   Wed Mar 29 11:04:38 2023 +0200

    doc: Update doc about talloc vs malloc speed

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

    Autobuild-User(master): Martin Schwenke <mart...@samba.org>
    Autobuild-Date(master): Sat Sep 28 01:20:01 UTC 2024 on atb-devel-224

commit a66db6c16e4225e200da7f6f939b756026cee61f
Author: Andreas Schneider <a...@samba.org>
Date:   Thu Apr 27 11:31:07 2023 +0200

    lib:talloc: Use tabs to align output in speed test

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit 39c13f062a859d17b5be3a2a02fece8d0b1819f1
Author: Andreas Schneider <a...@samba.org>
Date:   Thu Apr 27 11:24:59 2023 +0200

    lib:talloc: Increase alloc size to 128 kilobytes

    We want to avoid that the optimizer will use stack allocations. This way
    the test should be a bit more realistic.

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit 00d5982da2f5f2595b634b957c7d0b990e12b37b
Author: Andreas Schneider <a...@samba.org>
Date:   Mon Apr 17 09:25:48 2023 +0200

    lib:talloc: Don't optimize the speed test

    If the speed test gets optimized, the malloc() and free() might be
    replaced by stack allocations.

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit b6bf9cba80e55ca5aac3611e55dc952782c867c7
Author: Andreas Schneider <a...@samba.org>
Date:   Fri Apr 14 21:34:59 2023 +0200

    lib:talloc: Add talloc_zero vs calloc test

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit 8dddea2ceda40f2365bd6b1a62826b84dc523b74
Author: Andreas Schneider <a...@samba.org>
Date:   Tue Feb 13 09:22:56 2024 +0100

    lib:talloc: Use memset_s() to avoid the call gets optimized out

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

commit 6812e30be35508db3ac23406243fe08da9cf3084
Author: Andreas Schneider <a...@samba.org>
Date:   Tue Feb 6 18:03:22 2024 +0100

    lib:talloc: Remove trailing spaces from testsuite.c

    Signed-off-by: Andreas Schneider <a...@samba.org>
    Reviewed-by: Martin Schwenke <mar...@meltin.net>

-----------------------------------------------------------------------

Summary of changes:
 lib/talloc/doc/mainpage.dox |  10 ++--
 lib/talloc/man/talloc.3.xml |  12 ++--
 lib/talloc/talloc_guide.txt |  11 ++--
 lib/talloc/testsuite.c      | 142 ++++++++++++++++++++++++++++++--------------
 lib/talloc/wscript          |   2 +-
 5 files changed, 114 insertions(+), 63 deletions(-)


Changeset truncated at 500 lines:

diff --git a/lib/talloc/doc/mainpage.dox b/lib/talloc/doc/mainpage.dox
index ece6ccb8f0c..d881e503a43 100644
--- a/lib/talloc/doc/mainpage.dox
+++ b/lib/talloc/doc/mainpage.dox
@@ -69,11 +69,11 @@
  * @section talloc_performance Performance
  *
  * All the additional features of talloc() over malloc() do come at a price. We
- * have a simple performance test in Samba4 that measures talloc() versus
- * malloc() performance, and it seems that talloc() is about 4% slower than
- * malloc() on my x86 Debian Linux box. For Samba, the great reduction in code
- * complexity that we get by using talloc makes this worthwhile, especially as
- * the total overhead of talloc/malloc in Samba is already quite small.
+ * have a performance test in Samba that measures talloc() versus malloc()
+ * performance, and it seems that talloc() is about 50% slower than malloc()
+ * (AMD Ryzen 9 3900X). For Samba, the great reduction in code complexity that
+ * we get by using talloc makes this worthwhile, especially as the total
+ * overhead of talloc/malloc in Samba is already quite small.
  *
  * @section talloc_named Named blocks
  *
diff --git a/lib/talloc/man/talloc.3.xml b/lib/talloc/man/talloc.3.xml
index c51061fce1f..e26b16dbecf 100644
--- a/lib/talloc/man/talloc.3.xml
+++ b/lib/talloc/man/talloc.3.xml
@@ -767,12 +767,12 @@ if (ptr) memcpy(ptr, p, strlen(p)+1);</programlisting>
   <refsect1><title>PERFORMANCE</title>
     <para>
       All the additional features of talloc(3) over malloc(3) do come at a
-      price. We have a simple performance test in Samba4 that measures
-      talloc() versus malloc() performance, and it seems that talloc() is
-      about 10% slower than malloc() on my x86 Debian Linux box. For
-      Samba, the great reduction in code complexity that we get by using
-      talloc makes this worthwhile, especially as the total overhead of
-      talloc/malloc in Samba is already quite small.
+      price. We have a performance test in Samba that measures talloc() versus
+      malloc() performance, and it seems that talloc() is
+      about 50% slower than malloc() (AMD Ryzen 9 3900X). For Samba, the great
+      reduction in code complexity that we get by using talloc makes this
+      worthwhile, especially as the total overhead of talloc/malloc in Samba
+      is already quite small.
     </para>
   </refsect1>
   <refsect1><title>SEE ALSO</title>
diff --git a/lib/talloc/talloc_guide.txt b/lib/talloc/talloc_guide.txt
index dedda6c0678..d6e3646a1bd 100644
--- a/lib/talloc/talloc_guide.txt
+++ b/lib/talloc/talloc_guide.txt
@@ -43,12 +43,11 @@ testsuite.c to clarify how some particular situation is handled.
 Performance
 -----------
 
-All the additional features of talloc() over malloc() do come at a
-price. We have a simple performance test in Samba4 that measures
-talloc() versus malloc() performance, and it seems that talloc() is
-about 4% slower than malloc() on my x86 Debian Linux box. For Samba,
-the great reduction in code complexity that we get by using talloc
-makes this worthwhile, especially as the total overhead of
+All the additional features of talloc() over malloc() do come at a price. We
+have a performance test in Samba4 that measures talloc() versus malloc()
+performance, and it seems that talloc() is about 50% slower than malloc() (AMD
+Ryzen 9 3900X). For Samba, the great reduction in code complexity that we get by
+using talloc makes this worthwhile, especially as the total overhead of
 talloc/malloc in Samba is already quite small.
 
 
diff --git a/lib/talloc/testsuite.c b/lib/talloc/testsuite.c
index 282ebc6956d..dc0039940ff 100644
--- a/lib/talloc/testsuite.c
+++ b/lib/talloc/testsuite.c
@@ -1,14 +1,14 @@
-/* 
+/*
    Unix SMB/CIFS implementation.
 
    local testing of talloc routines.
 
    Copyright (C) Andrew Tridgell 2004
-   
+
      ** NOTE! The following LGPL license applies to the talloc
      ** library. This does NOT imply that all of Samba is released
      ** under the LGPL
-   
+
    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
@@ -42,6 +42,14 @@
 
 #include "talloc_testsuite.h"
 
+#ifndef disable_optimization
+#if __has_attribute(optimize)
+#define disable_optimization __attribute__((optimize("O0")))
+#else /* disable_optimization */
+#define disable_optimization
+#endif
+#endif /* disable_optimization */
+
 static struct timeval private_timeval_current(void)
 {
 	struct timeval tv;
@@ -52,7 +60,7 @@ static struct timeval private_timeval_current(void)
 static double private_timeval_elapsed(struct timeval *tv)
 {
 	struct timeval tv2 = private_timeval_current();
-	return (tv2.tv_sec - tv->tv_sec) + 
+	return (tv2.tv_sec - tv->tv_sec) +
 	       (tv2.tv_usec - tv->tv_usec)*1.0e-6;
 }
 
@@ -135,7 +143,7 @@ static void test_log_stdout(const char *message)
 }
 
 /*
-  test references 
+  test references
 */
 static bool test_ref1(void)
 {
@@ -150,7 +158,7 @@ static bool test_ref1(void)
 	talloc_named_const(p1, 2, "x2");
 	talloc_named_const(p1, 3, "x3");
 
-	r1 = talloc_named_const(root, 1, "r1");	
+	r1 = talloc_named_const(root, 1, "r1");
 	ref = talloc_reference(r1, p2);
 	talloc_report_full(root, stderr);
 
@@ -192,7 +200,7 @@ static bool test_ref1(void)
 }
 
 /*
-  test references 
+  test references
 */
 static bool test_ref2(void)
 {
@@ -206,7 +214,7 @@ static bool test_ref2(void)
 	talloc_named_const(p1, 1, "x3");
 	p2 = talloc_named_const(p1, 1, "p2");
 
-	r1 = talloc_named_const(root, 1, "r1");	
+	r1 = talloc_named_const(root, 1, "r1");
 	ref = talloc_reference(r1, p2);
 	talloc_report_full(root, stderr);
 
@@ -247,7 +255,7 @@ static bool test_ref2(void)
 }
 
 /*
-  test references 
+  test references
 */
 static bool test_ref3(void)
 {
@@ -287,7 +295,7 @@ static bool test_ref3(void)
 }
 
 /*
-  test references 
+  test references
 */
 static bool test_ref4(void)
 {
@@ -302,7 +310,7 @@ static bool test_ref4(void)
 	talloc_named_const(p1, 1, "x3");
 	p2 = talloc_named_const(p1, 1, "p2");
 
-	r1 = talloc_named_const(root, 1, "r1");	
+	r1 = talloc_named_const(root, 1, "r1");
 	ref = talloc_reference(r1, p2);
 	talloc_report_full(root, stderr);
 
@@ -338,7 +346,7 @@ static bool test_ref4(void)
 
 
 /*
-  test references 
+  test references
 */
 static bool test_unlink1(void)
 {
@@ -353,7 +361,7 @@ static bool test_unlink1(void)
 	talloc_named_const(p1, 1, "x3");
 	p2 = talloc_named_const(p1, 1, "p2");
 
-	r1 = talloc_named_const(p1, 1, "r1");	
+	r1 = talloc_named_const(p1, 1, "r1");
 	ref = talloc_reference(r1, p2);
 	talloc_report_full(root, stderr);
 
@@ -439,11 +447,11 @@ static bool test_misc(void)
 	CHECK_BLOCKS("misc", p1, 2);
 	CHECK_BLOCKS("misc", root, 3);
 
-	torture_assert("misc", talloc_free(NULL) == -1, 
+	torture_assert("misc", talloc_free(NULL) == -1,
 				   "talloc_free(NULL) should give -1\n");
 
 	talloc_set_destructor(p1, fail_destructor);
-	torture_assert("misc", talloc_free(p1) == -1, 
+	torture_assert("misc", talloc_free(p1) == -1,
 		"Failed destructor should cause talloc_free to fail\n");
 	talloc_set_destructor(p1, NULL);
 
@@ -458,10 +466,10 @@ static bool test_misc(void)
 		"failed: strdup on NULL should give NULL\n");
 
 	p2 = talloc_strndup(p1, "foo", 2);
-	torture_assert("misc", strcmp("fo", p2) == 0, 
+	torture_assert("misc", strcmp("fo", p2) == 0,
 				   "strndup doesn't work\n");
 	p2 = talloc_asprintf_append_buffer(p2, "o%c", 'd');
-	torture_assert("misc", strcmp("food", p2) == 0, 
+	torture_assert("misc", strcmp("food", p2) == 0,
 				   "talloc_asprintf_append_buffer doesn't work\n");
 	CHECK_BLOCKS("misc", p2, 1);
 	CHECK_BLOCKS("misc", p1, 3);
@@ -634,7 +642,7 @@ static bool test_realloc_child(void)
 	el1->list3 = talloc(el1, struct el2 *);
 	el1->list3[0] = talloc(el1->list3, struct el2);
 	el1->list3[0]->name = talloc_strdup(el1->list3[0], "testing2");
-	
+
 	el2 = talloc(el1->list, struct el2);
 	CHECK_PARENT("el2", el2, el1->list);
 	el2_2 = talloc(el1->list2, struct el2);
@@ -742,7 +750,7 @@ static bool test_steal(void)
 	talloc_steal(root, p2);
 	CHECK_BLOCKS("steal", root, 2);
 	CHECK_SIZE("steal", root, 20);
-	
+
 	talloc_free(p2);
 	CHECK_BLOCKS("steal", root, 1);
 
@@ -851,9 +859,14 @@ static bool test_unref_reparent(void)
 	return true;
 }
 
+/* Make the size big enough to not fit into the stack */
+#define ALLOC_SIZE (128 * 1024)
+#define ALLOC_DUP_STRING "talloc talloc talloc talloc talloc talloc talloc"
+
 /*
   measure the speed of talloc versus malloc
 */
+static bool test_speed(void) disable_optimization;
 static bool test_speed(void)
 {
 	void *ctx = talloc_new(NULL);
@@ -869,9 +882,9 @@ static bool test_speed(void)
 	do {
 		void *p1, *p2, *p3;
 		for (i=0;i<loop;i++) {
-			p1 = talloc_size(ctx, loop % 100);
-			p2 = talloc_strdup(p1, "foo bar");
-			p3 = talloc_size(p1, 300);
+			p1 = talloc_size(ctx, loop % ALLOC_SIZE);
+			p2 = talloc_strdup(p1, ALLOC_DUP_STRING);
+			p3 = talloc_size(p1, ALLOC_SIZE);
 			(void)p2;
 			(void)p3;
 			talloc_free(p1);
@@ -879,20 +892,20 @@ static bool test_speed(void)
 		count += 3 * loop;
 	} while (private_timeval_elapsed(&tv) < 5.0);
 
-	fprintf(stderr, "talloc: %.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+	fprintf(stderr, "talloc:\t\t%.0f ops/sec\n", count/private_timeval_elapsed(&tv));
 
 	talloc_free(ctx);
 
-	ctx = talloc_pool(NULL, 1024);
+	ctx = talloc_pool(NULL, ALLOC_SIZE * 2);
 
 	tv = private_timeval_current();
 	count = 0;
 	do {
 		void *p1, *p2, *p3;
 		for (i=0;i<loop;i++) {
-			p1 = talloc_size(ctx, loop % 100);
-			p2 = talloc_strdup(p1, "foo bar");
-			p3 = talloc_size(p1, 300);
+			p1 = talloc_size(ctx, loop % ALLOC_SIZE);
+			p2 = talloc_strdup(p1, ALLOC_DUP_STRING);
+			p3 = talloc_size(p1, ALLOC_SIZE);
 			(void)p2;
 			(void)p3;
 			talloc_free(p1);
@@ -902,23 +915,62 @@ static bool test_speed(void)
 
 	talloc_free(ctx);
 
-	fprintf(stderr, "talloc_pool: %.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+	fprintf(stderr, "talloc_pool:\t%.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+
+	tv = private_timeval_current();
+	count = 0;
+	do {
+		void *p1, *p2, *p3;
+		for (i=0;i<loop;i++) {
+			p1 = malloc(loop % ALLOC_SIZE);
+			p2 = strdup(ALLOC_DUP_STRING);
+			p3 = malloc(ALLOC_SIZE);
+			free(p1);
+			free(p2);
+			free(p3);
+		}
+		count += 3 * loop;
+	} while (private_timeval_elapsed(&tv) < 5.0);
+	fprintf(stderr, "malloc:\t\t%.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+
+	printf("\n# TALLOC_ZERO VS CALLOC SPEED\n");
+
+	ctx = talloc_new(NULL);
+
+	tv = private_timeval_current();
+	count = 0;
+	do {
+		void *p1, *p2, *p3;
+		for (i=0;i<loop;i++) {
+			p1 = talloc_zero_size(ctx, loop % ALLOC_SIZE);
+			p2 = talloc_strdup(p1, ALLOC_DUP_STRING);
+			p3 = talloc_zero_size(p1, ALLOC_SIZE);
+			(void)p2;
+			(void)p3;
+			talloc_free(p1);
+		}
+		count += 3 * loop;
+	} while (private_timeval_elapsed(&tv) < 5.0);
+
+	fprintf(stderr, "talloc_zero:\t%.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+
+	talloc_free(ctx);
 
 	tv = private_timeval_current();
 	count = 0;
 	do {
 		void *p1, *p2, *p3;
 		for (i=0;i<loop;i++) {
-			p1 = malloc(loop % 100);
-			p2 = strdup("foo bar");
-			p3 = malloc(300);
+			p1 = calloc(1, loop % ALLOC_SIZE);
+			p2 = strdup(ALLOC_DUP_STRING);
+			p3 = calloc(1, ALLOC_SIZE);
 			free(p1);
 			free(p2);
 			free(p3);
 		}
 		count += 3 * loop;
 	} while (private_timeval_elapsed(&tv) < 5.0);
-	fprintf(stderr, "malloc: %.0f ops/sec\n", count/private_timeval_elapsed(&tv));
+	fprintf(stderr, "calloc:\t\t%.0f ops/sec\n", count/private_timeval_elapsed(&tv));
 
 	printf("success: speed\n");
 
@@ -928,15 +980,15 @@ static bool test_speed(void)
 static bool test_lifeless(void)
 {
 	void *top = talloc_new(NULL);
-	char *parent, *child; 
+	char *parent, *child;
 	void *child_owner = talloc_new(NULL);
 
 	printf("test: lifeless\n# TALLOC_UNLINK LOOP\n");
 
 	parent = talloc_strdup(top, "parent");
-	child = talloc_strdup(parent, "child");  
+	child = talloc_strdup(parent, "child");
 	(void)talloc_reference(child, parent);
-	(void)talloc_reference(child_owner, child); 
+	(void)talloc_reference(child_owner, child);
 	talloc_report_full(top, stderr);
 	talloc_unlink(top, parent);
 	talloc_unlink(top, child);
@@ -969,7 +1021,7 @@ static bool test_loop(void)
 
 	parent = talloc_strdup(top, "parent");
 	req1 = talloc(parent, struct req1);
-	req1->req2 = talloc_strdup(req1, "req2");  
+	req1->req2 = talloc_strdup(req1, "req2");
 	talloc_set_destructor(req1->req2, test_loop_destructor);
 	req1->req3 = talloc_strdup(req1, "req3");
 	(void)talloc_reference(req1->req3, req1);
@@ -979,7 +1031,7 @@ static bool test_loop(void)
 	talloc_report_full(NULL, stderr);
 	talloc_free(top);
 
-	torture_assert("loop", loop_destructor_count == 1, 
+	torture_assert("loop", loop_destructor_count == 1,
 				   "FAILED TO FIRE LOOP DESTRUCTOR\n");
 	loop_destructor_count = 0;
 
@@ -2097,7 +2149,7 @@ static bool test_magic_protection(void)
 		 *
 		 * Real attacks would attempt to set a real destructor.
 		 */
-		memset(p1, '\0', 32);
+		memset_s(p1, 32, '\0', 32);
 
 		/* Then the attack takes effect when the memory's freed. */
 		talloc_free(pool);
@@ -2220,29 +2272,29 @@ bool torture_local_talloc(struct torture_context *tctx)
 	test_reset();
 	ret &= test_ref4();
 	test_reset();
-	ret &= test_unlink1(); 
+	ret &= test_unlink1();
 	test_reset();
 	ret &= test_misc();
 	test_reset();
 	ret &= test_realloc();
 	test_reset();
-	ret &= test_realloc_child(); 
+	ret &= test_realloc_child();
 	test_reset();
-	ret &= test_steal(); 
+	ret &= test_steal();
 	test_reset();
-	ret &= test_move(); 
+	ret &= test_move();
 	test_reset();
 	ret &= test_unref_reparent();
 	test_reset();
-	ret &= test_realloc_fn(); 
+	ret &= test_realloc_fn();
 	test_reset();
 	ret &= test_type();
 	test_reset();
-	ret &= test_lifeless(); 
+	ret &= test_lifeless();
 	test_reset();
 	ret &= test_loop();
 	test_reset();
-	ret &= test_free_parent_deny_child(); 
+	ret &= test_free_parent_deny_child();
 	test_reset();
 	ret &= test_realloc_on_destructor_parent();
 	test_reset();
diff --git a/lib/talloc/wscript b/lib/talloc/wscript
index 8b5e02d36c5..1b240ae3653 100644
--- a/lib/talloc/wscript
+++ b/lib/talloc/wscript
@@ -93,7 +93,7 @@ def build(bld):
                           public_headers=[],
                           enabled=bld.env.TALLOC_COMPAT1)
 
-        testsuite_deps = 'talloc'
+        testsuite_deps = 'talloc replace'
         if bld.CONFIG_SET('HAVE_PTHREAD'):
             testsuite_deps += ' pthread'
 


-- 
Samba Shared Repository