If anybody is curious, here are my numbers for an AMD X2 3800+:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be slow."' -o x x.c y.c strlcpy.c ; ./x
NONE:      620268 us
MEMCPY:    683135 us
STRNCPY:  7952930 us
STRLCPY: 10042364 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -o x x.c y.c strlcpy.c ; ./x
NONE:      554694 us
MEMCPY:    691390 us
STRNCPY:  7759933 us
STRLCPY:  3710627 us

$ gcc -O3 -std=c99 -DSTRING='""' -o x x.c y.c strlcpy.c ; ./x
NONE:      631266 us
MEMCPY:    775340 us
STRNCPY:  7789267 us
STRLCPY:   550430 us

Each invocation represents 100 million calls to each of the functions. Each function accepts a 'dst' and a 'src' argument, and assumes that it is copying 64 bytes from 'src' to 'dst'. The none function does nothing. The memcpy function calls memcpy(), the strncpy function calls strncpy(), and the strlcpy function calls the strlcpy() that was posted from the BSD sources. (GLIBC doesn't have strlcpy() on my machine.)

This makes it clear what the overhead of the additional logic involves. memcpy() costs approximately the same as doing nothing at all. strncpy() is always expensive, because it zero-fills the remainder of 'dst' and therefore always writes all 64 bytes. strlcpy() is more expensive than memcpy() except in the empty-string case, and its cost grows with the length of the source string.

These tests do not properly model the effects of real memory; however, they do model the effects of cache memory. I would suggest that the results are exaggerated, but not invalid.

For anybody doubting the none vs. memcpy result, I've included the generated assembly code. I chalk it up entirely to fully utilizing the parallel execution capability of the CPU: although 16 movq instructions are executed, they can be executed fully in parallel. It makes it fairly clear to me that all of these instructions are pretty fast.

Are we sure this is a real bottleneck? Even the slowest operation above, strlcpy() on a very long string, appears to execute about 10 times per microsecond (roughly 100 ns per call). Perhaps my tests are too easy for my CPU and I need to make it access many different 64-byte blocks? :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

	.file	"x.c"
	.text
	.p2align 4,,15
.globl x_none
	.type	x_none, @function
x_none:
.LFB14:
	rep ; ret
.LFE14:
	.size	x_none, .-x_none
	.p2align 4,,15
.globl x_strlcpy
	.type	x_strlcpy, @function
x_strlcpy:
.LFB17:
	movl	$64, %edx
	jmp	strlcpy
.LFE17:
	.size	x_strlcpy, .-x_strlcpy
	.p2align 4,,15
.globl x_strncpy
	.type	x_strncpy, @function
x_strncpy:
.LFB16:
	movl	$64, %edx
	jmp	strncpy
.LFE16:
	.size	x_strncpy, .-x_strncpy
	.p2align 4,,15
.globl x_memcpy
	.type	x_memcpy, @function
x_memcpy:
.LFB15:
	movq	(%rsi), %rax
	movq	%rax, (%rdi)
	movq	8(%rsi), %rax
	movq	%rax, 8(%rdi)
	movq	16(%rsi), %rax
	movq	%rax, 16(%rdi)
	movq	24(%rsi), %rax
	movq	%rax, 24(%rdi)
	movq	32(%rsi), %rax
	movq	%rax, 32(%rdi)
	movq	40(%rsi), %rax
	movq	%rax, 40(%rdi)
	movq	48(%rsi), %rax
	movq	%rax, 48(%rdi)
	movq	56(%rsi), %rax
	movq	%rax, 56(%rdi)
	ret
.LFE15:
	.size	x_memcpy, .-x_memcpy
	.section	.eh_frame,"a",@progbits
.Lframe1:
	.long	.LECIE1-.LSCIE1
.LSCIE1:
	.long	0x0
	.byte	0x1
	.string	"zR"
	.uleb128 0x1
	.sleb128 -8
	.byte	0x10
	.uleb128 0x1
	.byte	0x3
	.byte	0xc
	.uleb128 0x7
	.uleb128 0x8
	.byte	0x90
	.uleb128 0x1
	.align 8
.LECIE1:
.LSFDE1:
	.long	.LEFDE1-.LASFDE1
.LASFDE1:
	.long	.LASFDE1-.Lframe1
	.long	.LFB14
	.long	.LFE14-.LFB14
	.uleb128 0x0
	.align 8
.LEFDE1:
.LSFDE3:
	.long	.LEFDE3-.LASFDE3
.LASFDE3:
	.long	.LASFDE3-.Lframe1
	.long	.LFB17
	.long	.LFE17-.LFB17
	.uleb128 0x0
	.align 8
.LEFDE3:
.LSFDE5:
	.long	.LEFDE5-.LASFDE5
.LASFDE5:
	.long	.LASFDE5-.Lframe1
	.long	.LFB16
	.long	.LFE16-.LFB16
	.uleb128 0x0
	.align 8
.LEFDE5:
.LSFDE7:
	.long	.LEFDE7-.LASFDE7
.LASFDE7:
	.long	.LASFDE7-.Lframe1
	.long	.LFB15
	.long	.LFE15-.LFB15
	.uleb128 0x0
	.align 8
.LEFDE7:
	.ident	"GCC: (GNU) 4.1.1 20060525 (Red Hat 4.1.1-1)"
	.section	.note.GNU-stack,"",@progbits
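
The x.c, y.c and strlcpy.c sources were not included in this message. For anyone wanting to reproduce the numbers, the following is a minimal C sketch of what such a harness might look like, built only on assumptions. The wrapper names x_none, x_memcpy, x_strncpy and x_strlcpy come from the assembly above, but their void (char *dst, const char *src) signatures are inferred. The helpers my_strlcpy, copy_fn, time_calls and net_ns_per_call are hypothetical. my_strlcpy is a portable re-implementation of the BSD strlcpy() semantics (copy at most size - 1 bytes, always NUL-terminate, return strlen(src)), not the code that was posted. Timing uses standard clock() (processor time), which may differ from the original timer. Instead of splitting callers and callees across two files, the sketch calls through a volatile function pointer so the compiler cannot inline or drop the calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define COPY_SIZE 64 /* each wrapper copies into a 64-byte 'dst' */

/* Hypothetical portable stand-in for BSD strlcpy(): copies at most
 * size - 1 bytes, NUL-terminates when size > 0, and returns
 * strlen(src) so the caller can detect truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    const char *s = src;
    size_t left = size;

    if (left != 0) {
        while (--left != 0) {
            if ((*dst++ = *s++) == '\0')
                break;              /* copied the terminating NUL */
        }
    }
    if (left == 0) {                /* ran out of room in dst */
        if (size != 0)
            *dst = '\0';            /* always NUL-terminate */
        while (*s++)                /* walk the rest of src for its length */
            ;
    }
    return (size_t)(s - src - 1);   /* length excludes the NUL */
}

/* Wrappers named after the generated assembly (signatures assumed). */
void x_none(char *dst, const char *src) { (void)dst; (void)src; }
void x_memcpy(char *dst, const char *src) { memcpy(dst, src, COPY_SIZE); }
void x_strncpy(char *dst, const char *src) { strncpy(dst, src, COPY_SIZE); }
void x_strlcpy(char *dst, const char *src) { my_strlcpy(dst, src, COPY_SIZE); }

typedef void (*copy_fn)(char *dst, const char *src);

/* Calls fn n times; returns elapsed processor time in microseconds. */
double time_calls(copy_fn fn, char *dst, const char *src, long n)
{
    copy_fn volatile f = fn;        /* volatile: calls cannot be inlined away */
    clock_t start, end;

    assert(fn != NULL && n >= 0);
    start = clock();
    for (long i = 0; i < n; i++)
        f(dst, src);
    end = clock();
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}

/* Converts a total in microseconds into nanoseconds per call,
 * net of a baseline total (e.g. the NONE run). */
double net_ns_per_call(double total_us, double baseline_us, long calls)
{
    return (total_us - baseline_us) * 1000.0 / (double)calls;
}
```

A driver along these lines would fill a 64-byte 'src' buffer from STRING (so x_memcpy never reads past it) and report time_calls() for each wrapper with n = 100000000. Applied to the posted long-string figures, net_ns_per_call(10042364, 620268, 100000000) gives roughly 94 ns per strlcpy() call beyond the NONE baseline, consistent with the "about 10 per microsecond" estimate.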