Re: [HACKERS] Faster StrNCpy

2007-02-13 Thread Bruce Momjian

This is from October 2006.  Is there a TODO here?

---

Tom Lane wrote:
 I did a couple more tests using x86 architectures.  On a rather old
 Pentium-4 machine running Fedora 5 (gcc 4.1.1, glibc-2.4-11):
 
 $ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected 
 to be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c  
 NONE:786305 us
 MEMCPY: 9887843 us
 STRNCPY:   15045780 us
 STRLCPY:   1763 us
 U_STRLCPY: 14994639 us
 LENCPY:19700346 us
 
 $ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected 
 to be very slow.' -DN=1 -o x x.c y.c strlcpy.c 
 NONE:562001 us
 MEMCPY: 2026546 us
 STRNCPY:   11149134 us
 STRLCPY:   13747827 us
 U_STRLCPY: 12467527 us
 LENCPY:17672899 us
 
 (STRLCPY is our CVS HEAD code, U_STRLCPY is the unrolled version)
 
 On a Mac Mini (Intel Core Duo, OS X 10.4.8, gcc 4.0.1), the system has a
 version of strlcpy, but it seems to suck :-(
 
 $ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected 
 to be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
 NONE:480298 us
 MEMCPY: 7857291 us
 STRNCPY:   10485948 us
 STRLCPY:   16745154 us
 U_STRLCPY: 18337286 us
 S_STRLCPY: 20920213 us
 LENCPY:22878114 us
 
 $ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected 
 to be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
 NONE:480269 us
 MEMCPY: 1858974 us
 STRNCPY:5405618 us
 STRLCPY:   16364452 us
 U_STRLCPY: 16439753 us
 S_STRLCPY: 19134538 us
 LENCPY:22873141 us
 
 It's interesting that the unrolled version is actually slower here.
 I didn't dig into the assembly code, but maybe Apple's compiler isn't
 doing a very good job with it?
 
 Anyway, these results make me less excited about the unrolled version.
 
 In any case, I don't think we should put too much emphasis on the
 long-source-string case.  In essentially all cases, the true source
 string length will be much shorter than the target buffer (were this
 not so, we'd probably be needing to make the buffer bigger), and strlcpy
 will certainly beat out strncpy in those situations.  The memcpy numbers
 look attractive, but they ignore the problem that in practice we usually
 don't know the source string length in advance --- so I don't think
 those represent something achievable.
 
 One thing that seems real clear is that the LENCPY method loses across
 the board, which surprised me, but it's hard to argue with numbers.
 
 I'm still interested to experiment with MemSet-then-strlcpy for namestrcpy,
 but given the LENCPY results this may be a loser too.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2007-02-13 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 This is from October 2006.  Is there a TODO here?

I think we had decided that the code that's in there is fine.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-10-04 Thread Benny Amorsen
 ZA == Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:

ZA Yes, but it obviously does not in some ports, and that was the
ZA main problem as I interpreted it.

strncpy is part of POSIX; I highly doubt anyone gets it wrong. Getting
sane semantics from it does require manually writing null to the
last location in the buffer.

If you need the null byte padding feature of strncpy, it seems a bit
silly to replace it with strlcpy. Particularly if the only reason is
performance. Down that road lies pglibc.


/Benny



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread Zeugswetter Andreas DCP SD

 I'm still interested to experiment with MemSet-then-strlcpy 
 for namestrcpy, but given the LENCPY results this may be a loser too.

Um, why not strlcpy then MemSet the rest ?

Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread mark
On Tue, Oct 03, 2006 at 10:24:10AM +0200, Zeugswetter Andreas DCP SD wrote:
  I'm still interested to experiment with MemSet-then-strlcpy 
  for namestrcpy, but given the LENCPY results this may be a loser too.
 Um, why not strlcpy then MemSet the rest ?

That's what strncpy() is supposed to be doing.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread Zeugswetter Andreas DCP SD

   I'm still interested to experiment with MemSet-then-strlcpy for 
   namestrcpy, but given the LENCPY results this may be a loser too.
  Um, why not strlcpy then MemSet the rest ?
 
 That's what strncpy() is supposed to be doing.

Yes, but it obviously does not in some ports, and that was the main
problem
as I interpreted it.

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread mark
On Tue, Oct 03, 2006 at 01:56:57PM +0200, Zeugswetter Andreas DCP SD wrote:
 
I'm still interested to experiment with MemSet-then-strlcpy for 
namestrcpy, but given the LENCPY results this may be a loser too.
   Um, why not strlcpy then MemSet the rest ?
  That's what strncpy() is supposed to be doing.

 Yes, but it obviously does not in some ports, and that was the main
 problem as I interpreted it.

I think re-writing libc isn't really a PostgreSQL goal.

strlcpy() is theoretically cheaper than strncpy() as it is not
required by any standard to do as much work. memcpy() is cheaper
because it can be more easily parallelized, and can increment
multiple bytes with each loop interval, as it does not need to
slow down to look for a '\0' in each byte.

strlcpy() than memset() the remaining is strncpy(). If you are
suggesting that a moderately tuned version of strncpy() be used
in the ports directory, this is an option, but it can't be
faster than strlcpy(). It's not possible. It would come down to
whether there was a security requirements that the last bytes
were '\0' or not. I haven't seen anybody mention this as a
requirement.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread Tom Lane
Zeugswetter Andreas DCP SD [EMAIL PROTECTED] writes:
 I'm still interested to experiment with MemSet-then-strlcpy 
 for namestrcpy, but given the LENCPY results this may be a loser too.

 Um, why not strlcpy then MemSet the rest ?

Two reasons:

* The main point is to do the zeroing using word-wide operations, and if
you do the above then memset will probably be facing a zeroing request
that is neither word-aligned nor word-length.  It may be able to recover
(doing it partly byte-wide and partly word-wide), but this will easily
eat up the time savings of skipping the first couple words.

* On compilers that treat memset as a builtin, there are significant
advantages to doing memset with a constant length: the compiler might
be able to unroll the loop entirely.  (I was annoyed to find that FC5's
gcc on x86_64 seems to understand very well how to inline a constant
length memcpy, but not memset :-(.)

I did actually do some experiments with the above yesterday, and found
that it was a significant win on an old x86 (with about a 10-byte source
string) but about a wash on newer architectures.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2006-10-03 Thread Tom Lane
[EMAIL PROTECTED] writes:
 ... It would come down to
 whether there was a security requirements that the last bytes
 were '\0' or not. I haven't seen anybody mention this as a
 requirement.

I think it is a requirement for namestrcpy (because the result might end
up on disk), but not elsewhere.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Strong, David
Mark,
 
Thanks for attaching the C code for your test. I ran a few tests on a 3Ghz 
Intel Xeon Paxville (dual core) system. I hope the formatting of this table 
survives:
 
 
Method  Size  N=1024*1024   N=1
 
MEMCPY  63 6964927 us 582494 us
MEMCPY  32 7102497 us 582467 us
MEMCPY  16 7116358 us 582538 us
MEMCPY  8  6965239 us 582796 us
MEMCPY  4  6964722 us 583183 us
 
STRNCPY 6310131174 us8843010 us
STRNCPY 3210648202 us9563868 us
STRNCPY 16 9187398 us7969947 us
STRNCPY 8  9275353 us8042777 us
STRNCPY 4  9067570 us8058532 us
 
STRLCPY 6315045507 us   14379702 us
STRLCPY 32 8960303 us8120471 us
STRLCPY 16 7393607 us4915457 us
STRLCPY 8  7222983 us3211931 us
STRLCPY 4  7181267 us1725546 us
 
LENCPY  63 7608932 us4416602 us
LENCPY  32 7252849 us3807535 us
LENCPY  1611680927 us   10331487 us
LENCPY  8 10409685 us9660616 us
LENCPY  4 10824632 us9525082 us

 
The first column is the copy method, the second column is the source string 
size (size of -DSTRING), the 3rd and 4th columns are the different -DN settings.
 
The memcpy () call is the clear winner, at all source string sizes. The strncpy 
() call is better than strlcpy (), until the size of the string decreases. This 
is probably due to the zero padding effect of strncpy. The lencpy () call 
starts out strong, but degrades as the size of the string decreases. This was a 
little surprising and I don't have an explanation for this behavior at this 
time.
 
The AMD optimization manuals have some interesting examples for optimizations 
for memcpy, along the lines of cache line copies and prefetching:
 
 
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF#search=%22amd%20optimization%20manual%22
 
 
h 
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf#search=%22amd%20optimization%20manual%22
 
ttp://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf#search=%22amd%20optimization%20manual%22
 
 
There also used to be an interesting article on the SGI web site called 
Optimizing CPU to Memory Accesses on the SGI Visual Workstations 320 and 540, 
but this seems to have been pulled. I did find a copy of the article here:
 
 
http://eunchul.com/database/board/cat.php?data=Win32_APIboard_group=D42a8ff5c3a9b9
 
 
Obviously, different copy mechanisms suit different data sizes. So, I added a 
little debug to the strlcpy () function that was added to Postgres the other 
day. I ran a test against Postgres for ~15 minutes that used 2 client backends 
and the BG writer - 8330804 calls to strlcpy () were generated by the test.
 
Out of the 8330804 calls, 6226616 calls used a maximum copy size of 2213 bytes 
e.g. strlcpy (dest, src, 2213) and 2104074 calls used a maximum copy size of 64 
bytes.
 
I know the 2213 size calls come from the set_ps_display () function. I don't 
know where the 64 size calls come from, yet.
 
In the 64 size case, with the exception of 35 calls, calls for size 64 are only 
copying 1 byte - I would assume this is a NULL.
 
In the 2213 size case, 1933027 calls copy 20 bytes; 2189415 calls copy 5 bytes; 
85550 calls copy 6 bytes and 2018482 calls copy 7 bytes.
 
Based on this data, it would seem that either memcpy () or strlcpy () calls 
would be better due to the source string size. 
 
Call originating from the set_ps_display () function might be able to use the 
memcpy () call as  the size of the source string should be known. The other 
calls probably need something like strlcpy () as the source string might not be 
known, although using memcpy () to copy in XX byte blocks might be interesting.
 
David



From: [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent: Fri 9/29/2006 2:59 PM
To: Tom Lane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Faster StrNCpy



On Fri, Sep 29, 2006 at 05:34:30PM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
  If anybody is curious, here are my numbers for an AMD X2 3800+:
 You did not show your C code, so no one else can reproduce the test on
 other hardware.  However, it looks like your compiler has unrolled the
 memcpy into straight-line 8-byte moves, which makes it pretty hard for
 anything operating byte-wise to compete, and is a bit dubious for the
 general case anyway (since it requires assuming that the size and
 alignment are known at compile time).

I did show the .s code. I call into x_memcpy(a, b), meaning that the
compiler can't assume anything. It may happen to be aligned.

Here are results over 64 Mbytes of memory, to ensure that every call is
a cache miss:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
NONE:767243 us
MEMCPY: 6044137 us
STRNCPY:   10741759 us
STRLCPY:   12061630 us
LENCPY: 9459099 us

$ gcc -O3

Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Here is the cache hit case including your strlen+memcpy as 'LENCPY':

 $ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected 
 to be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
 NONE:696157 us
 MEMCPY:  825118 us
 STRNCPY:7983159 us
 STRLCPY:   10787462 us
 LENCPY: 6048339 us

It appears that these results are a bit platform-dependent; on my x86_64
(Xeon) Fedora 5 box, I get

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:358679 us
MEMCPY:  619255 us
STRNCPY:8932551 us
STRLCPY:9212371 us
LENCPY:13910413 us

I'm not sure why the lencpy method sucks so badly on this machine :-(.

Anyway, I looked at glibc's strncpy and determined that on this machine
the only real optimization that's been done to it is to unroll the data
copying loop four times.  I did the same to strlcpy (attached) and got
numbers like these:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:359317 us
MEMCPY:  619636 us
STRNCPY:8933507 us
STRLCPY:7644576 us
LENCPY:13917927 us
$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) x.c y.c strlcpy.c
$ ./a.out
NONE:502960 us
MEMCPY: 5382528 us
STRNCPY:9733890 us
STRLCPY:8740892 us
LENCPY:15358616 us
$ gcc -O3 -std=c99 -DSTRING='short' -DN=1 x.c y.c strlcpy.c
$ ./a.out
NONE:358426 us
MEMCPY:  618533 us
STRNCPY:6704926 us
STRLCPY: 867336 us
LENCPY:10115883 us
$ gcc -O3 -std=c99 -DSTRING='short' -DN=(1024*1024) x.c y.c strlcpy.c
$ ./a.out
NONE:502746 us
MEMCPY: 5365171 us
STRNCPY:7983610 us
STRLCPY:5557277 us
LENCPY:11533066 us

So the unroll seems to get us to the point of not losing compared to the
original strncpy code for any string length, and so I propose doing
that, if it holds up on other architectures.

regards, tom lane


size_t
strlcpy(char *dst, const char *src, size_t siz)
{
char *d = dst;
const char *s = src;
size_t n = siz;

/* Copy as many bytes as will fit */
if (n != 0) {
while (n  4) {
if ((*d++ = *s++) == '\0')
goto done;
if ((*d++ = *s++) == '\0')
goto done;
if ((*d++ = *s++) == '\0')
goto done;
if ((*d++ = *s++) == '\0')
goto done;
n -= 4;
}
while (--n != 0) {
if ((*d++ = *s++) == '\0')
goto done;
}
}

/* Not enough room in dst, add NUL and traverse rest of src */
if (siz != 0)
*d = '\0';  /* NUL-terminate dst */
while (*s++)
;

done:
return(s - src - 1);/* count does not include NUL */
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Luke Lonergan
Mark,

On 9/29/06 2:59 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 Here are results over 64 Mbytes of memory, to ensure that every call is
 a cache miss:

On my Mac OSX intel laptop (Core Duo, 2.16 GHz, 2GB RAM, gcc 4.01):

Luke-Lonergans-Computer:~/strNcpy-perf-test lukelonergan$ gcc -O3 -std=c99
-DSTRING='This is a very long sentence that is expected to be very slow.'
-DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
/usr/bin/ld: warning multiple definitions of symbol _strlcpy
/var/tmp//cc1eMVq7.o definition of _strlcpy in section (__TEXT,__text)
/usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib(strlcpy.So)
definition of _strlcpy
NONE:416677 us
MEMCPY: 5649587 us
STRNCPY:5806591 us
STRLCPY:   12865010 us
LENCPY:17801485 us
Luke-Lonergans-Computer:~/strNcpy-perf-test lukelonergan$ gcc -O3 -std=c99
-DSTRING='Short sentence.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
/usr/bin/ld: warning multiple definitions of symbol _strlcpy
/var/tmp//ccOZl9R6.o definition of _strlcpy in section (__TEXT,__text)
/usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib(strlcpy.So)
definition of _strlcpy
NONE:416652 us
MEMCPY: 5830540 us
STRNCPY:6207594 us
STRLCPY:5582607 us
LENCPY: 7887703 us
Luke-Lonergans-Computer:~/strNcpy-perf-test lukelonergan$ gcc -O3 -std=c99
-DSTRING='' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
/usr/bin/ld: warning multiple definitions of symbol _strlcpy
/var/tmp//ccBUsIdR.o definition of _strlcpy in section (__TEXT,__text)
/usr/lib/gcc/i686-apple-darwin8/4.0.1/../../../libSystem.dylib(strlcpy.So)
definition of _strlcpy
NONE:417210 us
MEMCPY: 5506346 us
STRNCPY:5769302 us
STRLCPY:5424234 us
LENCPY: 5609338 us


- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread mark
Thanks for the results on your machines Dave, Luke, and Tom. Interesting.

On Mon, Oct 02, 2006 at 02:30:11PM -0400, Tom Lane wrote:
 It appears that these results are a bit platform-dependent; on my x86_64
 (Xeon) Fedora 5 box, I get

*nod*

 Anyway, I looked at glibc's strncpy and determined that on this machine
 the only real optimization that's been done to it is to unroll the data
 copying loop four times.  I did the same to strlcpy (attached) and got
 numbers like these:

 So the unroll seems to get us to the point of not losing compared to the
 original strncpy code for any string length, and so I propose doing
 that, if it holds up on other architectures.

Sounds reasonable. strlcpy() is the best match. I'm not sure why GLIBC
doesn't come with strlcpy(). Apparently some philosophical differences
against it by the GLIBC maintainers... :-)

memcpy() makes too many assumptions, that would be difficult to prove
in each case, or may be difficult to ensure that they remain true. I
only mentioned it as a possibility, to keep the options open, and to
show the sort of effects we are dealing with.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Sergey E. Koposov

Mark, Tom,

Just the test on IA64 (Itanium2, 1.6Ghz, 8Gb memory). The results seem to
be quite different:

[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is 
expected to be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
NONE:825671 us
MEMCPY: 4533637 us
STRNCPY:   10959380 us
STRLCPY:   34258306 us
LENCPY:13939373 us
[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is 
expected to be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:938815 us
MEMCPY: 3441330 us
STRNCPY:5881628 us
STRLCPY:   20167825 us
LENCPY:11326259 us

[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='Short sentence.' 
-DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
NONE:831920 us
MEMCPY: 4550683 us
STRNCPY:9339254 us
STRLCPY:   16871433 us
LENCPY:21332076 us
[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -DN=1 
-o x x.c y.c strlcpy.c ; ./x
NONE:938104 us
MEMCPY: 3438871 us
STRNCPY:3438969 us
STRLCPY:6042829 us
LENCPY: 6022909 us

[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x 
NONE:825472 us

MEMCPY: 4547456 us
STRNCPY:8591618 us 
STRLCPY:6818957 us

LENCPY: 8264340 us
[EMAIL PROTECTED]:~/test$ gcc -O3 -std=c99 -DSTRING='' -DN=1 -o x x.c y.c strlcpy.c ; ./x 
NONE:937878 us

MEMCPY: 3439101 us
STRNCPY:3188791 us 
STRLCPY: 750437 us

LENCPY: 2751238 us


Regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Tom Lane
Sergey E. Koposov [EMAIL PROTECTED] writes:
 Just the test on IA64 (Itanium2, 1.6Ghz, 8Gb memory). The results seem to
 be quite different:

What libc are you using exactly?  Can you try it with the unrolled
strlcpy I posted?

In glibc-2.4.90, there seem to be out-of-line assembly code
implementations of strncpy for: sparc64 sparc32 s390x s390 alpha ia64
and an inlined assembler version for i386.  So the x86_64 case is
nearly the only popular architecture that doesn't seem to have
a hand-hacked implementation ... which throws some doubt on Mark's and
my results as possibly not being very representative.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Sergey E. Koposov

On Mon, 2 Oct 2006, Tom Lane wrote:


Sergey E. Koposov [EMAIL PROTECTED] writes:

Just the test on IA64 (Itanium2, 1.6Ghz, 8Gb memory). The results seem to
be quite different:


What libc are you using exactly?  Can you try it with the unrolled
strlcpy I posted?


glibc 2.3.5 , gcc 3.4.4

my results were obtained already with your unrolled version (but when I 
first ran it with the 'default' strlcpy the results were the same).




In glibc-2.4.90, there seem to be out-of-line assembly code
implementations of strncpy for: sparc64 sparc32 s390x s390 alpha ia64
and an inlined assembler version for i386.  So the x86_64 case is
nearly the only popular architecture that doesn't seem to have
a hand-hacked implementation ... which throws some doubt on Mark's and
my results as possibly not being very representative.



Regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Tom Lane
I did a couple more tests using x86 architectures.  On a rather old
Pentium-4 machine running Fedora 5 (gcc 4.1.1, glibc-2.4-11):

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c  
NONE:786305 us
MEMCPY: 9887843 us
STRNCPY:   15045780 us
STRLCPY:   1763 us
U_STRLCPY: 14994639 us
LENCPY:19700346 us

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=1 -o x x.c y.c strlcpy.c 
NONE:562001 us
MEMCPY: 2026546 us
STRNCPY:   11149134 us
STRLCPY:   13747827 us
U_STRLCPY: 12467527 us
LENCPY:17672899 us

(STRLCPY is our CVS HEAD code, U_STRLCPY is the unrolled version)

On a Mac Mini (Intel Core Duo, OS X 10.4.8, gcc 4.0.1), the system has a
version of strlcpy, but it seems to suck :-(

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
NONE:480298 us
MEMCPY: 7857291 us
STRNCPY:   10485948 us
STRLCPY:   16745154 us
U_STRLCPY: 18337286 us
S_STRLCPY: 20920213 us
LENCPY:22878114 us

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:480269 us
MEMCPY: 1858974 us
STRNCPY:5405618 us
STRLCPY:   16364452 us
U_STRLCPY: 16439753 us
S_STRLCPY: 19134538 us
LENCPY:22873141 us

It's interesting that the unrolled version is actually slower here.
I didn't dig into the assembly code, but maybe Apple's compiler isn't
doing a very good job with it?

Anyway, these results make me less excited about the unrolled version.

In any case, I don't think we should put too much emphasis on the
long-source-string case.  In essentially all cases, the true source
string length will be much shorter than the target buffer (were this
not so, we'd probably be needing to make the buffer bigger), and strlcpy
will certainly beat out strncpy in those situations.  The memcpy numbers
look attractive, but they ignore the problem that in practice we usually
don't know the source string length in advance --- so I don't think
those represent something achievable.

One thing that seems real clear is that the LENCPY method loses across
the board, which surprised me, but it's hard to argue with numbers.

I'm still interested to experiment with MemSet-then-strlcpy for namestrcpy,
but given the LENCPY results this may be a loser too.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Tom Lane
Strong, David [EMAIL PROTECTED] writes:
 Obviously, different copy mechanisms suit different data sizes. So, I
 added a little debug to the strlcpy () function that was added to
 Postgres the other day. I ran a test against Postgres for ~15 minutes
 that used 2 client backends and the BG writer - 8330804 calls to
 strlcpy () were generated by the test.
 
 Out of the 8330804 calls, 6226616 calls used a maximum copy size of
 2213 bytes e.g. strlcpy (dest, src, 2213) and 2104074 calls used a
 maximum copy size of 64 bytes.
 
 I know the 2213 size calls come from the set_ps_display () function. I
 don't know where the 64 size calls come from, yet.

Prepared-statement and portal hashtable lookups, likely.  Were your
clients using V3 extended query protocol?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Strong, David
Tom,
 
Yes, the clients are using the V3 protocol and prepared statements.
 
David



From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Mon 10/2/2006 2:09 PM
To: Strong, David
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Faster StrNCpy 



Strong, David [EMAIL PROTECTED] writes:
 Obviously, different copy mechanisms suit different data sizes. So, I
 added a little debug to the strlcpy () function that was added to
 Postgres the other day. I ran a test against Postgres for ~15 minutes
 that used 2 client backends and the BG writer - 8330804 calls to
 strlcpy () were generated by the test.

 Out of the 8330804 calls, 6226616 calls used a maximum copy size of
 2213 bytes e.g. strlcpy (dest, src, 2213) and 2104074 calls used a
 maximum copy size of 64 bytes.

 I know the 2213 size calls come from the set_ps_display () function. I
 don't know where the 64 size calls come from, yet.

Prepared-statement and portal hashtable lookups, likely.  Were your
clients using V3 extended query protocol?

regards, tom lane



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Bruce Momjian
Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  A wholesale replacement of strncpy() calls is probably worth doing --
  replacing them with strlcpy() if the source string is NUL-terminated,
  and I suppose memcpy() otherwise.
 
 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.  I agree with trying to
 get rid of StrNCpy/strncpy calls over the long run, but it's just code
 beautification and probably not appropriate for beta.

Added to TODO:

* Use strlcpy() rather than our StrNCpy() macro

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Bruce Momjian
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  You'll notice that it iterates once per char.  Between that and the
  strlen() call in Tom's version, not sure which is the lesser evil.
 
 Yeah, I was wondering that too.  My code would require two scans of the
 source string (one inside strlen and one in memcpy), but in much of our
 usage the source and dest should be reasonably well aligned and one
 could expect memcpy to be using word rather than byte operations, so you
 might possibly make it back on the strength of fewer write cycles.  And
 on the third hand, for short source strings none of this matters and
 the extra function call involved for strlen/memcpy probably dominates.
 
 I'm happy to just use the OpenBSD version as a src/port module.
 Any objections?

I found this URL about the function history of strlcpy():

http://www.gratisoft.us/todd/papers/strlcpy.html

I added the URL to port/strlcpy.c.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Faster StrNCpy

2006-10-02 Thread Josh Berkus

Tom,

FWIW, Tom Daly did some SpecJAppserver runs on the latest snapshot and 
didn't show any reduction in text parsing overhead.  Unfortunately, he's 
gone on vacation now so I can't get details.


I'm going to try to set up some tests using TPCE to see if it's affected.

--Josh

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:
 Strong, David [EMAIL PROTECTED] writes:
 Just wondering - are any of these cases where a memcpy() would work
 just as well? Or are you not sure that the source string is at least
 64 bytes in length?
 
 In most cases, we're pretty sure that it's *not* --- it'll just be a
 palloc'd C string.
 
 I'm disinclined to fool with the restriction that namestrcpy zero-pad
 Name values, because they might end up on disk, and allowing random
 memory contents to get written out is ungood from a security point of
 view.

There's another disadvantage of always copying 64 bytes:

It may be that the 64-byte range crosses a page boundary. Now guess what
happens when this next page is not mapped - segfault.

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
 There's another disadvantage of always copying 64 bytes:
 It may be that the 64-byte range crosses a page boundary. Now guess what
 happens when this next page is not mapped - segfault.

Irrelevant, because in all interesting cases the Name field is part of a
larger record that would stretch into that other page anyway.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread mark
On Fri, Sep 29, 2006 at 11:21:21AM +0200, Markus Schaber wrote:
 Tom Lane wrote:
  Just wondering - are any of these cases where a memcpy() would work
  just as well? Or are you not sure that the source string is at least
  64 bytes in length?
  
  In most cases, we're pretty sure that it's *not* --- it'll just be a
  palloc'd C string.
  
  I'm disinclined to fool with the restriction that namestrcpy zero-pad
  Name values, because they might end up on disk, and allowing random
  memory contents to get written out is ungood from a security point of
  view.
 
 There's another disadvantage of always copying 64 bytes:
 
 It may be that the 64-byte range crosses a page boundary. Now guess what
 happens when this next page is not mapped - segfault.

With strncpy(), this possibility already exists. If it is a real problem,
that stand-alone 64-byte allocations are crossing page boundaries, the
fault is with the memory allocator, not with the user of the memory.

For strlcpy(), my suggestion that Tom quotes was that modern processes
do best when instructions can be fully parallelized. It is a lot
easier to parallelize a 64-byte copy, than a tight loop looking for
'\0' or n = 64. 64 bytes easily fits into cache memory, and modern
processors write cache memory in blocks of 16, 32, or 64 bytes anyways,
meaning that any savings in terms of not writing are minimal.

But it's only safe if you know that the source string allocation is
= 64 bytes. Often you don't, therefore it isn't safe, and the suggestion
is unworkable.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread mark
If anybody is curious, here are my numbers for an AMD X2 3800+:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be slow.' -o x x.c y.c strlcpy.c ; ./x
NONE:620268 us
MEMCPY:  683135 us
STRNCPY:7952930 us
STRLCPY:   10042364 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -o x x.c y.c strlcpy.c ; ./x
NONE:554694 us
MEMCPY:  691390 us
STRNCPY:7759933 us
STRLCPY:3710627 us

$ gcc -O3 -std=c99 -DSTRING='' -o x x.c y.c strlcpy.c ; ./x
NONE:631266 us
MEMCPY:  775340 us
STRNCPY:7789267 us
STRLCPY: 550430 us

Each invocation represents 100 million calls to each of the functions.
Each function accepts a 'dst' and 'src' argument, and assumes that it
is copying 64 bytes from 'src' to 'dst'. The none function does
nothing. The memcpy calls memcpy(), the strncpy calls strncpy(), and
the strlcpy calls the strlcpy() that was posted from the BSD sources.
(GLIBC doesn't have strlcpy() on my machine).

This makes it clear what the overhead of the additional logic involves.
memcpy() is approximately equal to nothing at all. strncpy() is always
expensive. strlcpy() is often more expensive than memcpy(), except in
the empty string case.

These tests do not properly model the effects of real memory, however,
they do model the effects of cache memory. I would suggest that the
results are exaggerated, but not invalid.

For anybody doubting the none vs memcpy, I've included the generated
assembly code. I chalk it entirely up to fully utilizing the
parallelization capability of the CPU. Although 16 movq instructions
are executed, they can be executed fully in parallel.

It almost makes it clear to me that all of these instructions are
pretty fast. Are we sure this is a real bottleneck? Even the slowest
operation above, strlcpy() on a very long string, appears to execute
10 per microsecond? Perhaps my tests are too easy for my CPU and I
need to make it access many different 64-byte blocks? :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

.file   x.c
.text
.p2align 4,,15
.globl x_none
.type   x_none, @function
x_none:
.LFB14:
rep ; ret
.LFE14:
.size   x_none, .-x_none
.p2align 4,,15
.globl x_strlcpy
.type   x_strlcpy, @function
x_strlcpy:
.LFB17:
movl$64, %edx
jmp strlcpy
.LFE17:
.size   x_strlcpy, .-x_strlcpy
.p2align 4,,15
.globl x_strncpy
.type   x_strncpy, @function
x_strncpy:
.LFB16:
movl$64, %edx
jmp strncpy
.LFE16:
.size   x_strncpy, .-x_strncpy
.p2align 4,,15
.globl x_memcpy
.type   x_memcpy, @function
x_memcpy:
.LFB15:
movq(%rsi), %rax
movq%rax, (%rdi)
movq8(%rsi), %rax
movq%rax, 8(%rdi)
movq16(%rsi), %rax
movq%rax, 16(%rdi)
movq24(%rsi), %rax
movq%rax, 24(%rdi)
movq32(%rsi), %rax
movq%rax, 32(%rdi)
movq40(%rsi), %rax
movq%rax, 40(%rdi)
movq48(%rsi), %rax
movq%rax, 48(%rdi)
movq56(%rsi), %rax
movq%rax, 56(%rdi)
ret
.LFE15:
.size   x_memcpy, .-x_memcpy
.section.eh_frame,a,@progbits
.Lframe1:
.long   .LECIE1-.LSCIE1
.LSCIE1:
.long   0x0
.byte   0x1
.string zR
.uleb128 0x1
.sleb128 -8
.byte   0x10
.uleb128 0x1
.byte   0x3
.byte   0xc
.uleb128 0x7
.uleb128 0x8
.byte   0x90
.uleb128 0x1
.align 8
.LECIE1:
.LSFDE1:
.long   .LEFDE1-.LASFDE1
.LASFDE1:
.long   .LASFDE1-.Lframe1
.long   .LFB14
.long   .LFE14-.LFB14
.uleb128 0x0
.align 8
.LEFDE1:
.LSFDE3:
.long   .LEFDE3-.LASFDE3
.LASFDE3:
.long   .LASFDE3-.Lframe1
.long   .LFB17
.long   .LFE17-.LFB17
.uleb128 0x0
.align 8
.LEFDE3:
.LSFDE5:
.long   .LEFDE5-.LASFDE5
.LASFDE5:
.long   .LASFDE5-.Lframe1
.long   .LFB16
.long   .LFE16-.LFB16
.uleb128 0x0
.align 8
.LEFDE5:
.LSFDE7:
.long   .LEFDE7-.LASFDE7
.LASFDE7:
.long   .LASFDE7-.Lframe1
.long   .LFB15
.long   .LFE15-.LFB15
.uleb128 0x0
.align 8
.LEFDE7:
.ident  GCC: (GNU) 4.1.1 20060525 (Red Hat 4.1.1-1)
.section.note.GNU-stack,,@progbits

---(end of 

Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread Tom Lane
[EMAIL PROTECTED] writes:
 If anybody is curious, here are my numbers for an AMD X2 3800+:

You did not show your C code, so no one else can reproduce the test on
other hardware.  However, it looks like your compiler has unrolled the
memcpy into straight-line 8-byte moves, which makes it pretty hard for
anything operating byte-wise to compete, and is a bit dubious for the
general case anyway (since it requires assuming that the size and
alignment are known at compile time).

This does make me wonder about whether we shouldn't try the
strlen+memcpy implementation I proposed earlier ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Faster StrNCpy

2006-09-29 Thread mark
On Fri, Sep 29, 2006 at 05:34:30PM -0400, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
  If anybody is curious, here are my numbers for an AMD X2 3800+:
 You did not show your C code, so no one else can reproduce the test on
 other hardware.  However, it looks like your compiler has unrolled the
 memcpy into straight-line 8-byte moves, which makes it pretty hard for
 anything operating byte-wise to compete, and is a bit dubious for the
 general case anyway (since it requires assuming that the size and
 alignment are known at compile time).

I did show the .s code. I call into x_memcpy(a, b), meaning that the
compiler can't assume anything. It may happen to be aligned.

Here are results over 64 Mbytes of memory, to ensure that every call is
a cache miss:

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x
NONE:767243 us
MEMCPY: 6044137 us
STRNCPY:   10741759 us
STRLCPY:   12061630 us
LENCPY: 9459099 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -DN=(1024*1024) -o x x.c y.c 
strlcpy.c ; ./x
NONE:712193 us
MEMCPY: 6072312 us
STRNCPY:9982983 us
STRLCPY:6605052 us
LENCPY: 7128258 us

$ gcc -O3 -std=c99 -DSTRING='' -DN=(1024*1024) -o x x.c y.c strlcpy.c ; ./x 
NONE:708164 us
MEMCPY: 6042817 us
STRNCPY:8885791 us
STRLCPY:5592477 us
LENCPY: 6135550 us

At least on my machine, memcpy() still comes out on top. Yes, assuming that
it is aligned correctly for the machine. Here is unaliagned (all arrays are
stored +1 offset in memory):

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=(1024*1024) -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:790932 us
MEMCPY: 6591559 us
STRNCPY:   10622291 us
STRLCPY:   12070007 us
LENCPY:10322541 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -DN=(1024*1024) -DALIGN=1 -o 
x x.c y.c strlcpy.c ; ./x
NONE:764577 us
MEMCPY: 6631731 us
STRNCPY:9513540 us
STRLCPY:6615345 us
LENCPY: 7263392 us

$ gcc -O3 -std=c99 -DSTRING='' -DN=(1024*1024) -DALIGN=1 -o x x.c y.c 
strlcpy.c ; ./x
NONE:825689 us
MEMCPY: 660 us
STRNCPY:8976487 us
STRLCPY:5878088 us
LENCPY: 6180358 us

Alignment looks like it does impact the results for memcpy(). memcpy()
changes from around 6.0 seconds to 6.6 seconds. Overall, though, it is
still the winner in all cases accept for strlcpy(), which beats it on
very short strings ().

Here is the cache hit case including your strlen+memcpy as 'LENCPY':

$ gcc -O3 -std=c99 -DSTRING='This is a very long sentence that is expected to 
be very slow.' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:696157 us
MEMCPY:  825118 us
STRNCPY:7983159 us
STRLCPY:   10787462 us
LENCPY: 6048339 us

$ gcc -O3 -std=c99 -DSTRING='Short sentence.' -DN=1 -o x x.c y.c strlcpy.c ; 
./x
NONE:700201 us
MEMCPY:  593701 us
STRNCPY:7577380 us
STRLCPY:3727801 us
LENCPY: 3169783 us

$ gcc -O3 -std=c99 -DSTRING='' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:706283 us
MEMCPY:  792719 us
STRNCPY:7870425 us
STRLCPY: 681334 us
LENCPY: 2062983 us


First call was every call being a cache hit. With this one, every one is
a cache miss, and the 64-byte blocks are spread equally over 64 Mbytes of
memory. I've attached the code for your consideration. x.c is the routines
I used to perform the tests. y.c is the main program. strlcpy.c is copied
from the online reference as is without change. The compilation steps
are described above. STRING is the string to try out. N is the number
of 64-byte blocks to allocate. ALIGN is the number of bytes to offset
the array by when storing / reading / writing. ALIGN should be = 0.

At N=1, it's all in cache. At N=1024*1024 it is taking up 64 Mbytes of
RAM.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

#include string.h
#include sys/types.h

size_t
strlcpy(char *dst, const char *src, size_t siz);

void x_none(char * restrict a, const char * restrict b) {
// Do nothing.
}

void x_memcpy(char * restrict a, const char * restrict b) {
memcpy(a, b, 64);
}

void x_strncpy(char * restrict a, const char * restrict b) {
strncpy(a, b, 64);
}

void x_strlcpy(char * restrict a, const char * restrict b) {
strlcpy(a, b, 64);
}

void x_strlenmemcpy(char * restrict a, const char * restrict b) {
size_t len = strlen(b) + 1;
memcpy(a, b, len  64 ? len : 64);
}

#include stdio.h
#include string.h
#include 

Re: [HACKERS] Faster StrNCpy

2006-09-28 Thread Strong, David
Mark,

In the specific case of the namestrcpy () function, it will copy a
maximum of 64 bytes, but the length of the source string is unknown. I
would have to think that memcpy () would certainly win if you knew the
source and destination sizes etc. Perhaps there are some places like
that in the code that don't use memcpy () currently?

David

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 27, 2006 4:27 PM
To: Strong, David
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Faster StrNCpy

On Wed, Sep 27, 2006 at 07:08:05AM -0700, Strong, David wrote:
 We sometimes see TupleDescInitEntry () taking high CPU times via
 OProfile. This does include, amongst a lot of other code, a call to
 namestrcpy () which in turn calls StrNCpy (). Perhaps this is not a
 good candidate right now as a name string is only 64 bytes.

Just wondering - are any of these cases where a memcpy() would work
just as well? Or are you not sure that the source string is at least
64 bytes in length?

memcpy(target, source, sizeof(target));
target[sizeof(target)-1] = '\0';

I imagine any extra checking causes processor stalls, or at least for
the branch prediction to fill up? Straight copies might allow for
maximum parallelism? If it's only 64 bytes, on processors such as
Pentium or Athlon, that's 2 or 4 cache lines, and writes are always
performed as cache lines.

I haven't seen the code that you and Tom are looking at to tell
whether it is safe to do this or not.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood
Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario,
Canada

  One ring to rule them all, one ring to find them, one ring to bring
them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-28 Thread Tom Lane
Strong, David [EMAIL PROTECTED] writes:
 Just wondering - are any of these cases where a memcpy() would work
 just as well? Or are you not sure that the source string is at least
 64 bytes in length?

In most cases, we're pretty sure that it's *not* --- it'll just be a
palloc'd C string.

I'm disinclined to fool with the restriction that namestrcpy zero-pad
Name values, because they might end up on disk, and allowing random
memory contents to get written out is ungood from a security point of
view.  However, it's entirely possible that it'd be a bit faster to do
a MemSet followed by strlcpy than to use strncpy for zero-padding.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2006-09-27 Thread Strong, David
Tom,
 
Let us know when you've added strlcpy () and we'll be happy to run some tests 
on the new code.
 
David



From: [EMAIL PROTECTED] on behalf of Tom Lane
Sent: Tue 9/26/2006 7:25 PM
To: josh@agliodbs.com
Cc: pgsql-hackers@postgresql.org; Neil Conway; Martijn van Oosterhout
Subject: Re: [HACKERS] Faster StrNCpy 



Josh Berkus josh@agliodbs.com writes:
 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.

 Immediately?  Presumably you mean for 8.3?

No, I mean now.  This is a performance bug and it's still open season on
bugs.  If we were close to having a release-candidate version, I'd hold
off, but the above proposal seems sufficiently low-risk for the current
stage of the cycle.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2006-09-27 Thread Andrew Dunstan

Tom Lane wrote:

Josh Berkus josh@agliodbs.com writes:
  

What I'd like to do immediately is put in strlcpy() and hit the two or
three places I think are performance-relevant.
  


  

Immediately?  Presumably you mean for 8.3?



No, I mean now.  This is a performance bug and it's still open season on
bugs.  If we were close to having a release-candidate version, I'd hold
off, but the above proposal seems sufficiently low-risk for the current
stage of the cycle.

  


What are the other hotspots?

cheers

andrew


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-09-27 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.

 What are the other hotspots?

The ones I can think of offhand are set_ps_display and use of strncpy as
a HashCopyFunc.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Faster StrNCpy

2006-09-27 Thread Strong, David
We sometimes see TupleDescInitEntry () taking high CPU times via OProfile. This 
does include, amongst a lot of other code, a call to namestrcpy () which in 
turn calls StrNCpy (). Perhaps this is not a good candidate right now as a name 
string is only 64 bytes.
 
David



From: [EMAIL PROTECTED] on behalf of Tom Lane
Sent: Wed 9/27/2006 6:49 AM
To: Andrew Dunstan
Cc: josh@agliodbs.com; pgsql-hackers@postgresql.org; Neil Conway; Martijn van 
Oosterhout
Subject: Re: [HACKERS] Faster StrNCpy 



Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.

 What are the other hotspots?

The ones I can think of offhand are set_ps_display and use of strncpy as
a HashCopyFunc.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Faster StrNCpy

2006-09-27 Thread mark
On Wed, Sep 27, 2006 at 07:08:05AM -0700, Strong, David wrote:
 We sometimes see TupleDescInitEntry () taking high CPU times via
 OProfile. This does include, amongst a lot of other code, a call to
 namestrcpy () which in turn calls StrNCpy (). Perhaps this is not a
 good candidate right now as a name string is only 64 bytes.

Just wondering - are any of these cases where a memcpy() would work
just as well? Or are you not sure that the source string is at least
64 bytes in length?

memcpy(target, source, sizeof(target));
target[sizeof(target)-1] = '\0';

I imagine any extra checking causes processor stalls, or at least for
the branch prediction to fill up? Straight copies might allow for
maximum parallelism? If it's only 64 bytes, on processors such as
Pentium or Athlon, that's 2 or 4 cache lines, and writes are always
performed as cache lines.

I haven't seen the code that you and Tom are looking at to tell
whether it is safe to do this or not.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Faster StrNCpy

2006-09-26 Thread Tom Lane
David Strong points out here
http://archives.postgresql.org/pgsql-hackers/2006-09/msg02071.php
that some popular implementations of strncpy(dst,src,n) are quite
inefficient when strlen(src) is much less than n, because they don't
optimize the zero-pad step that is required by the standard.

It looks to me like we have a good number of places that are using
either StrNCpy or strncpy directly to copy into large buffers that
we do not need full zero-padding in, only a single guaranteed null
byte.  While not all of these places are in performance-critical
paths, some are.  David identified set_ps_display, and the other
thing that's probably significant is unnecessary use of strncpy
for keys of string-keyed hash tables.  (We used to actually need
zero padding for string-keyed hash keys, but that was a long time ago.)

I propose adding an additional macro in c.h, along the lines of

#define StrNCopy(dst,src,len) \
do \
{ \
char * _dst = (dst); \
Size _len = (len); \
\
if (_len  0) \
{ \
const char * _src = (src); \
Size _src_len = strlen(_src); \
\
if (_src_len  _len) \
memcpy(_dst, _src, _src_len + 1); \
else \
{ \
memcpy(_dst, _src, _len - 1); \
_dst[_len-1] = '\0'; \
} \
} \
} while (0)

Unlike StrNCpy, this requires that the source string be null-terminated,
so it would not be a drop-in replacement everywhere.  Also, it could be
a performance loss if strlen(src) is much larger than len ... but that
is not usually the case for the places we'd want to apply it.

Thoughts, objections?  In particular, is the name OK, or do we need
something a bit further away from StrNCpy?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Martijn van Oosterhout
On Tue, Sep 26, 2006 at 04:24:51PM -0400, Tom Lane wrote:
 David Strong points out here
 http://archives.postgresql.org/pgsql-hackers/2006-09/msg02071.php
 that some popular implementations of strncpy(dst,src,n) are quite
 inefficient when strlen(src) is much less than n, because they don't
 optimize the zero-pad step that is required by the standard.

I think that's why strlcpy was invented, to deal with the issues with
strncpy.

http://www.gratisoft.us/todd/papers/strlcpy.html

There's an implementation here (used in glib), though you could
probably find more.

http://mail.gnome.org/archives/gtk-devel-list/2000-May/msg00029.html

Do you really think it's worth making a macro rather than just a normal
function?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Alvaro Herrera
Martijn van Oosterhout wrote:
 On Tue, Sep 26, 2006 at 04:24:51PM -0400, Tom Lane wrote:
  David Strong points out here
  http://archives.postgresql.org/pgsql-hackers/2006-09/msg02071.php
  that some popular implementations of strncpy(dst,src,n) are quite
  inefficient when strlen(src) is much less than n, because they don't
  optimize the zero-pad step that is required by the standard.
 
 I think that's why strlcpy was invented, to deal with the issues with
 strncpy.
 
 http://www.gratisoft.us/todd/papers/strlcpy.html
 
 There's an implementation here (used in glib), though you could
 probably find more.
 
 http://mail.gnome.org/archives/gtk-devel-list/2000-May/msg00029.html

That one would be LGPL (glib's license).  Here is OpenBSD's version,
linked from that one:

ftp://ftp.openbsd.org/pub/OpenBSD/src/lib/libc/string/strlcpy.c

You'll notice that it iterates once per char.  Between that and the
strlen() call in Tom's version, not sure which is the lesser evil.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 I think that's why strlcpy was invented, to deal with the issues with
 strncpy.
 http://www.gratisoft.us/todd/papers/strlcpy.html

strlcpy does more than we need (note that none of the existing uses care
about counting the overflowed bytes).  Not sure if it's worth adopting
those semantics when they're not really standard, but if you think a lot
of people would be familiar with strlcpy, maybe we should.

 Do you really think it's worth making a macro rather than just a normal
 function?

Only in that a macro in c.h is less work than a configure test plus a
replacement file in src/port.  But if we want to consider this a
standard function that just doesn't happen to exist everywhere, I
suppose we should use configure.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Neil Conway
On Tue, 2006-09-26 at 16:53 -0400, Tom Lane wrote:
 strlcpy does more than we need (note that none of the existing uses care
 about counting the overflowed bytes).  Not sure if it's worth adopting
 those semantics when they're not really standard, but if you think a lot
 of people would be familiar with strlcpy, maybe we should.

I think we should -- while strlcpy() is not standardized, it is widely
used (in libc on all the BSDs, Solaris and OS X, as well as private
copies in Linux, glib, etc.).

A wholesale replacement of strncpy() calls is probably worth doing --
replacing them with strlcpy() if the source string is NUL-terminated,
and I suppose memcpy() otherwise.

-Neil



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 You'll notice that it iterates once per char.  Between that and the
 strlen() call in Tom's version, not sure which is the lesser evil.

Yeah, I was wondering that too.  My code would require two scans of the
source string (one inside strlen and one in memcpy), but in much of our
usage the source and dest should be reasonably well aligned and one
could expect memcpy to be using word rather than byte operations, so you
might possibly make it back on the strength of fewer write cycles.  And
on the third hand, for short source strings none of this matters and
the extra function call involved for strlen/memcpy probably dominates.

I'm happy to just use the OpenBSD version as a src/port module.
Any objections?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 A wholesale replacement of strncpy() calls is probably worth doing --
 replacing them with strlcpy() if the source string is NUL-terminated,
 and I suppose memcpy() otherwise.

What I'd like to do immediately is put in strlcpy() and hit the two or
three places I think are performance-relevant.  I agree with trying to
get rid of StrNCpy/strncpy calls over the long run, but it's just code
beautification and probably not appropriate for beta.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Josh Berkus
Tom,

 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.  I agree with trying to
 get rid of StrNCpy/strncpy calls over the long run, but it's just code
 beautification and probably not appropriate for beta.

Immediately?  Presumably you mean for 8.3?

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Faster StrNCpy

2006-09-26 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 What I'd like to do immediately is put in strlcpy() and hit the two or
 three places I think are performance-relevant.

 Immediately?  Presumably you mean for 8.3?

No, I mean now.  This is a performance bug and it's still open season on
bugs.  If we were close to having a release-candidate version, I'd hold
off, but the above proposal seems sufficiently low-risk for the current
stage of the cycle.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend