Cool! It's been a while since we've done the same kind of thing :-) - Luke
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Heikki Linnakangas
> Sent: Saturday, February 23, 2008 5:30 PM
> To: pgsql-patches@postgresql.org
> Subject: [PATCHES] CopyReadLineText optimization
>
> The purpose of CopyReadLineText is to scan the input buffer
> and find the next newline, taking into account any escape
> characters. It currently operates in a loop, one byte at a
> time, searching for LF, CR, or a backslash. That's a bit
> slow: I've been running oprofile on COPY, and I've seen
> CopyReadLine take around 10% of the CPU time, and Joshua
> Drake just posted a very similar profile to hackers.
>
> Attached is a patch that modifies CopyReadLineText so that it
> uses memchr to speed up the scan. The nice thing about memchr
> is that we can take advantage of any clever optimizations
> that might be in libc or the compiler.
>
> In the tests I've been running, it roughly halves the time
> spent in CopyReadLine (including the new memchr calls), thus
> reducing the total CPU overhead by ~5%. I'm planning to run
> more tests with data that has backslashes and with tables of
> different widths to see what the worst-case and best-case
> performance is like. Also, it doesn't work for CSV format at
> the moment; that needs to be fixed.
>
> 5% isn't exactly breathtaking, but it's a start. I tried the
> same trick on CopyReadAttributesText, but unfortunately it
> doesn't seem to help there, because you need to "stop" the
> efficient word-at-a-time scan that memchr does (at least with
> glibc, YMMV) whenever there's a column separator, while in
> CopyReadLineText you get to process the whole line in one
> call, assuming there are no backslashes.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
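For anyone following along, here is a minimal C sketch of the kind of memchr-based line scan described above. It is not the code from the patch: find_special and find_line_end are hypothetical names, and the sketch assumes text-format (non-CSV) input in a single-byte, ASCII-safe encoding, treats a backslash as escaping the byte after it, and ignores the \. end-of-data marker and CR/LF pairing rules.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return a pointer to the first
 * LF, CR, or backslash in buf[0..len), or NULL if none is present.
 * Each memchr call is bounded by the earliest hit found so far, so later
 * calls never scan past a special byte that was already located.
 */
static const char *
find_special(const char *buf, size_t len)
{
    const char  targets[] = {'\n', '\r', '\\'};
    const char *best = NULL;
    size_t      limit = len;

    for (size_t i = 0; i < sizeof(targets); i++)
    {
        const char *p = memchr(buf, targets[i], limit);

        if (p != NULL)
        {
            best = p;
            limit = (size_t) (p - buf);   /* search only before this hit */
        }
    }
    return best;
}

/*
 * Hypothetical: return the offset of the LF or CR that ends the first line
 * of buf, treating a backslash as escaping the following byte (assumption,
 * text format only).  Returns -1 if no unescaped terminator is found.
 */
static long
find_line_end(const char *buf, size_t len)
{
    size_t pos = 0;

    while (pos < len)
    {
        const char *p = find_special(buf + pos, len - pos);

        if (p == NULL)
            return -1;                    /* no special bytes left */
        if (*p == '\\')
        {
            /* skip the backslash and the byte it escapes */
            pos = (size_t) (p - buf) + 2;
            continue;
        }
        return (long) (p - buf);          /* unescaped LF or CR */
    }
    return -1;                            /* buffer ended after a backslash */
}
```

For a buffer holding "a", backslash, newline, "b", tab, "c", newline, find_line_end returns 6: the newline at offset 2 is skipped because it follows a backslash. The trade-off in this particular sketch is that a line with no backslashes may be read up to three times (once per target byte), so any gain depends on memchr being much faster per byte than a simple byte-at-a-time loop; data with many backslashes, as in the planned tests, would break the scan into many short memchr calls.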