Bernd Helmle <maili...@oopsware.de> writes:
> --On Tuesday, May 05, 2009 10:00:37 -0400 Tom Lane <t...@sss.pgh.pa.us>
> wrote:
>> Seems like the right response might be some micro-optimization effort on
>> byteaout.
>
> Hmm looking into profiler statistics seems to second your suspicion:
>
> Normal COPY shows:
>
>      % cumulative     self                self    total
>   time    seconds  seconds      calls   s/call   s/call  name
>  31.29      81.38    81.38     134487     0.00     0.00  CopyOneRowTo
>  22.88     140.89    59.51     134487     0.00     0.00  byteaout
>  13.44     175.84    34.95 3052797224     0.00     0.00  appendBinaryStringInfo
>  12.10     207.32    31.48 3052990837     0.00     0.00  CopySendChar
>   8.45     229.31    21.99 3052797226     0.00     0.00  enlargeStringInfo
>   3.90     239.45    10.14      55500     0.00     0.00  pglz_decompress

I hadn't looked closely at these numbers before, but now that I do,
what I think they are telling us is that the high proportion of
backslashes in standard bytea output is a real killer for COPY
performance. With no backslashes, CopySendChar wouldn't be in the
picture at all here, and appendBinaryStringInfo/enlargeStringInfo
would be called many fewer times (roughly 134487 not 3052797224)
with proportionately more characters processed per call. The inner
loop of CopyOneRowTo (I assume CopyAttributeOutText has been inlined
into that function) is relatively cheap for ordinary characters and
much less so for backslashes, so I bet that number would go down too.
And as already noted, byteaout itself works pretty hard to produce the
current representation.

So I'm now persuaded that a better textual representation for bytea
should indeed make things noticeably better here. It would be
useful though to cross-check this thought by profiling a case that
dumps a comparable volume of text data that contains no backslashes...

			regards, tom lane
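
To make the call-count argument concrete, here is a rough sketch in Python. It is a toy model, not the PostgreSQL source: `bytea_escape` and `copy_text_out` are hypothetical names, the model escapes only backslashes (real COPY also handles delimiters and newlines), and plain hex digits are used purely as an example of a backslash-free representation. It counts how many separate append operations the output path would make for one field.

```python
def bytea_escape(data: bytes) -> str:
    # Escape-style text form of a byte string: a backslash is doubled and
    # any byte outside printable ASCII becomes a backslash plus 3 octal digits.
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\\\\")
        elif b < 0x20 or b > 0x7E:
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def copy_text_out(field: str):
    # Toy model of COPY text output (an assumption, not the real code path):
    # runs of ordinary characters are appended in one call, but every
    # backslash forces a flush of the pending run plus a one-character
    # append for the extra escaping backslash.
    chunks = []
    calls = 0
    start = 0
    for i, ch in enumerate(field):
        if ch == "\\":
            if i > start:
                chunks.append(field[start:i])  # flush the pending run
                calls += 1
            chunks.append("\\")                # escape character sent alone
            calls += 1
            start = i                          # the backslash starts the next run
    if start < len(field):
        chunks.append(field[start:])           # flush the final run
        calls += 1
    return "".join(chunks), calls
```

Under this model, 1000 zero bytes in the escape-style form cost 2000 append calls and produce 5000 output characters, while the same bytes written as hex digits (`data.hex()`) go out in a single call of 2000 characters. That is the same shape as the gap between 3052797224 and 134487 calls in the profile above.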