Tom Lane wrote:
Andreas Pflug <[EMAIL PROTECTED]> writes:
When dumping the table with psql \copy (non-binary), the resulting file
would be 6.6GB of size, taking about 5.5 minutes. Using psql \copy WITH
BINARY (modified psql as posted to -patches), the time was cut down to
21-22 seconds (filesize 1.4GB as expected), which is near the physical
throughput of the target disk. If server based COPY to file is used, The
same factor 12 can be observed, CPU is up to 100 % (single P4 3GHz 2MB
Cache HT disabled, 1GB main mem).
This is with an 8.0.x server, right?
I've tested both 8.0.5 and 8.1.4, no difference observed.
Testing a similar case with CVS HEAD, I see about a 5x speed difference,
which is right in line with the difference in the physical amount of
data written.
That's what I would have expected, apparently the data is near worst case.
(I was testing a case where all the bytes were emitted as
'\nnn', so it's the worst case.) oprofile says the time is being spent
in CopyAttributeOutText() and fwrite(). So I don't think there's
anything to be optimized here, as far as bytea goes: its binary
representation is just inherently a lot smaller.
Unfortunately, binary isn't the cure for all, since copying normal data
with binary option might bloat that by factor two or so. I wish there
was a third option that's fine for both kinds of data. That's not only a
question of dump file sizes, but also of network throughput (an online
compression in the line protocol would be desirable for this).
Looking at CopySendData, I wonder whether any traction could be gained
by trying not to call fwrite() once per character. I'm not sure how
much per-call overhead there is in that function. We've done a lot of
work trying to optimize the COPY IN path since 8.0, but nothing much
on COPY OUT ...
Hm, I'll see whether I can manage to check CVS head too, and see what's
happening, not a production alternative though.
Regards,
Andreas
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match